Incident Management – an explanation and example
Advent IM Security Consultant, Del Brazil, offers some guidance on best practice in Incident Management.
Incident Management is defined by the Information Technology Infrastructure Library (ITIL) as ‘To restore normal service operation as quickly as possible and minimise the impact on business operations, thus ensuring that agreed levels of service are maintained.’ Although this definition is very much aligned to the service delivery element of IT, organisations should translate it to all areas of the organisation to form the basis of any incident management strategy.
Any Incident Management process should include:-
Incident detection and recording – Sufficient and appropriate means of both detecting and reporting incidents are critical, as failure to report incidents can have a serious impact upon an organisation. There may be a legal requirement to report certain incidents, such as those involving the loss of personal data or security breaches related to protectively marked information, although this does not apply to every organisation. Reporting an incident correctly helps to ensure that the correct actions are taken in line with the incident management plan and that resources are allocated correctly.
An example may be that an individual receives an email from an untrusted source and, without realising any inherent risk, opens an attachment, which in turn causes their terminal to become unresponsive. The individual contacts the IT department in the first instance in order to initiate some form of containment measures, whilst also documenting how the incident occurred.
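For teams that record incidents electronically, the details captured at this stage can be held in a simple structured record. The following is a minimal sketch only; the class name `IncidentReport`, its fields, and the example values are hypothetical and are not drawn from ITIL or any particular ticketing product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class IncidentReport:
    """Hypothetical minimal incident record; field names are illustrative."""
    reporter: str
    description: str
    affected_assets: list[str]
    # Timestamp the report in UTC at the moment it is created.
    reported_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    containment_actions: list[str] = field(default_factory=list)

    def log_action(self, action: str) -> None:
        """Append a containment step so the timeline is documented."""
        self.containment_actions.append(action)


# Example values are invented to mirror the scenario in the text.
report = IncidentReport(
    reporter="user@example.org",
    description="Opened attachment from untrusted sender; terminal unresponsive",
    affected_assets=["terminal-042"],
)
report.log_action("Terminal disconnected from the network by IT")
```

Recording the reporter, time and affected assets at the outset gives later stages, such as classification and the post-incident review, a consistent starting point.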
Classification and initial support – There are various levels of severity associated with different types of incident, and classifying them correctly means that the appropriate resources or emergency services are tasked accordingly. These levels of severity range from a low-impact or minor incident requiring a limited number and type of resources, through to a major incident, which has the potential to impact on the whole organisation and requires a substantial amount of resources to manage or recover from. In the early stages of any incident the support provided by a designated incident response team is vital, as their initial actions can have potentially massive implications for the organisation’s ability to resume normal operations.
Following on from the previous example, the incident may be classified as a low priority at this stage, as only one terminal/user has been affected. The IT department may have tasked a limited number of resources with tracking down the suspicious email on the mail server and then followed the appropriate quarantine and/or deletion procedures.
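The classification step can be sketched in code. This is an illustration under stated assumptions rather than a recommended scheme: the intermediate `MEDIUM` level, the 25% threshold and the `critical_service_down` flag are all hypothetical, and a real organisation would take its levels and thresholds from its own incident management plan.

```python
from enum import Enum


class Severity(Enum):
    LOW = 1      # minor incident, limited resources required
    MEDIUM = 2   # assumed intermediate level, not named in the article
    MAJOR = 3    # potential to impact the whole organisation


def classify(affected_users: int, total_users: int,
             critical_service_down: bool = False) -> Severity:
    """Classify an incident using hypothetical, illustrative thresholds."""
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    if critical_service_down:
        return Severity.MAJOR
    # Assumed rule: a quarter or more of users affected counts as major.
    if affected_users / total_users >= 0.25:
        return Severity.MAJOR
    if affected_users > 1:
        return Severity.MEDIUM
    return Severity.LOW


# One affected terminal out of 200 users, as in the running example.
initial = classify(affected_users=1, total_users=200)
# Re-classification if the email turns out to have reached 60 users.
revised = classify(affected_users=60, total_users=200)
```

With these assumed thresholds, the single-terminal case is classified as `Severity.LOW`, and the same incident is re-classified as `Severity.MAJOR` if 60 of 200 users prove to be affected.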
Investigation and diagnosis – Further and ongoing investigations into the incident may identify trends or patterns that could further impact on the organisation, once normal operations have been resumed.
Keeping in mind the example previously discussed, should the initial findings of the IT department reveal that the email has been received by a large number of users, then further impact analysis should be undertaken to establish the impact or effect on services before any additional resources are dedicated to resolving the issue. This further investigation requires an organisation-wide broadcast, highlighting the incident and what actions should be taken in the event that users received suspicious emails or attachments.
Resolution and recovery – Ensuring that the correct rectification method is deployed is paramount, as no two incidents are the same and as such any incident management plan should have a degree of flexibility to accommodate potential variations.
Using our example scenario, the correct rectification solution in this instance would be to purge the mail server of any copies of the suspicious email and then to scan the mail server with an anti-virus and/or anti-spam product. Consideration should be given as to whether to take the mail server offline to perform the relevant scans; however, any potential downtime may impact on the output of the organisation. In the event that the mail server is taken offline, it is imperative that communication is maintained with all staff, contractors, customers and third-party suppliers.
Incident closure – The closure of an incident should be clearly communicated to all parties involved in managing or effecting rectification processes, together with a statement such as ‘Business has resumed to normal’ to clearly indicate to all concerned that normal operations can continue.
In our example, it’s essential that all persons involved in or impacted by the incident are informed accordingly, which formally closes the incident. This also reassures any interested parties that normal service has been resumed, thus preventing any additional business continuity plans from being invoked.
Incident ownership, monitoring, tracking and communication – An Incident Manager/Controller should take clear ownership of any incident so that all relevant information is communicated effectively, informed decisions can be made and resources are allocated correctly.
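The stages described so far, ending with a closure notice to everyone involved, can be sketched as a simple tracker. This is a minimal sketch: the `Incident` class, the strictly linear stage order and the wording of the closure message are assumptions for illustration, and real incidents may revisit earlier stages.

```python
from enum import Enum


class Stage(Enum):
    DETECTED = "detection and recording"
    CLASSIFIED = "classification and initial support"
    INVESTIGATING = "investigation and diagnosis"
    RESOLVING = "resolution and recovery"
    CLOSED = "incident closure"


# Enum members iterate in definition order, which gives the stage sequence.
ORDER = list(Stage)


class Incident:
    """Hypothetical tracker that moves an incident through each stage."""

    def __init__(self, title: str, stakeholders: list[str]):
        self.title = title
        self.stakeholders = list(stakeholders)
        self.stage = Stage.DETECTED
        self.notifications: list[tuple[str, str]] = []

    def advance(self) -> Stage:
        """Move to the next stage; on closure, queue a notice for everyone."""
        if self.stage is Stage.CLOSED:
            raise ValueError("incident is already closed")
        self.stage = ORDER[ORDER.index(self.stage) + 1]
        if self.stage is Stage.CLOSED:
            message = f"Incident '{self.title}' closed: business has resumed to normal"
            for person in self.stakeholders:
                self.notifications.append((person, message))
        return self.stage


incident = Incident("Suspicious email attachment",
                    ["staff", "key suppliers", "customers"])
while incident.stage is not Stage.CLOSED:
    incident.advance()
```

Tying the closure notice to the final stage transition means an incident cannot be marked closed without every listed stakeholder being queued for notification.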
As always, good communication is vital not only with staff, emergency services and the press but also with key suppliers and customers, as these may have to invoke their own business continuity plans as a result of the incident. Business continuity plans ensure critical outputs are maintained, but invoking a plan comes at a cost, whether financial or in the form of an impact on operational outputs. It is therefore imperative that, once an incident has been deemed formally closed, key suppliers and customers are informed accordingly so that they too can return to normal operations. Post-incident analysis or ‘Lessons learnt’ meetings should be held after any incident to highlight any weaknesses or failings so that rectification measures can be introduced accordingly. Likewise, should there be any good practices or solutions highlighted during the incident, then these should also be captured, as they may be used in other areas of the organisation.
Now that our example incident has been correctly identified and treated, and business has returned to normal, it is imperative that an incident ‘wash up’ meeting takes place to clearly identify those areas for improvement and those that performed well. The correct allocation of resources during the initial stages of the incident, to address what was initially deemed to be a minor incident, resulted in minimal impact not only on business outputs but also on customers and third-party suppliers. The findings of the ‘wash up’ meeting should be correctly recorded and analysed for any trends or patterns that may indicate a weakness in security. In this instance the mail server’s spam filters may have been incorrectly configured or not updated, resulting in a vulnerability being exploited.
Any incident management plan should be suitably tested and its effectiveness evaluated, with any updates/amendments implemented accordingly. It would be prudent to exercise any incident management plan annually or when there is a change in the key functions of the organisation. It is also recommended that all users are reminded of how to report incidents during any annual security awareness education or training.
As organisations become increasingly reliant on internet and IT services, it is imperative that an effective, appropriate and fully tested Incident Management Procedure is embedded within the organisation. Failure to ensure this may result in an organisation struggling to deal with or recover from any kind of security incident.