How to Succeed at Incident Response Metrics
Establishing a baseline of what information you need is an essential first step.
Collecting metrics for your incident response team is as critical as training your team. Without metrics, it is nearly impossible to determine how effective your team is, whether your technology investments are performing as expected, and how satisfied management is with the current results. The hardest part is getting started, but once you do, you have the foundation for really useful data analytics.
Start by establishing a baseline of the information you need to answer the questions that matter most to your team. Below are the most basic metrics every team should keep, but you may want to track others that help you make a business case for new controls.
- Time from compromise to discovery (dwell time)
- Time from alarm to triage
- Time to close
- Incident classification
- Detection method
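To keep these metrics comparable over time, it helps to record the raw timestamps for each incident and derive the durations from them, rather than typing durations in by hand. The sketch below is a minimal Python illustration under a few assumptions: the `Incident` class and its field names are hypothetical, and "time to close" is measured from discovery to closure. The list above does not name a starting point for that metric, so pick one and apply it consistently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Incident:
    """One incident's metadata. Field names are hypothetical, not from any tool."""
    case_id: str
    classification: str        # category from an established taxonomy
    detection_method: str      # e.g. "IDS", "user report", "third party"
    compromised_at: datetime   # best estimate of the initial compromise
    alarmed_at: datetime       # when the alert or report reached the team
    triaged_at: datetime       # when an analyst first triaged it
    discovered_at: datetime    # when the team confirmed it as an incident
    closed_at: Optional[datetime] = None  # None while the case is still open

    @property
    def dwell_time(self) -> timedelta:
        # Time from compromise to discovery.
        return self.discovered_at - self.compromised_at

    @property
    def time_to_triage(self) -> timedelta:
        # Time from alarm to triage.
        return self.triaged_at - self.alarmed_at

    @property
    def time_to_close(self) -> Optional[timedelta]:
        # Assumption: measured from discovery to closure; open cases return None.
        if self.closed_at is None:
            return None
        return self.closed_at - self.discovered_at


# Hypothetical example incident.
inc = Incident(
    case_id="IR-0001",
    classification="Malware",
    detection_method="IDS",
    compromised_at=datetime(2015, 3, 1, 8, 0),
    alarmed_at=datetime(2015, 3, 4, 9, 0),
    triaged_at=datetime(2015, 3, 4, 9, 45),
    discovered_at=datetime(2015, 3, 4, 10, 0),
    closed_at=datetime(2015, 3, 6, 10, 0),
)
print(inc.dwell_time)      # 3 days, 2:00:00
print(inc.time_to_triage)  # 0:45:00
print(inc.time_to_close)   # 2 days, 0:00:00
```

Storing timestamps instead of durations also means you can change a definition later (for example, measuring time to close from the alarm instead) and recompute every historical value.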
When deciding how to classify your incidents, it is best to use an existing taxonomy instead of creating your own. That way, you can easily compare yourself with peers and with published reports. The VERIS framework is one of the more popular choices; the Verizon Data Breach Investigations Report uses it, which lets you compare your numbers against that report every year. Here is a great list of the most common taxonomies. When choosing a taxonomy, also consider whether peers in your industry have a preferred one.
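One practical benefit of an established taxonomy is that classification becomes a pick-list instead of free text, so counts stay consistent across analysts and across years. The sketch below hard-codes the seven top-level VERIS action categories (Malware, Hacking, Social, Misuse, Physical, Error, Environmental) as a Python `Enum`. It is a deliberately tiny slice of VERIS, which also records actors, assets, and attributes, and a real VERIS record can list more than one action per incident; `parse_action` and `count_by_action` are hypothetical helper names.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class VerisAction(Enum):
    """The seven top-level VERIS action categories (a small slice of the schema)."""
    MALWARE = "Malware"
    HACKING = "Hacking"
    SOCIAL = "Social"
    MISUSE = "Misuse"
    PHYSICAL = "Physical"
    ERROR = "Error"
    ENVIRONMENTAL = "Environmental"


def parse_action(label: str) -> VerisAction:
    """Map an analyst's entry onto the fixed vocabulary, rejecting anything else."""
    normalized = label.strip().lower()
    for action in VerisAction:
        if action.value.lower() == normalized:
            return action
    raise ValueError(f"{label!r} is not a top-level VERIS action category")


def count_by_action(labels: Iterable[str]) -> Counter:
    """Tally incidents per category, assuming one label per incident."""
    return Counter(parse_action(label) for label in labels)


counts = count_by_action(["malware", "Hacking", "Malware ", "error"])
print(counts[VerisAction.MALWARE])  # 2
# parse_action("phishing") raises ValueError: in VERIS, phishing is a
# variety under the Social category, not a category of its own.
```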
Recording this data can be tricky, because your first instinct may be to extend your current tracking system to hold it. If you already use a tool like Request Tracker Incident Response (RTIR), you can track these data points by creating custom fields, and for the basic metrics this works fine. Most people run into trouble when they try to capture most, if not all, of a complex taxonomy. Depending on how flexible your tools are and how hard these changes would be, it may make more sense to use a survey-style tool to enter and mine just the metadata of your incidents, and keep your case notes in your current tracking system.
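If you go the survey route, the metadata store can be very small. The following sketch uses Python's built-in `sqlite3` module to keep one row of metadata per incident, keyed by the ticket ID from your existing case-tracking system so the case notes stay where they are. The table layout, column names, and helper functions are assumptions for illustration, not part of RTIR or of any survey product.

```python
import sqlite3
from typing import Dict, Optional

# Hypothetical schema: one metadata row per incident. Case notes stay in the
# existing tracking system; case_id is that system's ticket ID.
SCHEMA = """
CREATE TABLE IF NOT EXISTS incident_metadata (
    case_id          TEXT PRIMARY KEY,
    classification   TEXT NOT NULL,
    detection_method TEXT NOT NULL,
    compromised_at   TEXT,  -- ISO 8601, e.g. 2015-03-01T08:00:00
    discovered_at    TEXT,
    closed_at        TEXT
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, case_id: str, classification: str,
           detection_method: str, compromised_at: Optional[str] = None,
           discovered_at: Optional[str] = None,
           closed_at: Optional[str] = None) -> None:
    # INSERT OR REPLACE lets an analyst resubmit the form to correct an entry.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO incident_metadata VALUES (?, ?, ?, ?, ?, ?)",
            (case_id, classification, detection_method,
             compromised_at, discovered_at, closed_at),
        )


def counts_by_detection_method(conn: sqlite3.Connection) -> Dict[str, int]:
    rows = conn.execute(
        "SELECT detection_method, COUNT(*) FROM incident_metadata "
        "GROUP BY detection_method"
    ).fetchall()
    return dict(rows)


def mean_dwell_days_by_classification(conn: sqlite3.Connection) -> Dict[str, float]:
    # julianday() converts ISO 8601 text to fractional days, so the
    # difference is the dwell time in days.
    rows = conn.execute(
        "SELECT classification, "
        "AVG(julianday(discovered_at) - julianday(compromised_at)) "
        "FROM incident_metadata "
        "WHERE compromised_at IS NOT NULL AND discovered_at IS NOT NULL "
        "GROUP BY classification"
    ).fetchall()
    return dict(rows)


# Hypothetical entries.
conn = open_store()
record(conn, "IR-1", "Malware", "IDS", "2015-03-01T00:00:00", "2015-03-05T00:00:00")
record(conn, "IR-2", "Hacking", "User report", "2015-03-01T00:00:00", "2015-03-03T00:00:00")
record(conn, "IR-3", "Malware", "IDS", "2015-03-01T00:00:00", "2015-03-03T00:00:00")
print(counts_by_detection_method(conn))
print(mean_dwell_days_by_classification(conn))
```

Keeping only the metadata in this store means the taxonomy can grow without touching the case-tracking system; adding a field is one new column rather than a round of custom-field changes.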
Once this data is available, you can start measuring how changes affect your environment. It is important to record when processes change, when new tools are implemented, when personnel changes happen, and so on, because all of these will make the numbers fluctuate. For example, you should expect an additional junior hire to lengthen time-to-close initially, but you should also see dwell time shorten because more people are looking for incidents. Below are some of the more useful questions I try to answer with the data.
- What is your average/median time for detection?
- How much time are you saving by implementing the new process?
- What types of incidents are taking the longest? Do you need more training or better tools?
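Once durations are recorded, each of these questions reduces to a simple aggregation. The sketch below, using only Python's standard library, answers them for a list of closed incidents: `detection_stats` returns the mean and median detection time, `change_impact` compares median time-to-close before and after a process change date, and `slowest_classifications` ranks incident types by median time-to-close. The `ClosedIncident` record, the function names, and all sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, median
from typing import List, NamedTuple, Tuple


class ClosedIncident(NamedTuple):
    opened_on: date
    classification: str
    hours_to_detect: float   # e.g. dwell time expressed in hours
    hours_to_close: float


def detection_stats(incidents: List[ClosedIncident]) -> Tuple[float, float]:
    """Mean and median detection time in hours."""
    hours = [i.hours_to_detect for i in incidents]
    return mean(hours), median(hours)


def change_impact(incidents: List[ClosedIncident], change_date: date) -> float:
    """Median time-to-close before minus on/after a change (positive = time saved)."""
    before = [i.hours_to_close for i in incidents if i.opened_on < change_date]
    after = [i.hours_to_close for i in incidents if i.opened_on >= change_date]
    if not before or not after:
        raise ValueError("need incidents on both sides of the change date")
    return median(before) - median(after)


def slowest_classifications(incidents: List[ClosedIncident]) -> List[Tuple[str, float]]:
    """Classifications ranked by median time-to-close, longest first."""
    by_class = defaultdict(list)
    for i in incidents:
        by_class[i.classification].append(i.hours_to_close)
    ranked = [(cls, median(hours)) for cls, hours in by_class.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


incidents = [
    ClosedIncident(date(2015, 1, 10), "Malware", 72.0, 30.0),
    ClosedIncident(date(2015, 2, 3), "Phishing", 4.0, 6.0),
    ClosedIncident(date(2015, 2, 20), "Malware", 500.0, 40.0),
    ClosedIncident(date(2015, 4, 2), "Malware", 48.0, 20.0),
    ClosedIncident(date(2015, 4, 15), "Phishing", 2.0, 4.0),
    ClosedIncident(date(2015, 5, 1), "Web app", 240.0, 60.0),
]
print(detection_stats(incidents)[1])               # 60.0 (median)
print(change_impact(incidents, date(2015, 3, 1)))  # 10.0 (hours saved)
print(slowest_classifications(incidents)[0])       # ('Web app', 60.0)
```

Reporting the median alongside the mean is a deliberate choice: a single incident with a months-long dwell time can pull the average far from what a typical case looks like. A before/after comparison also isolates the effect of a change only if nothing else, such as staffing or tooling, changed in the same window, which is why logging those events matters.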
Having this baseline in place for the past few years has helped me draw some very useful conclusions. If management isn't satisfied with how quickly you are finding incidents, you will have the numbers to support your recommendations for addressing the problem. If you spent $500,000 on a new tool but it led to detecting only 20 incidents, should you continue to support it? This data has been very powerful for my team in making many business cases for change, and without it, I believe we would not be nearly as effective or efficient.
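The tool question above comes down to simple arithmetic: $500,000 spread over 20 detections is $25,000 per detected incident. The sketch below computes that figure per detection source from the "Detection method" field, so tools can be compared side by side. The function name, the convention of matching tool names to detection-method labels, and the IDS and SIEM figures in the example are hypothetical. Cost per detection is also only one lens, since a tool that catches rare but severe incidents may justify a high per-incident cost.

```python
from collections import Counter
from typing import Dict, List, Optional


def cost_per_detection(tool_costs: Dict[str, float],
                       detection_methods: List[str]) -> Dict[str, Optional[float]]:
    """Spend divided by the number of incidents each tool detected.

    tool_costs maps a tool name to its spend for the period; detection_methods
    holds the "Detection method" value recorded for each incident.
    """
    counts = Counter(detection_methods)
    result: Dict[str, Optional[float]] = {}
    for tool, cost in tool_costs.items():
        detected = counts[tool]  # Counter returns 0 for unseen tools
        # None flags a tool that detected nothing during the period.
        result[tool] = cost / detected if detected else None
    return result


# Hypothetical example: the $500,000 tool from the text plus two others.
methods = ["New tool"] * 20 + ["IDS"] * 60 + ["User report"] * 15
print(cost_per_detection({"New tool": 500_000, "IDS": 150_000, "SIEM": 80_000}, methods))
# {'New tool': 25000.0, 'IDS': 2500.0, 'SIEM': None}
```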
Note: Tom Webb will be giving a talk on this topic at an upcoming SANS event in Washington, D.C., in July.
Tom Webb is an incident handler for the SANS Internet Storm Center. He currently leads a team of six that performs incident response, forensic investigations, and vulnerability management. He has dedicated 12 years to security. Tom holds several certs, including the GSE and …