7.01.2008

Measuring Availability

This first post has to deal with what initially brought me to ITIL a couple of years ago: Availability.

You have defined your Vital Business Functions, your Availability plan has been finalized, your SLAs are signed: now it is showtime for some maths!
Let's start easy with the basics:
  • AST = Agreed Service Time
  • DT = Downtime
  • TAM = Total Available Minutes for all services delivered
  • TNI = Total Number of Incidents
  • TNIIC = Total Number of Incidents Impacting Customers
  • TUM = Total Unavailable Minutes for all services delivered
availability = (AST - DT) / AST * 100
resilience = 1 - (TNIIC / TNI)
reliability = 1 - (TUM / TAM)
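
To make the three formulas above concrete, here is a minimal Python sketch. The function names, the choice of minutes as the unit, and the handling of zero denominators are my own assumptions for illustration; only the formulas themselves come from the definitions above.

```python
def availability(ast: float, dt: float) -> float:
    """Availability in percent: (AST - DT) / AST * 100."""
    if ast <= 0:
        raise ValueError("AST (Agreed Service Time) must be positive")
    return (ast - dt) / ast * 100


def resilience(tniic: int, tni: int) -> float:
    """Resilience as a fraction: 1 - (TNIIC / TNI)."""
    if tni == 0:
        # Assumption: with no incidents at all, nothing impacted customers.
        return 1.0
    return 1 - tniic / tni


def reliability(tum: float, tam: float) -> float:
    """Reliability as a fraction: 1 - (TUM / TAM)."""
    if tam <= 0:
        raise ValueError("TAM (Total Available Minutes) must be positive")
    return 1 - tum / tam


# Example figures (made up): a 30-day month of 24x7 service, in minutes.
ast_minutes = 30 * 24 * 60          # 43200 minutes agreed
print(round(availability(ast_minutes, 43.2), 3))   # roughly "three nines"
print(resilience(tniic=5, tni=20))                 # 5 of 20 incidents hit customers
print(round(reliability(tum=120, tam=86400), 5))   # two services, 120 minutes lost
```

Note that availability comes out as a percentage while resilience and reliability come out as fractions between 0 and 1, exactly as the formulas are written.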

Other metrics to track:
- how much unplanned cost did you incur to maintain the needed availability?
- how much of your SW/HW infrastructure is supported by external vendors?
- how vulnerable are you to security threats?
(from Incident Management:)
- what is your average response time on a customer-impacting incident?
- what is your average resolution time on a customer-impacting incident?
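
The two Incident Management questions above can be answered from incident timestamps. Below is a rough Python sketch, assuming a hypothetical `Incident` record with `opened`, `responded` and `resolved` timestamps and a `customer_impacting` flag; none of these names come from a real tool, and measuring both durations from the moment the incident was opened is my assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Incident:
    # Hypothetical record layout, for illustration only.
    opened: datetime
    responded: datetime
    resolved: datetime
    customer_impacting: bool


def average_minutes(incidents: list, until: str) -> Optional[float]:
    """Mean minutes from `opened` to the given timestamp field,
    counting customer-impacting incidents only."""
    durations = [
        (getattr(i, until) - i.opened).total_seconds() / 60
        for i in incidents
        if i.customer_impacting
    ]
    # No customer-impacting incidents means there is nothing to average.
    return mean(durations) if durations else None


# Made-up sample data.
incidents = [
    Incident(datetime(2008, 1, 7, 9, 0), datetime(2008, 1, 7, 9, 10),
             datetime(2008, 1, 7, 10, 0), True),
    Incident(datetime(2008, 1, 7, 14, 0), datetime(2008, 1, 7, 14, 20),
             datetime(2008, 1, 7, 14, 40), True),
    Incident(datetime(2008, 1, 7, 16, 0), datetime(2008, 1, 7, 17, 0),
             datetime(2008, 1, 7, 18, 0), False),  # ignored: no customer impact
]

print(average_minutes(incidents, "responded"))  # average response time in minutes
print(average_minutes(incidents, "resolved"))   # average resolution time in minutes
```

With this sample data the third incident is left out of both averages because it did not impact customers.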

Note that a lot of these metrics relate not to system uptime but to service uptime, and measuring service uptime is a challenge in itself!