The trouble with SLAs
Most organisations rely on Service Level Agreements (SLAs) with their IT service providers to ensure that they are receiving the service for which they are paying. While the idea of using SLAs as a service management mechanism is appealingly straightforward, the reality is somewhat more tortuous.
In practice, SLAs are too often a poor or ineffective tool because of the way in which they are defined and measured.
SLAs are designed to allow an active, two-way conversation between provider and consumer. It’s not just about the performance of the service provider; SLAs also enable decisions to be made about striking the right balance between service cost and service performance.
Unfortunately there is a lamentable gulf between this and the way in which SLAs are selected and managed in many organisations.
The result: SLAs are too often irrelevant or inadequate and are responsible for as many disputes as agreements between service consumers and service providers.
To be useful, SLAs must be defined in terms of the actual service delivered to end-users and their ability to perform productive work. The most effective means of measuring this is fully automated end-user experience monitoring, provided it is capable of supporting complex user interactions across a wide range of application and access technologies, including web, mobile, VDI and desktop clients.
So what are the common problems with real-world SLAs?
Missing or incomplete SLAs
It’s surprising how many major organisations simply do not define application SLAs for all their key applications. A number of leading Software as a Service (SaaS) providers also stubbornly refuse to publish SLAs.
Defining terms for SLAs is paramount, not only for dealing with third-party providers but also as an internal management discipline between IT and the business for key applications.
SLAs relating to IT applications generally establish requirements for application availability, usually expressed as a target percentage of uptime. It’s important that SLAs relate to the ability of end-users to do productive work and not just to some incidental technology metrics that suit the provider or are easier to measure.
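To put those uptime percentages in concrete terms, here is a minimal sketch that converts an availability target into a downtime budget. The targets and the 30-day period are illustrative assumptions, not figures from any particular SLA:

```python
def downtime_budget_minutes(availability_pct: float,
                            period_minutes: int = 30 * 24 * 60) -> float:
    """Downtime permitted by an availability target over a period (default: a 30-day month)."""
    return period_minutes * (1 - availability_pct / 100)

# Illustrative targets only; real SLAs should state the target and measurement period explicitly.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% uptime -> {downtime_budget_minutes(target):.1f} minutes/month")
```

The gap between "two nines" (over seven hours a month) and "four nines" (a few minutes) is why the percentage alone, without a stated measurement period and method, tells a service consumer very little.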
Many SLAs don’t include quantifiable application performance measures, making it impossible to determine when performance is so degraded that the application is effectively unavailable.
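A quantifiable performance clause can close that gap by treating severe degradation as unavailability. The sketch below is a hypothetical example of such a clause, not a recommended standard: the 95th-percentile statistic, the five-second threshold and the sample response times are all invented for illustration:

```python
# Assumed "effectively unavailable" cut-off for end-user response time, in seconds.
RESPONSE_THRESHOLD_S = 5.0

def effectively_available(response_times_s: list[float],
                          threshold_s: float = RESPONSE_THRESHOLD_S) -> bool:
    """Count the service as available only if the 95th-percentile
    end-user response time stays under the agreed threshold."""
    ordered = sorted(response_times_s)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return p95 < threshold_s

# One slow outlier does not breach the clause; sustained slowness would.
samples = [1.2, 1.4, 1.1, 9.8, 1.3, 1.2, 1.5, 1.1, 1.2, 1.4]
print(effectively_available(samples))  # True
```

The point is that "degraded" becomes a measurable, arguable-in-advance condition rather than a dispute after the fact.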
SLAs are not new but that’s no reason to persist with antiquated approaches based on simple technology behaviour.
A classic example is defining an SLA in terms of the ability to log in or how long a screen takes to display. This dates back to the green-screen systems of the 1980s, when this may have been a reasonable measure of an end-user’s ability to work with a system.
With modern systems, there may be very little correlation between the ability of a front-end server to render a screen and the real end-user experience.
Measuring SLAs
If an SLA can’t be measured directly, it’s of little use. This is a problem when SLAs are expressed in relative terms or with respect to some fuzzy historical precedent.
For example, an SLA that sometimes appears as part of a major system upgrade program is “no slower than the previous version”. In this situation you may think that you have an SLA, but in fact you don’t, as past measurements may be difficult to apply. This makes holding a system integrator accountable for the success of the upgrade more challenging.
Cloud Infrastructure as a Service (IaaS) SLAs in particular seem prone to this malady but they are not alone. If you can’t understand an SLA, chances are you can’t measure it properly or relate it to the ability of end-users to be productive.
Some companies don’t have specific application availability and performance SLAs and attempt to repurpose other criteria, such as disaster recovery (DR) priority classifications. The implicit assumption is that an application with a “gold” DR priority should also meet the most stringent performance standards.
Reusing other classifications as an SLA is a mistake: they won’t necessarily share the same business drivers, nor provide the specific detail required to establish an effective measurement regime.
Even the most well-defined SLAs are useless for service assurance if they are not being properly measured. As SLA measures become more sophisticated and aligned with end-user experience and productivity, measurement regimes need to mature.
Using the wrong metrics
Unfortunately, many organisations do not have the means to perform this more sophisticated monitoring of modern, end-user-focussed SLAs and retreat to traditional but less useful technology metrics.
Most organisations will depend on a broad portfolio of applications that use a variety of delivery technologies. Even where efforts have been made to define a consistent SLA framework across the portfolio, it can be difficult to implement a consistent and reliable measurement capability across the different technologies.
The result can be a hotchpotch of different tools and measurement techniques that give little confidence in what is being measured, or leave some areas unmeasured.
Murray Andrews is the chief operating officer at REMASYS.