The Evolution of Service-Level Agreements

Brian Wood Blog

“There are no guarantees in life — except death and taxes.”

And data center SLAs (Service Level Agreements), of course.

Analyst Al Sadowski of 451 Research wrote the piece below on the evolution of SLAs.

Brian Wood, VP Marketing


It’s Time for Service-Level Agreement 3.0

Ask any buyer, and they will tell you that service providers treat the service-level agreement (SLA) as anything from marketing fluff to the most important criterion in selecting a provider. With the convergence of network services and cloud-based applications, now is the time for more meaningful SLAs.

What is an SLA?
Contractually, a service-level agreement is a pact negotiated between two parties: a buyer and a seller. The SLA may specify minimum or target levels of availability, performance, operation or other attributes of the service – even billing. SLAs tell customers what to expect. Some contracts include penalties for SLA noncompliance. Typically, the agreement covers the services the customer receives, not how the service provider delivers them.

In the beginning (SLA 1.0)
SLAs are believed to have originated with telecom providers, so it is not surprising that IP-based network performance measurements such as latency and packet loss often appear in the contract language of provider SLAs.

Latency, measured in milliseconds, is the time it takes an IP packet to travel from its originating location to its destination, reported either one-way or round trip. Among the factors that contribute to this delay are the distance covered, the media type (e.g., fiber optics, copper) carrying the traffic and the equipment processing the packets on either end.

Packet loss is most noticeable in video conferencing, online gaming and VoIP calls. As its name suggests, it measures the percentage of packets that fail to reach their destination; for real-time traffic, a packet that arrives too late to be used is effectively lost as well. A congested network is often to blame.
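As a rough illustration of how these two measurements might be checked against contract thresholds, the sketch below summarizes a batch of round-trip probes. The function name, the sample data and the 50 ms and 0.1% thresholds are assumptions made up for this example, not figures from any provider's SLA.

```python
from statistics import mean

def summarize_probes(rtts_ms):
    """Summarize round-trip probe results.

    rtts_ms: list of round-trip times in milliseconds, with None for
    every probe that never came back (counted as a lost packet).
    Returns (average RTT of received probes or None, loss percentage).
    """
    if not rtts_ms:
        raise ValueError("no probes to summarize")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    avg_rtt = mean(received) if received else None
    return avg_rtt, loss_pct

# Hypothetical contract thresholds (assumptions for illustration only).
MAX_AVG_RTT_MS = 50.0
MAX_LOSS_PCT = 0.1

probes = [42.0, 44.5, None, 41.5, 48.0]  # made-up sample, one lost probe
avg_rtt, loss_pct = summarize_probes(probes)
breach = avg_rtt is None or avg_rtt > MAX_AVG_RTT_MS or loss_pct > MAX_LOSS_PCT
print(f"avg RTT: {avg_rtt} ms, loss: {loss_pct}%, breach: {breach}")
```

In this sample the average latency is within the threshold, but a single lost probe out of five is enough to breach the loss limit.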

The legalese defining a provider’s SLA could span several pages. A customer would be required to open a trouble ticket and manually request compensation for an SLA breach. A bill credit for a percentage of the monthly recurring service charge was often the remedy. Repairs outside of business hours had longer delays.

Enter the cloud (SLA 2.0)
As colocation, hosting and cloud-based services took hold, providers highlighted their service-level agreements as a way to differentiate themselves. Rather than network measurements, they emphasized metrics such as availability and uptime.

The amount of redundancy built into a datacenter facility (fiber connectivity, power distribution, cooling, security) and into the provider’s architecture for servers, storage and routing would determine the availability SLA, which often ranged between 99.0% and 100%. In some cases, the provider factors in business risk when putting a metric into the contract. As managed service providers started winning enterprise outsourcing opportunities, customers required assurances in addition to availability guarantees.
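To put those percentages in perspective, an availability figure can be converted into the downtime it permits. The sketch below assumes a 30-day billing period; the function name is hypothetical, and real contracts may define the measurement window differently or exclude scheduled maintenance.

```python
def allowed_downtime_minutes(availability_pct, period_days=30):
    """Minutes of downtime permitted by an availability percentage.

    Assumes the SLA is measured over a period of `period_days` days
    with no exclusions (an assumption; contracts vary).
    """
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100.0)

for pct in (99.0, 99.9, 99.99, 100.0):
    print(f"{pct}% availability -> "
          f"{allowed_downtime_minutes(pct):.2f} minutes of downtime per 30 days")
```

Each additional "nine" cuts the permitted downtime by a factor of ten: roughly 432 minutes at 99.0%, 43 minutes at 99.9% and 4 minutes at 99.99% over 30 days.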

Understandably, enterprises want services to work without interruption, and if outages do occur, they want rapid restoration. The service management practice within the ITIL framework brought consistency in defining reliability and maintainability metrics. To track reliability, providers now measure mean time between failures (MTBF), the average time a service runs between one failure and the next. When there is a failure, buyers want to know how long it will take to respond to, resolve and recover from the problem. This maintainability is measured as mean time to restore service (MTRS). Other important service management measurements to consider are the number of users that can be served simultaneously, helpdesk response time for various classes of problems and the schedule for notification in advance of network changes that may affect users.
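As a minimal sketch of how these two metrics relate, the function below derives MTBF and MTRS from a list of outage durations. It assumes MTBF means average uptime between failures and MTRS means average time from failure to full restoration; the function name and the sample figures are made up for illustration.

```python
def reliability_metrics(period_hours, outage_hours):
    """Compute (MTBF, MTRS, availability %) for one measurement window.

    period_hours: total length of the window in hours.
    outage_hours: duration of each outage, from failure to restoration.
    Assumes MTBF is average uptime per failure, so that
    availability = MTBF / (MTBF + MTRS).
    """
    if not outage_hours:
        raise ValueError("at least one outage is needed to compute the means")
    failures = len(outage_hours)
    downtime = sum(outage_hours)
    mtrs = downtime / failures
    mtbf = (period_hours - downtime) / failures
    availability_pct = 100.0 * mtbf / (mtbf + mtrs)
    return mtbf, mtrs, availability_pct

# Made-up example: a 30-day month (720 hours) with three outages.
mtbf, mtrs, availability = reliability_metrics(720, [1.0, 2.0, 3.0])
print(f"MTBF {mtbf:.1f} h, MTRS {mtrs:.1f} h, availability {availability:.2f}%")
```

The same availability can come from very different combinations of the two numbers, which is why buyers ask for both: many short outages and a few long ones can produce an identical percentage.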

What’s next? (SLA 3.0)
Individually, latency, packet loss, jitter (the variation in latency from packet to packet), availability and MTRS are no longer good enough. Pages of legal jargon and having to manually ask a provider for credits are too cumbersome. ‘Normal business hours’ no longer apply. Now that networks are application-aware, service providers need to update SLAs. It is time for SLA 3.0.

SLA 3.0 guarantees the entire end-to-end user experience – the application, components in the datacenter, the network delivering the service and helpdesk resources. In our real-time world, customers also need real-time performance reporting to measure against customer-specific benchmarks, and APIs allow a customer to integrate a provider’s service management capabilities into their own tools and portals. SLA contract language is simple, and when breaches do occur, credits are automatic.

Hosting provider SingleHop created a nine-point customer bill of rights in easy-to-understand language, free from the heretofore- and whereas-laden wording of typical service contracts. SingleHop techs have a custom Android app to track SLAs, and bonuses are paid based on their ability to meet the SLA. The company boasts 100% server uptime for the past two years, and says customers have cited the simple SLA as a differentiator. Edge Web Hosting also puts its money where its marketing is. The company has a 100% uptime SLA and offers a full-day credit for each 30 minutes of downtime.
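A rule like "a full-day credit for each 30 minutes of downtime" is simple enough to compute automatically. The sketch below is one possible reading of it, not Edge Web Hosting's actual terms: it assumes that only complete 30-minute blocks count, that a day's credit is the monthly fee divided by the days in the month, and that credits are capped at the monthly fee.

```python
def downtime_credit(downtime_minutes, monthly_fee, days_in_month=30):
    """Bill credit under a 'one day per 30 minutes of downtime' rule.

    Assumptions (not taken from any real contract):
    - only complete 30-minute blocks of downtime earn a credit;
    - one day of credit equals monthly_fee / days_in_month;
    - the total credit never exceeds the monthly fee.
    """
    blocks = int(downtime_minutes // 30)
    daily_fee = monthly_fee / days_in_month
    return min(blocks * daily_fee, monthly_fee)

# Made-up example: 95 minutes of downtime on a 300.00-per-month service.
print(downtime_credit(95, 300.0))
```

With these assumptions, 95 minutes of downtime contains three complete blocks and earns three days of credit. A provider could apply such a calculation to each invoice without the customer having to open a ticket.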

Some specific network applications, like VoIP, have their own performance measurement. MOS (mean opinion score) has a scale of one to five, with five being excellent; it combines latency, packet loss, jitter and other factors into a single voice-quality rating against which a minimum threshold can be set. Expect more applications to be rated based on similar MOS scales, and for more providers to guarantee a minimum score, especially network service providers (NSPs) that manage the network and outsourced applications.
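To show how network measurements could feed such a score, the sketch below estimates MOS in two steps. The R-factor-to-MOS mapping follows the ITU-T G.107 E-model. The `estimate_r` function, by contrast, is a simplified rule of thumb rather than part of the standard, and its coefficients, the sample inputs and the 4.0 minimum score are assumptions for illustration only.

```python
def r_to_mos(r):
    """Map an E-model R-factor to an estimated MOS (ITU-T G.107)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

def estimate_r(latency_ms, jitter_ms, loss_pct):
    """Rough R-factor estimate from network measurements.

    This is a simplified heuristic, NOT the full G.107 computation;
    the coefficients are illustrative assumptions.
    """
    effective_latency = latency_ms + 2 * jitter_ms + 10
    if effective_latency < 160:
        r = 93.2 - effective_latency / 40
    else:
        r = 93.2 - (effective_latency - 120) / 10
    return r - 2.5 * loss_pct

MIN_MOS = 4.0  # hypothetical guaranteed minimum score

mos = r_to_mos(estimate_r(latency_ms=40, jitter_ms=5, loss_pct=0.5))
print(f"estimated MOS {mos:.2f}, meets SLA: {mos >= MIN_MOS}")
```

Note that the E-model tops out at an estimated MOS of 4.5, so a guaranteed minimum would in practice sit somewhat below the top of the five-point scale.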

BT is an example of an NSP at the forefront. Its goal is application SLAs backed with real technical data and baseline measurements. The company deploys the infrastructure and measures performance over a period of time before locking in a baseline. BT’s offerings are complete with fault diagnosis and automatic payouts. BT currently offers such a performance guarantee on a LAMP stack, for example.

Much like telecom providers battling one another to build faster network routes capable of handling more bandwidth, and datacenter providers racing to add more ‘nines’ of availability, we can expect another contest among NSPs to launch business applications on their networks, complete with SLA 3.0 features. Enterprises with more remote workers, global locations, online collaboration and SaaS requirements are increasingly looking to NSPs to provide complete offerings – SLAs can’t just be marketing fluff anymore.