Gartner: AWS, HP have worst cloud SLAs

The award for the “worst SLA” of any major cloud provider goes to Amazon (NASDAQ: AMZN) Web Services, according to Gartner analyst Lydia Leong, but HP’s (NYSE: HPQ) public cloud SLA is not necessarily better. Both providers have strict requirements for users in architecting their cloud systems if they want the SLAs to apply when service is disrupted, reports Brandon Butler at Network World.

“Amazon’s SLA gives enterprises heartburn,” Leong wrote in a recent blog post. “HP had the opportunity to do significantly better here, and hasn’t.”

In addition to the costly service architecture requirements, the SLAs for AWS and HP cloud services are unnecessarily complex and limited, she complained. For example, neither SLA covers block storage services.

For the AWS SLA to take effect, customers are required to build their systems so that applications run across a minimum of two of the provider’s data centers, known as availability zones. The HP SLA only goes into effect if all of its availability zones are down, which means customers have to build their applications to cover three or more availability zones.

The costs of running applications across more than one availability zone add up. What’s more, Leong wrote, the requirements add complexity to the systems. “Most people are reasonably familiar with the architectural patterns for two data centers; once you add a third and more, you’re further departing from people’s comfort zones, and all HP has to do is to decide they want to add another AZ in order to essentially force you to do another bit of storage replication if you want to have an SLA,” she wrote.

Because of the service architecture requirements, it isn’t likely that AWS or HP customers would be reimbursed for downtime in a meaningful way under the SLAs, Leong warned. However, other infrastructure-as-a-service providers offer considerably better SLAs.


Here is an updated post by Gartner analyst Lydia Leong based on new info from HP:

Some clarifications on HP’s SLA

After my last blog post, “Cloud IaaS SLAs can be Meaningless”, I corresponded with some members of the HP cloud team by email, and then colleagues and I spoke with HP on the phone. HP provided some useful clarifications, which I’ll detail below. I haven’t changed my fundamental opinion, although arguably the nuances make the HP SLA slightly better than the AWS SLA.

The most significant difference between the SLAs is that HP’s SLA is intended to cover a single-instance failure, where you can’t replace that single instance; AWS requires that all of your instances in at least two AZs be unavailable. HP requires that you try to re-launch that instance in a different AZ, but a failure of that launch attempt in any of the other AZs in the region will be considered downtime. You do not need to be running in two AZs all the time in order to get the SLA; for the purposes of the SLA clause requiring two AZs, the launch attempt into a second AZ counts.
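
To make the difference between the two triggers concrete, here is a minimal Python sketch of the conditions as described above. It illustrates one reading of the text and is not either provider's actual SLA logic. The names `Instance`, `aws_style_trigger`, and `hp_style_trigger` are hypothetical, and the HP check assumes that a single failed re-launch attempt in a different AZ is enough to count.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    az: str          # availability zone the instance runs in
    available: bool  # whether the instance is currently reachable


def aws_style_trigger(instances):
    """True when every instance is unavailable and they span at least two AZs."""
    azs = {inst.az for inst in instances}
    return len(azs) >= 2 and all(not inst.available for inst in instances)


def hp_style_trigger(failed_instance_az, relaunch_results):
    """True when a re-launch attempt in a different AZ has failed.

    relaunch_results maps an AZ name to True (launch succeeded) or False.
    Assumption: one failed attempt in another AZ is sufficient.
    """
    return any(
        not succeeded
        for az, succeeded in relaunch_results.items()
        if az != failed_instance_az
    )
```

Under this reading, a customer running a single instance in one AZ can never satisfy the AWS-style condition, but can satisfy the HP-style one by attempting a re-launch elsewhere in the region.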

HP begins counting downtime when, post-instance-failure, you make the launch API call that is destined to fail — downtime begins to accrue 6 minutes after you make that unsuccessful API call. (To be clear, the clock starts when you issue the API call, not when the call has actually failed, from what I understand.) When the downtime clock stops is unclear, though — it stops when the customer has managed to successfully re-launch a replacement instance, but there’s no clarity regarding the customer’s responsibility for retry intervals.
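
As a rough illustration of that clock, the Python sketch below computes accrued downtime from two timestamps. It assumes the clock stops at the moment a replacement instance launches successfully, which is the part the SLA leaves unclear. The function name `accrued_downtime` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

# From the description above: downtime starts to accrue 6 minutes
# after the customer issues the launch API call that ends up failing.
GRACE_PERIOD = timedelta(minutes=6)


def accrued_downtime(failed_call_issued_at, replacement_launched_at):
    """Downtime between the end of the grace period and a successful re-launch.

    Assumption: the clock stops when the replacement instance launches.
    Returns zero if the replacement came up within the grace period.
    """
    clock_start = failed_call_issued_at + GRACE_PERIOD
    return max(replacement_launched_at - clock_start, timedelta(0))


# Example: failed launch call issued at 12:00, replacement running at 12:30.
call_time = datetime(2012, 12, 1, 12, 0)
print(accrued_downtime(call_time, call_time + timedelta(minutes=30)))
```

In this example only 24 of the 30 minutes count as downtime, and an outage resolved within 6 minutes of the failed call would not count at all.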

(In discussion with HP, I raised the issue of this potentially resulting in customers hammering the control plane with requests in mass outages, along with intervals when the control plane might have degraded response and some calls succeed while others fail, etc. — i.e., the unclear determination of when downtime ends, and whether customers trying to fulfill SLA responsibilities contribute to making an outage worse. HP was unable to provide a clear answer to this, other than to discuss future plans for greater monitoring transparency, and automation.)

I’ve read an awful lot of SLAs over the years — cloud IaaS SLAs, as well as SLAs for a whole bunch of other types of services, cloud and non-cloud. The best SLAs are plain-language comprehensible. The best don’t even need examples for illustration, although it can be useful to illustrate anything more complicated. Both HP and AWS sin in this regard, and frankly, many providers who have good SLAs still force you through a tangle of verbiage to figure out what they intend. Moreover, most customers are fundamentally interested in solution SLAs — “is my stuff working”, regardless of what elements have failed. Even in the world of cloud-native architecture, this matters — one just has to look at the impact of EBS and ELB issues in previous AWS outages to see why.