Channeling the Cloud
A Lesson from the Amazon Outage: Speak Up
Besides dredging up concerns about data security in the cloud, the Amazon outage made it clear that when it comes to corporate damage control, silence isn't the best policy.
- By Jeffrey Schwartz
- June 01, 2011
The four-day Amazon Web Services outage in late April was a defining moment in the history of cloud computing, not only for its impact but also for the way the company handled it, which offers a lesson for any service provider.
The widely reported outage at Amazon's Northern Virginia datacenter left a number of sites crippled for several days. Worse, the company acknowledged that 0.07 percent of the affected Elastic Block Store (EBS) volumes wouldn't be fully recoverable. This was perhaps the most significant cloud outage to date, but it was certainly not the first time a Web service, network or datacenter went down.
"Every day, inside companies all over the world, there are technology outages," Rackspace Chief Strategy Officer Lew Moorman told The New York Times. "Each episode is smaller, but they add up to far more lost time, money and business."
As for the Amazon outage, he added: "We all have an interest in Amazon handling this well." Did Amazon handle this well? Let's presume the company did everything in its power to remedy the problem and get its customers back online.
The problem was that Amazon went dark from a communications perspective. Sure, it posted periodic updates on its Service Health Dashboard, but the company issued no other public statements on the situation as it was unfolding (though it was in direct communication with affected customers). Considering how visible Amazon technologists are on social media, including Twitter, a mere reference to the dashboard felt shallow.
"The fact that disaster is inevitable is why good communications skills are so crucial for any company to develop, and why Amazon's anemic public response to the outage made a bad situation far worse than it needed to be," noted Pund-IT analyst Charles King in a research note.
I remember when, during the dot-com boom more than a decade ago, companies like Charles Schwab, E*Trade and eBay suffered highly visible outages that affected many thousands of customers. They took big PR hits for their lack of availability, but their Web businesses prospered nonetheless.
Though the Amazon outage will elevate the discussion about the importance of resiliency and redundancy (discussions that were already under way), it seems highly unlikely to alter the move to cloud computing, even if it proves to be a historic speed bump.
To its credit, Amazon acknowledged its mistakes. In a detailed postmortem issued four days after it resumed service, Amazon promised to improve communications in the future. "We would like our communications to be more frequent and contain more information," the company said. "We understand that during an outage, customers want to know as many details as possible about what's going on, how long it will take to fix, and what we're doing so that it doesn't happen again."
While the Amazon outage was a black eye for cloud computing, providers of all sizes -- including Amazon -- will undoubtedly learn from the mistakes that were made, both technical and procedural. Hopefully, that will include better communications moving forward.
Check out the Schwartz Cloud Report blog at RCPmag.com/CloudReport
Jeffrey Schwartz is editor of Redmond magazine and also covers cloud computing for Virtualization Review's Cloud Report. In addition, he writes the Channeling the Cloud column for Redmond Channel Partner. Follow him on Twitter @JeffreySchwartz.