The Schwartz Cloud Report


Amazon Reels from Lightning Strike

Amazon Web Services' woes continued on Tuesday as the cloud provider worked to recover from a bolt of lightning that caused power outages in its Dublin, Ireland datacenter over the weekend.

The lightning strike brought down Amazon's EC2 and RDS services, as well as Microsoft's Business Productivity Online Services. Microsoft's outage reportedly lasted several hours on Sunday, and service has since been restored, the company said on its Twitter feed.

But on Tuesday, Amazon was still trying to restore its Elastic Block Store (EBS) volumes following an explosion and fire that made the company's backup generators unavailable. The company said on its Service Health Dashboard on Sunday:

"Due to the scale of the power disruption, a large number of EBS servers lost power and require manual operations before volumes can be restored. Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process. We've been able to restore EC2 instances without attached EBS volumes, as well as some EC2 instances with attached EBS volumes. We are in the process of installing additional capacity in order to support this process both by adding available capacity currently onsite and by moving capacity from other availability zones to the affected zone. While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed."

The company said some EC2 instances and EBS servers lost power before writes to their volumes were completed. "Because of this, in some cases we will provide customers with a recovery snapshot instead of restoring their volume so they can validate the health of their volumes before returning them to service. We will contact those customers with information about their recovery snapshot."

Vincent Partington, co-founder and CTO of XebiaLabs, was among those who received bad news from Amazon. "Just got an email from Amazon AWS saying they lost some of my data," Partington tweeted. "I'm OK with an outage now and then but this really blows!"

Apparently, the problem extends beyond the lightning strike. Amazon discovered a bug in the software that cleans up unused snapshots. "During a recent run of this EBS software in the EU-West Region [Dublin], one or more blocks in a number of EBS snapshots were incorrectly deleted," the company said. "We've addressed the error in the EBS snapshot system to prevent it from recurring. We have now also disabled all of the snapshots that contain these missing blocks."

As of Tuesday morning, Amazon said it had delivered recovery snapshots for more than half of the volumes that were affected by the power outage. "We are continuing to make steady progress on creation and delivery of the remaining recovery snapshots," the company said.

As if that weren't enough, Amazon suffered a brief outage Monday night at its Northeast U.S. datacenter in Virginia. Although it lasted only about two hours, the outage affected the Web sites of customers including Foursquare, Netflix and Reddit. Amazon attributed the outage to connectivity issues between instances.

Posted by Jeffrey Schwartz on August 10, 2011

