The Schwartz Cloud Report

Amazon Reels from Lightning Strike

Amazon Web Services' woes continued on Tuesday as the cloud provider worked to recover from a bolt of lightning that caused power outages in its Dublin, Ireland datacenter over the weekend.

The lightning strike brought down Amazon's EC2 and RDS services, as well as Microsoft's Business Productivity Online Services. Microsoft's outage reportedly lasted several hours on Sunday, and service has since been restored, the company said on its Twitter feed.

But on Tuesday, Amazon was still trying to restore its Elastic Block Store (EBS) volumes following an explosion and fire that made the company's backup generators unavailable. The company said on its Service Health Dashboard on Sunday:

"Due to the scale of the power disruption, a large number of EBS servers lost power and require manual operations before volumes can be restored. Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process. We've been able to restore EC2 instances without attached EBS volumes, as well as some EC2 instances with attached EBS volumes. We are in the process of installing additional capacity in order to support this process both by adding available capacity currently onsite and by moving capacity from other availability zones to the affected zone. While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed."

The company said some EC2 instances and EBS servers lost power before writes to their volumes were completed. "Because of this, in some cases we will provide customers with a recovery snapshot instead of restoring their volume so they can validate the health of their volumes before returning them to service. We will contact those customers with information about their recovery snapshot."

Vincent Partington, co-founder and CTO of XebiaLabs, was among those who received bad news from Amazon. "Just got an email from Amazon AWS saying they lost some of my data," Partington tweeted. "I'm OK with an outage now and then but this really blows!"

Apparently, the problem extends beyond the lightning strike. Amazon discovered a bug in the software that cleans up unused snapshots. "During a recent run of this EBS software in the EU-West Region [Dublin], one or more blocks in a number of EBS snapshots were incorrectly deleted," the company said. "We've addressed the error in the EBS snapshot system to prevent it from recurring. We have now also disabled all of the snapshots that contain these missing blocks."

As of Tuesday morning, Amazon said it had delivered recovery snapshots for more than half of the volumes affected by the power outage. "We are continuing to make steady progress on creation and delivery of the remaining recovery snapshots," the company said.

As if that weren't enough, Amazon suffered a brief outage Monday night at its Northeast U.S. datacenter in Virginia. Although it lasted only about two hours, it affected the Web sites of customers including Foursquare, Netflix and Reddit. Amazon attributed that outage to connectivity issues between instances.

Posted by Jeffrey Schwartz on August 10, 2011

