The Schwartz
Cloud Report

Microsoft Explains Recent Cloud Outage

Wondering what caused the outage that brought down some of Microsoft's cloud services two weeks ago? While Microsoft attributed it to a DNS error, that was all the company was saying at the time.

The outage occurred on the evening of Sept. 8 and affected Windows Live services such as SkyDrive, MSN and Hotmail for three hours. Microsoft's Office 365 service was also affected.

Arthur de Haan, VP of Windows Live test and service engineering at Microsoft, elaborated on the incident in a blog post on Tuesday night, explaining that a corrupt file in the company's DNS service was to blame.

Microsoft was updating a tool that helps balance network traffic when the update went awry, he noted. As a result, the configurations were corrupted, which led to the outage, he said.

"The file corruption was a result of two rare conditions occurring at the same time," de Haan said.  "The first condition is related to how the load balancing devices in the DNS service respond to a malformed input string (i.e., the software was unable to parse an incorrectly constructed line in the configuration file). The second condition was related to how the configuration is synchronized across the DNS service to ensure all client requests return the same response regardless of the connection location of the client. Each of these conditions was tracked to the networking device firmware used in the Microsoft DNS service."

He said Microsoft intends to further harden the DNS service by providing greater redundancy and failover capability.

Posted by Jeffrey Schwartz on September 21, 2011

