Microsoft Gives Postmortem of Lync, Exchange Outages -- Redmond Channel Partner

By Kurt Mackie
June 27, 2014

Microsoft on Thursday issued an explanation for two separate Office 365 service outages that occurred this week.

In a Microsoft forum post, Rajesh Jha, corporate vice president for Office 365 engineering, said that only Microsoft's North American datacenters were affected by Monday's Lync Online outage and Tuesday's Exchange Online outage, and that the problems causing the outages have since been fixed.

With regard to the Lync Online problem, some users in North America were affected and couldn't log in to the service. Microsoft fixed that specific log-in problem "in minutes," Jha said, but "the ensuing traffic spike caused several network elements to get overloaded, resulting in some of our customers being unable to access Lync functionality for an extended duration." That extended duration appears to have been a good part of the working day on June 23, according to a chronicle kept by veteran Microsoft reporter Mary Jo Foley.

The Exchange Online outage also seems to have started as a small problem that escalated after being detected. Jha explained that a directory partition stopped responding to authentication requests. That problem caused "a small set of customers to lose email access." However, the failure then affected Microsoft's broader e-mail traffic flow, and many Exchange Online users reported not being able to send or receive e-mail. Jha said that the initial Exchange Online failure led to an "unexpected issue":

Unfortunately, the nature of this failure led to an unexpected issue in the broader mail delivery system due to a previously unknown code flaw leading to mail flow delays for a larger set of customers. Our recovery strategy was two pronged: 1) We partitioned the mail delivery system away from the failed directory partition and 2) directly addressed the root cause for the failed directory partition. In addition to fixing the root cause trigger, we are working on further layers of hardening for this pattern.

The Exchange Online problem persisted through most of the day on June 24. Jha also noted that the Service Health Dashboard, which provides Office 365 service uptime reports to subscribers, had a problem with its "publishing process, meaning not all impacted customers were notified in a timely way." He said that the problem with the Service Health Dashboard has "since been addressed."

Microsoft plans to provide more details about the outages to its customers via a "post-incident report," which will appear in the Service Health Dashboard, Jha said. Microsoft doesn't have a publicly accessible portal showing its Office 365 service health, so much of the news about the outages on Monday and Tuesday was initially relayed through Twitter posts.

Microsoft offers a "three nines" or 99.9 percent uptime service level agreement as part of its Office 365 business plans. If Microsoft fails to meet 99.9 percent uptime in a given month, then the subscriber may be eligible to get a service credit. However, the subscriber has to file with Microsoft to get the credit. The service credit is calculated as a percentage of the monthly service fees that gets returned to the customer, depending on the degradation of service uptime. Microsoft shows those uptime percentages and corresponding service credits in the following table:

Monthly Uptime	Service Credit
< 99.9%	25%
< 99%	50%
< 95%	100%

Service credit percentages based on monthly Office 365 uptime. Source: Microsoft's "Service Level Agreement for Microsoft Online Services" document.
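
The tiers in the table can be read as a simple threshold lookup. The following Python sketch is illustrative only: the function name `service_credit_percent` is hypothetical, and it assumes the tiers apply exactly as listed (strict "less than" comparisons), with no credit at or above 99.9 percent. How uptime is actually measured and how claims are handled is governed by Microsoft's SLA document, not by this example.

```python
def service_credit_percent(monthly_uptime_percent: float) -> int:
    """Return the service credit, as a percent of monthly fees, for a given uptime.

    Assumption: tiers are taken verbatim from the table in the article.
    """
    if not 0.0 <= monthly_uptime_percent <= 100.0:
        raise ValueError("uptime must be between 0 and 100 percent")
    # Check the most severe tier first so the largest applicable credit wins.
    if monthly_uptime_percent < 95.0:
        return 100
    if monthly_uptime_percent < 99.0:
        return 50
    if monthly_uptime_percent < 99.9:
        return 25
    # At or above the 99.9 percent commitment, no credit applies.
    return 0


if __name__ == "__main__":
    for uptime in (99.95, 99.5, 98.0, 90.0):
        print(f"{uptime}% uptime -> {service_credit_percent(uptime)}% credit")
```

Checking the lowest threshold first keeps each comparison independent, so a month at 90 percent uptime maps to the 100 percent tier rather than stopping at the 25 percent tier.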

It's estimated that a 99.9 percent uptime translates to experiencing about 43 minutes of downtime in a 30-day month, or about 8.8 hours of downtime per year. Microsoft's outages on Monday and Tuesday lasted perhaps six hours and nine hours, respectively, according to press reports.
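
The downtime figures follow from simple arithmetic on the uptime percentage. The sketch below is a rough illustration, not Microsoft's method of measuring uptime: the helper names are hypothetical, a month is assumed to be 30 days and a year 365 days, and each outage is treated as full downtime for an affected subscriber, which the press-reported durations may overstate.

```python
HOURS_PER_30_DAY_MONTH = 30 * 24    # assumption: 30-day month
HOURS_PER_365_DAY_YEAR = 365 * 24   # assumption: non-leap year


def allowed_downtime_hours(sla_percent: float, period_hours: float) -> float:
    """Hours of downtime permitted in a period at a given uptime commitment."""
    return period_hours * (100.0 - sla_percent) / 100.0


def uptime_percent(outage_hours: float, period_hours: float) -> float:
    """Uptime percentage for a period that contains the given hours of outage."""
    return 100.0 * (1.0 - outage_hours / period_hours)


if __name__ == "__main__":
    per_month = allowed_downtime_hours(99.9, HOURS_PER_30_DAY_MONTH) * 60
    per_year = allowed_downtime_hours(99.9, HOURS_PER_365_DAY_YEAR)
    print(f"99.9% allows about {per_month:.1f} minutes per month")
    print(f"99.9% allows about {per_year:.2f} hours per year")

    # Outage durations as reported in the press (approximate).
    for label, hours in (("Lync Online", 6), ("Exchange Online", 9)):
        pct = uptime_percent(hours, HOURS_PER_30_DAY_MONTH)
        print(f"{label}: {hours} h outage -> {pct:.2f}% monthly uptime")
```

Under these assumptions, a six-hour outage alone works out to roughly 99.17 percent uptime for a 30-day month and a nine-hour outage to 98.75 percent, both below the 99.9 percent commitment for any subscriber affected for the full duration.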

About the Author

Kurt Mackie is senior news producer for 1105 Media's Converge360 group.
