Lessons from the Facebook Outage -- Redmond Channel Partner

Guest Blogs

What Partners and Businesses Can Learn from the Facebook Outage

I got the first alert that there were problems with Facebook at 11:40 a.m. ET on Oct. 4. As Facebook isn't business-critical for me, I didn't pay much attention -- but I was puzzled when I couldn't connect to WhatsApp, which is indeed a critical tool for interacting with my teams around the globe.

We've now learned that the outage lasted for six hours and involved not just Facebook but also services owned by it, like Instagram, Messenger, WhatsApp and Oculus VR. This was a costly outage for every business that depends on these services, and it shows how business-critical these social media resources have become.

Having led a large multinational hosting business, I know that problems affecting uptime do occur. And any CEO in the hosting or managed services business knows that such incidents can seriously damage a company's reputation. Most outages aren't as big as the one that affected Facebook this week, but some are devastating.

Some problems should be expected, and you can take reasonable steps to prepare for them. According to Facebook, the outage originated from configuration changes on the backbone routers that coordinate network traffic between its data centers. The problems that followed shouldn't have come as a surprise to anyone who works with infrastructure.

What is a surprise is that so much was connected to these routers. Not only did all of those customer-facing services go down, but Facebook's own e-mail system and a bunch of other internal systems -- including the entrance to the Facebook office building -- stopped working. To put it mildly, it looks like Facebook made the mistake of putting all of its eggs in the same basket and not following best practices for an enterprise-class online infrastructure.

Here's some advice not only for Facebook, but for everyone -- including partners -- running and managing business-critical infrastructure:

Segment your infrastructure so that a problem doesn't spread across your whole environment. Your administrative network should be separated from the network where your customer-facing systems reside. Even if you're not as big as Facebook, separate your different services into several networks. This will also help security, as it will make it much harder for attackers to bring your entire environment down.
Plan your upgrade. Make sure that it has been thoroughly analyzed and vetted. The higher its potential to impact business, the more you should plan and analyze prior to the actual upgrade taking place. Make sure that you have a decent change-management process in place.
Never upgrade everything at once if you can avoid it. Simulate the upgrade in a test environment, then start the rollout with something less business-critical than a system used by 3.5 billion people. The "big bang" model of upgrades fails far too often.
Make sure that you know how to roll back an upgrade quickly and safely, and that the procedures for doing so are documented and understood by your team.
Rehearse frequently so you know what to do when something goes wrong. It's like a fire drill; you should have procedures and protocols to follow.
When all of your services are up again, write an incident report and discuss the findings inside your organization. This is how my former company learned from past errors. Our mantra was that the same problem should never happen twice.
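The advice about starting small and rolling back quickly can be illustrated with a minimal Python sketch of a staged rollout. It is a simulation under stated assumptions, not a production tool: it assumes each change can be applied and undone per device through caller-supplied functions, and every name in it (`staged_rollout`, `apply_change`, `roll_back`, `is_healthy` and the router names) is hypothetical. Devices are grouped into waves ordered from least to most business-critical; if any device in a wave fails its health check, everything changed so far is undone, newest first, and the more critical waves are never touched.

```python
from typing import Callable, Dict, List, Sequence


def staged_rollout(
    waves: Sequence[Sequence[str]],
    apply_change: Callable[[str], None],
    roll_back: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> bool:
    """Apply a change wave by wave (hypothetical helper).

    Returns True if every wave passed its health check. On the first
    unhealthy wave, undoes every change made so far (newest first) and
    returns False without touching the remaining waves.
    """
    changed: List[str] = []
    for wave in waves:
        for device in wave:
            apply_change(device)
            changed.append(device)
        # Verify the whole wave before moving on to more critical systems.
        if not all(is_healthy(device) for device in wave):
            for device in reversed(changed):
                roll_back(device)
            return False
    return True


# Hypothetical inventory, ordered from least to most business-critical.
waves = [["lab-router-1"], ["branch-router-1", "branch-router-2"], ["core-router-1"]]
version: Dict[str, str] = {d: "v1" for wave in waves for d in wave}
log: List[str] = []


def apply_change(device: str) -> None:
    version[device] = "v2"
    log.append(f"upgrade {device}")


def roll_back(device: str) -> None:
    version[device] = "v1"
    log.append(f"rollback {device}")


def is_healthy(device: str) -> bool:
    # Simulated failure: the new version breaks branch-router-2.
    return not (device == "branch-router-2" and version[device] == "v2")


ok = staged_rollout(waves, apply_change, roll_back, is_healthy)
print(ok)  # False
```

In this simulated run, the second wave fails, all three changed routers are returned to `v1`, and `core-router-1` is never upgraded. The same structure applies whether the "devices" are routers, servers or tenants; what matters is that the blast radius grows only after the previous step has proven healthy.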

I hope this helps you prepare for the unexpected.

Posted by Per Werngren on October 05, 2021
