Lessons from the Facebook Outage -- Redmond Channel Partner

Guest Blogs

What Partners and Businesses Can Learn from the Facebook Outage

I got the first alert that there were problems with Facebook at 11:40 a.m. EST on Oct. 4. As Facebook isn't business-critical for me, I didn't pay much attention -- but I was puzzled when I couldn't connect to WhatsApp, which is a critical tool for interacting with my teams around the globe.

We've now learned that the outage lasted for six hours and involved not just Facebook but also the services it owns, like Instagram, Messenger, WhatsApp and Oculus VR. This was a costly outage for every business that depends on these services, and it shows how business-critical these social media resources have become.

Having led a large multinational hosting business, I know that sometimes problems occur that affect uptime. And any CEO in the hosting or managed services business knows that such incidents can have a big impact on reputation. Most outages aren't as big as the one that affected Facebook this week, but sometimes they are devastating.

Some problems should be expected, and you can make reasonable efforts to prepare for them. According to Facebook, the outage originated from configuration changes on its backbone routers. The ensuing problems shouldn't have been a surprise for anyone who works with infrastructure.

What is a surprise is that so much was connected to these routers. Not only did all of those customer-facing services go down, but Facebook's own e-mail system and a bunch of other internal systems -- including the entrance to the Facebook office building -- stopped working. To put it mildly, it looks like Facebook made the mistake of putting all of its eggs in the same basket and not following best practices for an enterprise-class online infrastructure.

Here's some advice not only for Facebook, but for everyone -- including partners -- running and managing business-critical infrastructure:

Segment your infrastructure so that a problem doesn't spread across your whole environment. Your administrative network should be separated from the network where your customer-facing systems reside. Even if you're not as big as Facebook, separate your different services into several networks. This will also help security, as it will make it much harder for attackers to bring your entire environment down.
Plan your upgrade. Make sure that it has been thoroughly analyzed and vetted. The higher its potential to impact business, the more you should plan and analyze prior to the actual upgrade taking place. Make sure that you have a decent change-management process in place.
Never upgrade everything at once if you can avoid it. Simulate the upgrade in a test environment, then start the upgrade with something less business-critical than a system that is used by 3.5 billion users. The "big bang" model of upgrades fails way too often.
Make sure that you know how to roll back an upgrade quickly and safely. Learn the right procedures for how to make it happen.
Rehearse frequently so you know what to do when something goes wrong. It's like a fire drill; you should have procedures and protocols to follow.
When all of your services are up again, make sure to create a written incident report and discuss the findings inside your organization. This is how my old company learned from past errors. Our mantra was that the same problem should never happen again.
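
For readers who automate their changes, the advice about staged upgrades and fast rollbacks can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how Facebook or any particular tool works: `Stage`, `RolloutResult`, `staged_rollout` and the host names are all hypothetical, and the `apply_upgrade`, `roll_back` and `is_healthy` callbacks stand in for whatever your own tooling provides.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Stage:
    """One group of systems upgraded together (hypothetical model)."""
    name: str
    hosts: List[str]


@dataclass
class RolloutResult:
    upgraded: List[str] = field(default_factory=list)
    rolled_back: List[str] = field(default_factory=list)
    failed_stage: str = ""  # empty string means every stage succeeded


def staged_rollout(
    stages: List[Stage],
    apply_upgrade: Callable[[str], None],
    roll_back: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> RolloutResult:
    """Upgrade stage by stage, least critical first.

    On the first unhealthy host, undo the hosts touched in the current
    stage (in reverse order) and stop before any later stage is reached.
    """
    result = RolloutResult()
    for stage in stages:
        touched: List[str] = []
        for host in stage.hosts:
            apply_upgrade(host)
            touched.append(host)
            if not is_healthy(host):
                for changed in reversed(touched):
                    roll_back(changed)
                    result.rolled_back.append(changed)
                result.failed_stage = stage.name
                return result
        result.upgraded.extend(touched)
    return result


if __name__ == "__main__":
    # Toy inventory: version numbers per host (assumed data for the demo).
    versions = {"lab-1": 1, "internal-1": 1, "edge-1": 1, "edge-2": 1}
    stages = [
        Stage("test lab", ["lab-1"]),
        Stage("internal tools", ["internal-1"]),
        Stage("customer-facing", ["edge-1", "edge-2"]),
    ]

    def apply_upgrade(host: str) -> None:
        versions[host] = 2

    def roll_back(host: str) -> None:
        versions[host] = 1

    def is_healthy(host: str) -> bool:
        # Pretend the upgrade breaks one customer-facing host.
        return host != "edge-2"

    print(staged_rollout(stages, apply_upgrade, roll_back, is_healthy))
    print(versions)
```

In this toy run the test lab and internal tools are upgraded first, the failure on the customer-facing stage is caught after two hosts rather than after the whole fleet, and both of those hosts are returned to the previous version. The point of the design is that the rollback path is written and exercised before the upgrade starts, not improvised during the outage.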

I hope this helps you prepare for the unexpected.

Posted by Per Werngren on October 05, 2021
