Commentary

The AWS mess: the why, and the fun people poked at it

Rich Atkinson
October 22, 2025
[Image: a glowing purple AWS logo illuminated on stage.]

You probably don’t need to be in tech to know about - or at least to have felt - this week’s massive AWS outage.

AWS (Amazon Web Services) - a division of Amazon (AMZN) -  is a massive suite of cloud computing services: computing power, storage and databases, predominantly for businesses, though also for individuals.

Instead of managing their own physical infrastructure, companies rent scalable resources from AWS.

To set the context, AWS is by far the most significant player in the cloud space with 30% market share, followed by Microsoft’s Azure (20%), with Google Cloud trailing at 13%.

The outage was for the most mundane of reasons - something I will come to - though the impact was enormous.

Apple Music, Zoom, Pinterest, Reddit, The New York Times, Adobe Creative Cloud and our own Canva were all significantly impacted.

Actually, that’s not fair: let’s do a roll call:

Social & Communications: Snapchat, Slack, Discord, Signal, Pinterest: chats and shares stalled. SIGNAL!

Gaming: Fortnite, Roblox, Epic Games Store, VRChat, Steam: millions kicked mid-game.

Finance & Shopping: Venmo (transactions frozen), Coinbase, Robinhood (crypto dipped 1-2%), Chime, Capital One, Instacart, Grubhub: payments and orders halted.

Work & Productivity: Canva, Asana, Jira Software, Adobe Creative Cloud: workflows ground to a halt.

Entertainment & Streaming: Prime Video, Amazon Music, Hulu, Disney+, Max, Tidal, IMDb, Roku:  buffering and blackouts everywhere.

Amazon Ecosystem: AWS itself, Amazon shopping: core services tanked.

And that’s only half the roll call.

Famously, Twitter (I won’t call it X) did not go down, though that is because, since 2023, it has run on its own infrastructure - something Elon truly boasted about as only Elon can. (In fairness, I got most of my updates on the situation from Twitter, so I suppose the boast is earned!)

According to Ookla, more than 11 million reports of impact were recorded.

I’m yet to find a credible report on the cost of such outages (AWS also went down in 2021 and 2023), though you’d have to figure we’re talking hundreds of millions if not billions in lost revenue.

We’re talking financial transactions, emergency responses, supply chains, bookings, sales; everything.

To be clear, I am an enormous fan of AWS and its infrastructure.

I am not being critical of it.

I do - on one hand - understand the call for greater scrutiny of such a concentration in our interconnected economy. One mistake and a big part of the Internet collapses.

With AWS, the barriers to competition and innovation are significant.

Though I also understand how the Internet is built and that this is where we are at.

I’m not going to delve into whether the situation is right or wrong - clearly it’s fallible, and tech concentration is a problem on many levels; take from that what you want. I am, though, going to touch quickly on why this happened, and then on my favourite part: the internet’s brilliant quips in light of the situation.

Why did AWS fail?

I mentioned that the failure was for the most mundane of reasons.

And it truly was.

Mundane as it was, a significant chunk of the Internet just stopped working, and so it’s worth diving into why.

This wasn’t the work of hackers.

It was a DNS glitch in a single region (US-EAST-1, in Northern Virginia) that cascaded into global chaos.

DNS is about as fundamental to the Internet as it gets. It’s the roadways, traffic lights and roundabouts that route traffic. In simple terms, it’s how a name like amazon.com gets translated into the IP address that machines actually use to find it.

From experience, if it isn’t a hack, it’s almost always DNS.

At AWS, an internal monitoring system for its network load balancers shat itself: inaccurate name-to-IP mappings corrupted the DNS records, and dependent systems fell like dominoes.
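To make “name-to-IP mappings” concrete, here’s a minimal Python sketch - the hostname is a made-up placeholder, not anything AWS-specific - showing why a bad or missing DNS record takes clients down even when the servers behind the name are perfectly healthy.

```python
import socket

# A toy illustration of a "name-to-IP mapping". The hostname below is a
# made-up placeholder for this example, not a real AWS endpoint.
hostname = "my-service.example.com"

try:
    # DNS resolution: ask the resolver which IP addresses this name points to.
    results = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    for *_, sockaddr in results:
        print(f"{hostname} -> {sockaddr[0]}")
except socket.gaierror as err:
    # The failure mode in an outage like this: the servers behind the name may
    # be fine, but if the name no longer resolves (or resolves to the wrong
    # thing), every client that depends on it falls over anyway.
    print(f"DNS lookup failed for {hostname}: {err}")
```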

The concentration of services in US-EAST-1 is significant, and anyone who uses AWS likely knows that.

It’s the mothership of AWS.

Even if your AWS instance is in Sydney, it relies on US-EAST-1 in realtime.

The mothership goes down, and we essentially all go down. (Multi-region backup aside - something I won’t bore you with - you should really be multi-homed on at least two clouds, not just geographically redundant; otherwise you’ve still left yourself a single point of failure.)
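For a rough sense of what being “multi-homed on two clouds” can look like at the application level, here’s a hedged Python sketch; the endpoints are hypothetical placeholders, and in practice this failover usually lives in DNS or a global load balancer rather than in client code.

```python
import urllib.request
import urllib.error

# A rough sketch of the "multi-homed on at least two clouds" idea: health-check
# the primary endpoint and fall back to a second provider if it's unreachable.
# Both URLs are hypothetical placeholders, not real endpoints.
ENDPOINTS = [
    "https://api.primary-cloud.example.com/health",    # e.g. hosted on AWS
    "https://api.secondary-cloud.example.com/health",  # e.g. hosted elsewhere
]

def pick_healthy_endpoint(timeout: float = 2.0) -> str:
    """Return the first endpoint that answers its health check."""
    for url in ENDPOINTS:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except (urllib.error.URLError, OSError):
            # Primary unreachable (DNS failure, timeout, etc.) - try the next one.
            continue
    raise RuntimeError("No healthy endpoint available")

if __name__ == "__main__":
    print("Routing traffic via:", pick_healthy_endpoint())
```

The point isn’t the code; it’s that the fallback path has to live somewhere the primary provider can’t take down with it.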

The internet has fun because… well… we had nothing else to do

I had plenty to go on, though the AWS downtime was painful for us too. Our clients were fine, but we rely on AWS ourselves, and the outage affected our work.

It did, however, give me a chance to keep a beat on what was happening on the ground, and it was as humorous as it was enjoyable to watch.

My favourite story was of Eight Sleep.

They’re a manufacturer of quite expensive beds that adjust each sleeper into their ideal position and let you heat or cool the bed to your specification. And frankly, who knows what else?

Their beds rely on AWS. Exclusively.

Their CEO, Matteo Franceschetti, was quick to take to Twitter, apologising and promising remediation.

People didn’t buy it:

What the AWS outage pointed out, however, was that this is a bed that sends 16GB of data a month (!) just to manage sleep:

And it wasn’t just that beds stalled. They went ‘haywire’ without AWS.

Like sleeping sitting up. Being overheated. (A premium bed that sells for thousands.) (BYO fish tank cooler.)

Even water purifiers went kaput.

Plenty of IT admins had fun.

And there is an update from AWS itself, which is amusing.

James Hamilton famously ‘runs’ AWS from an engineering perspective. He lives on a custom yacht.

According to this AWS update, he solved the issue within four minutes of making landfall.

Not sure you could make this up.

Though the internet still had fun.

You can either laugh or cry. It was a thing, it is a thing, it’ll be a thing.

This is complexity at a level where one would need to write a book just to capture a glimpse of it.

There is redundancy for such events, and that part is relatively easy. Though if you’re all on deck whilst the orchestra plays, you might as well have some fun at the same time.
