Amazon web servers are down

Artashes

Administrator
Staff member
Trouble is certainly affecting all sorts of companies across the Internet today. We know Trello, Wix, Instagram and IMDB are all affected, among many others.

And I just received an update from Flippa marketplace, which has been down for hours.

Hi everyone,

Today we, along with a large number of other internet sites and services, have been impacted by a major service degradation with Amazon Web Services, our infrastructure provider.

As a result the Flippa website has been down for the past few hours. We do not yet have a firm timeframe for when the website will be back up, however we are monitoring the situation closely and will keep you informed of any updates we receive.

We are also aware that a number of auctions were due to finish on Flippa during this period. We are working on a solution for those listings, with the goal of ensuring fairness for both our sellers and our buyers.

Thank you for your patience. If you have any questions, please contact us at support@flippa.com.

Best regards,
Tony Barrett
CEO, Flippa

I can imagine the stress some companies are going through. Downtime is one thing for a site like ours (we lose normal posting activity, but nothing that can't be picked up again once we're back), but when transactions are due during the downtime, it gets tricky. I am definitely curious how Flippa, in this instance, will handle such cases.
 
Yeah, a lot of businesses were disrupted yesterday. In a USA Today story, it said hundreds of thousands were affected. I don't doubt that at all. In a very connected world, it seems no one is immune from issues.
 
It affected a lot of WordPress plugins for our customers. The links to CSS files and other assets were dead, so the websites loaded really slowly while the browser tried to fetch those CSS files first.
 
http://www.usatoday.com/story/tech/...ud-service-goes-down-sites-scramble/98530914/

I spoke to an accounts manager at AWS and he told me they suffered a major DDoS attack and had to null some of their IPs.

A DDoS big enough to take down an Amazon region, given all the protection and resources they have to prevent one, would have to be a record high!

Anyway, the official story is human error. A tech took down more servers than he was supposed to, which left Amazon without enough capacity. Safeguards are being put in place to prevent that.
 
What a mess that was. A lot of people had issues. I didn't see many sites go completely down but a lot of them had issues with commands and pictures loading. Thankfully, didn't have to deal with it on my own sites.
 
So at the end of the day, it comes down to a TYPO.

An admin was supposed to take a small group of servers offline, but instead took off a massive quantity. That forced a restart of servers that hadn't been rebooted in several years. The startup procedure took so long that it put extra strain on other machines, which slowed down and then went down too.

According to Amazon, it was a preventable event. They plan to put some extra safeguards in place so the removal of servers from the network shouldn't happen like this again.
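
For what it's worth, here is a rough idea of what a safeguard like that could look like. This is only a minimal sketch, not Amazon's actual tooling: the function name `plan_removal`, the `min_active` floor and the `max_batch` limit are all made up for illustration. The idea is simply that a removal command gets refused when it asks for too many servers at once or would leave too few in service.

```python
class CapacityError(Exception):
    """Raised when a removal request would take too many servers offline."""


def plan_removal(active, requested, min_active, max_batch):
    """Return the servers that are safe to take offline, or raise.

    All names here are hypothetical, for illustration only.
    active     -- server names currently in service
    requested  -- server names the operator asked to remove
    min_active -- fewest servers that must stay in service
    max_batch  -- most servers a single command may remove
    """
    active_set = set(active)
    # Ignore names that are not in service, and de-duplicate the request.
    targets = sorted(set(requested) & active_set)
    if len(targets) > max_batch:
        raise CapacityError(
            f"refusing to remove {len(targets)} servers; limit is {max_batch}"
        )
    remaining = len(active_set) - len(targets)
    if remaining < min_active:
        raise CapacityError(
            f"only {remaining} servers would remain; need at least {min_active}"
        )
    return targets
```

With a check like this, a mistyped argument that selects the whole fleet fails loudly instead of running.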
 
Any experienced server administrator has, at one point in time or another, shut down or rebooted the wrong server...

That said ... I don't think I have to worry about touching this record set by this Amazon Admin haha.
 

Mind-boggling. For a company Amazon's size, more than one tech should have to sign off on any shutdown of any size, IMHO. It can't add more than a few seconds to read a report and initial it as OK.
 
True, but say there was a report and it was signed off (shutting down a segment was approved): there's still just one monkey typing on a keyboard :)

I can't remember the name of the large datacenter (maybe The Planet) where this happened, but they were testing their backup systems. They shut down a humidifier and a PDU, which created an excessive load on other systems, and those shut down too. The oil generator either never kicked in or didn't kick in fast enough. They had a total center shutdown and had to boot up in stages so it wouldn't trip the breakers again.

I'm nearly positive it was The Planet back when we had like 400 servers with them (many, many, many years ago!)
 

A company like Amazon should use the same protocol as is used for national security: a two-key option, where turning off any server requires two keys, each held by a different person. Without both keys, a server cannot be turned off.
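
In software terms, the two-key idea could be sketched something like this. It is a toy example only: the `ShutdownRequest` class and `shut_down` function are hypothetical names, and nothing here reflects how Amazon actually authorizes anything. The point is that the shutdown is refused until two different people have approved it.

```python
class ShutdownRequest:
    """A pending shutdown that needs approval from two different people."""

    REQUIRED_APPROVALS = 2

    def __init__(self, server):
        self.server = server
        self.approvers = set()

    def approve(self, person):
        # Using a set means the same person approving twice counts once.
        self.approvers.add(person)

    def is_authorized(self):
        return len(self.approvers) >= self.REQUIRED_APPROVALS


def shut_down(request):
    """Pretend to shut the server down; refuse without two distinct approvals."""
    if not request.is_authorized():
        raise PermissionError(f"{request.server}: needs two different approvers")
    return f"{request.server} shut down"
```

One person turning their own key twice does not count, which is the whole point of the scheme.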
 