Amazon S3 Down

For those of you who don’t know, Amazon provides elastic storage and computing power on its cloud of servers through two services – S3 and EC2. These services have been a huge deal for many websites, as they enable applications to handle dynamic traffic loads without maintaining huge arrays of servers of their own – companies can pay for what they use and no more, thanks to Amazon’s enormous computing capacity.

Amazon S3 has been completely down for about 3 hours this morning – no small thing at all. Thousands of websites rely on S3 to serve a variety of assets from photos to music to video. In Indaba’s case, we’ve been serving the widgets for our Mariah Carey Remix Contest through S3, which are now of course completely disabled until S3 comes back up.

Here’s the timeline according to Amazon’s Service Health Dashboard:

9:05 AM PDT We are currently experiencing elevated error rates with S3. We are investigating.
9:26 AM PDT We’re investigating an issue affecting requests. We’ll continue to post updates here.
9:48 AM PDT Just wanted to provide an update that we are currently pursuing several paths of corrective action.
10:12 AM PDT We are continuing to pursue corrective action.
10:32 AM PDT A quick update that we believe this is an issue with the communication between several Amazon S3 internal components. We do not have an ETA at this time but will continue to keep you updated.
11:01 AM PDT We’re currently in the process of testing a potential solution.
11:22 AM PDT Testing is still in progress. We’re working very hard to restore service to our customers.
11:45 AM PDT We are still in the process of testing a series of configuration changes aimed at bringing the service back online.
12:05 PM PDT We have now restored communication between a small subset of hosts. We are working on restoring internal communication across the rest of the fleet. Once communication is fully restored, then we will work to restore request processing.
12:25 PM PDT We have restored communication between additional hosts and are continuing this work across the rest of the fleet. Thank you for your continued patience.

Of course, Indaba isn’t the only site that’s been affected – our good friends over at haven’t been able to serve certain assets, and Basecamp, the web-based project management software from 37signals, has also been unable to serve assets all morning.

I’m interested to see what happens tomorrow (Monday). S3 had similar problems in February, and outages of this size and duration really call into question the stability of the service. It’s not going to be fun over at Amazon tomorrow… (not that today was any better, I’m sure).