Amazon provisioning 7 petabytes of storage per day

Guy Rosen did an interesting analysis of the number of virtual machines started in Amazon’s cloud. His findings were surprising, at least to me: according to his stats, nearly 50,000 EC2 instances are started daily. But it all depends on how you look at it; that’s less than one new VM per second. And since I know of a number of companies that use EC2 instances to deal with peaks in demand, many of these will run for only a short period of time.

But today I was looking at these numbers in another way: the huge amount of storage for these VMs. The smallest EC2 instance has 160GB of local disk storage. That means nearly 8 petabytes of storage is provisioned each day in Amazon’s cloud, and possibly more. The 160GB is for a 32-bit system; users wanting a 64-bit machine start at either 350 or 850GB of storage, so the total amount of disk provisioned just for the EC2 machines is probably over 10 petabytes per day.
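For anyone who wants to check the arithmetic, here is a minimal Python sketch. The launch count and disk sizes are the figures quoted above; the decimal units (1 PB = 1,000,000 GB) and the function name are my own assumptions, and each line of output assumes every instance launched that day has the same disk size.

```python
GB_PER_PB = 1_000_000  # assumption: decimal units, 1 PB = 1,000,000 GB
SECONDS_PER_DAY = 86_400


def daily_provisioned_pb(instances_per_day, disk_gb_per_instance):
    """Petabytes of local disk handed out per day if every instance has this disk size."""
    return instances_per_day * disk_gb_per_instance / GB_PER_PB


launches_per_day = 50_000  # "nearly 50,000" from Guy Rosen's stats

# Launch rate per second (well under one per second)
print(launches_per_day / SECONDS_PER_DAY)

# Daily provisioned storage for each of the disk sizes mentioned in the post
for disk_gb in (160, 350, 850):
    print(disk_gb, "GB per instance:", daily_provisioned_pb(launches_per_day, disk_gb), "PB/day")
```

The 160GB case gives the lower bound; the real daily total depends on the mix of instance sizes, which is not known here.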


Petabox unit at the Internet Archive - by gruntzooki


There are also over 10,000 EBS volumes created per day; these are meant for persistent storage. That means EBS volumes won’t be recycled as often as EC2 virtual machines, so the total keeps growing. Assuming about half the EBS volumes are small tests and the other half are for “real” storage, with that second half close to the maximum 1TB size on average, that’s another 5 petabytes of storage added daily!
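The EBS estimate can be sketched the same way. The 50/50 split and the 1TB average are the guesses from the paragraph above, not measured values; the function name and the choice to ignore the small test volumes entirely are my own simplifications.

```python
TB_PER_PB = 1_000  # assumption: decimal units, 1 PB = 1,000 TB


def ebs_daily_growth_pb(volumes_per_day, real_fraction, avg_real_size_tb):
    """Petabytes of EBS storage added per day, counting only the "real" volumes."""
    real_volumes = volumes_per_day * real_fraction
    return real_volumes * avg_real_size_tb / TB_PER_PB


# 10,000 volumes per day, half of them "real", about 1TB each
print(ebs_daily_growth_pb(10_000, 0.5, 1.0), "PB/day")
```

Changing `real_fraction` or `avg_real_size_tb` shows how sensitive the 5 petabyte figure is to those two guesses.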

Finally, there’s the S3 storage facility. As of August this year, there were 64 billion objects stored in it. In March, there were 52 billion; that’s about 2 billion objects per month, or 67 million per day. Assuming the average object size for data stored in S3 is about a megabyte, as estimated by Anil Gupta, that comes to roughly 67 terabytes per day, so the storage used by S3 is dwarfed by the EC2 and EBS storage.
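To put the S3 number on the same scale as the EC2 and EBS estimates, here is a rough Python sketch. The 67 million objects per day and the 1MB average object size come from the text; the 8 and 5 petabyte figures are the earlier estimates from this post, and the decimal units and function name are assumptions of mine.

```python
MB_PER_PB = 1_000_000_000  # assumption: decimal units, 1 PB = 10^9 MB


def s3_daily_growth_pb(objects_per_day, avg_object_mb):
    """Petabytes added to S3 per day for a given object rate and average object size."""
    return objects_per_day * avg_object_mb / MB_PER_PB


s3_pb = s3_daily_growth_pb(67_000_000, 1.0)
ec2_pb = 8.0  # earlier estimate: smallest instance size only
ebs_pb = 5.0  # earlier estimate: half the volumes at about 1TB

print("S3 PB/day:", s3_pb)
print("EC2 + EBS is larger by a factor of about", round((ec2_pb + ebs_pb) / s3_pb))
```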

I’d love to know what kind of hardware Amazon is using for their EBS service; most large-scale storage systems would reach their maximum capacity in a matter of days given these growth rates. Thin provisioning of EC2 and EBS volumes would help a lot, but these are still massive amounts of capacity.
