WORK[etc] Web App Blog

Welcome to the SSD club, Amazon EC2 (and how SSDs powered a 714% Performance Increase)

Posted by Daniel Barnett Sun, 22 Jul '12 @ 7:07 PM
Back in February we announced our massive infrastructure upgrade and how we were taking a risk moving all our database storage onto an SSD (solid-state drive) storage platform. Now Amazon has just announced SSD-powered instances of its cloud-computing platform, EC2.
 
For those who don't know, SSDs (solid-state drives) use flash-based memory and can massively improve the performance of computers. Unlike normal hard drives, an SSD has no spinning platter and no moving parts.
 
If you've bought a new laptop with an SSD lately, then you've already experienced the profound impact they have: boot times are reduced and applications seem to launch instantly.
 
In terms of actual numbers, an enterprise-level SSD can shuffle data around at speeds of ~500MB/s, whereas a traditional hard drive is going to max out at around 140MB/s. The critical figure for database-heavy applications like WORK[etc] is random access time: the time it takes the drive to locate a piece of data and start reading it. For an SSD that time is about 0.1 ms; for a traditional drive we're talking around 8-12 ms, roughly a 100x improvement.
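To make the arithmetic behind these figures concrete, here is a small Python sketch that plugs in the approximate numbers quoted above (~500MB/s vs 140MB/s, 0.1 ms vs 8-12 ms). The constant names are our own, and the inputs are ballpark spec figures rather than benchmarks of our servers.

```python
# Ballpark comparison using the approximate spec figures quoted in the post.
# These are illustrative spec-sheet numbers, not benchmarks of our servers.

SSD_THROUGHPUT_MB_S = 500.0   # ~500 MB/s, enterprise SSD
HDD_THROUGHPUT_MB_S = 140.0   # ~140 MB/s, traditional hard drive

SSD_ACCESS_MS = 0.1           # random access time, SSD
HDD_ACCESS_MS_LOW = 8.0       # random access time, traditional drive (fast end)
HDD_ACCESS_MS_HIGH = 12.0     # random access time, traditional drive (slow end)

# Throughput: higher is better, so divide new by old.
throughput_gain = SSD_THROUGHPUT_MB_S / HDD_THROUGHPUT_MB_S

# Access time: lower is better, so divide old by new.
access_gain_low = HDD_ACCESS_MS_LOW / SSD_ACCESS_MS
access_gain_high = HDD_ACCESS_MS_HIGH / SSD_ACCESS_MS

print(f"Sequential throughput: ~{throughput_gain:.1f}x faster")
print(f"Random access: ~{access_gain_low:.0f}x to ~{access_gain_high:.0f}x faster")
```

The random-access gap (80x to 120x, so "~100x" is the midpoint) is far larger than the sequential-throughput gap (~3.6x), which is why access time is the figure that matters for a database-heavy app.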
 
But all this performance comes at a price. SSDs can cost ~10x as much as normal drives, and if you need ultra-reliability and performance for data-centre use, quadruple that figure again.
 
When we started buying Toshiba enterprise drives late last year, they were ~$2000 per drive. We now have 16 of the suckers running, and they've been well worth it, as the results have been incredible.
 
This is how the WORK[etc] infrastructure performed pre-SSD:
Avg. Disk Queue Length: 1.5
Avg. Disk sec/Read: 0.008
Avg. Disk sec/Write: 0.005
CPU: 29.5%

What we're seeing is an average disk queue length of 1.5 (the product of Avg. Disk sec/Transfer and Disk Transfers/sec, so the bigger the number, the longer your database request waits in the queue) and a read time sitting at 0.008 seconds (8 ms).
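The queue-length counter is an instance of Little's Law: the average number of outstanding requests equals the average time per request multiplied by the request rate. The sketch below assumes a hypothetical, steady workload of 200 disk transfers per second (an illustrative number, not something we measured) to show how the same load produces a much shorter queue when each transfer completes faster.

```python
def avg_disk_queue_length(sec_per_transfer: float, transfers_per_sec: float) -> float:
    """Little's Law: average outstanding I/Os = time per I/O x I/O rate."""
    return sec_per_transfer * transfers_per_sec


# HYPOTHETICAL steady workload, chosen only for illustration (not measured).
TRANSFERS_PER_SEC = 200.0

# Same workload, two different per-transfer latencies (0.008 s vs 0.001 s,
# matching the read latencies reported before and after the SSD move).
slow_queue = avg_disk_queue_length(0.008, TRANSFERS_PER_SEC)
fast_queue = avg_disk_queue_length(0.001, TRANSFERS_PER_SEC)

print(f"0.008 s/transfer -> avg queue length {slow_queue:.2f}")
print(f"0.001 s/transfer -> avg queue length {fast_queue:.2f}")
```

Because the formula is a straight product, cutting the time per transfer by 8x at a constant request rate cuts the average queue length by the same factor.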
 
This means that a packet of data (i.e., a request to display something from your WORK[etc] database) first sits in a queue and then takes a measurable amount of time to be read back.
 
Post-SSD, the numbers speak for themselves:
Avg. Disk Queue Length: 0.21
Avg. Disk sec/Read: 0.001
Avg. Disk sec/Write: 0.001
CPU: 6.8%

The disk queue length is now well under 1, at roughly one-seventh of its pre-SSD value (1.5 / 0.21 ≈ 7.14, our 714% improvement). And amazingly, the average disk sec/read is 0.001 seconds (1 ms), which may as well be zero. You'll notice CPU utilization is running significantly lower too, down from 29.5% to 6.8%.
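For anyone who wants to check the maths, this Python snippet computes the before/after ratio for each counter reported in the two lists above. All four are lower-is-better metrics, so the ratio is simply the old value divided by the new one; the dictionary layout and function name are just for illustration.

```python
# Performance counter values reported above, before and after the SSD move.
BEFORE = {
    "Avg. Disk Queue Length": 1.5,
    "Avg. Disk sec/Read": 0.008,
    "Avg. Disk sec/Write": 0.005,
    "CPU %": 29.5,
}
AFTER = {
    "Avg. Disk Queue Length": 0.21,
    "Avg. Disk sec/Read": 0.001,
    "Avg. Disk sec/Write": 0.001,
    "CPU %": 6.8,
}


def improvement_ratio(before: float, after: float) -> float:
    """For a lower-is-better metric, how many times smaller the new value is."""
    return before / after


for name, old in BEFORE.items():
    new = AFTER[name]
    ratio = improvement_ratio(old, new)
    # Show the ratio both as a multiplier and as a percentage (old/new x 100).
    print(f"{name}: {old} -> {new}  ({ratio:.2f}x, {ratio:.0%})")
```

The percentage column expresses old/new as a percentage, so the queue-length line reproduces the 714% figure: the new queue is about one-seventh of its former length.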
 
Note for server admins/network engineers: obviously this comparison is fluffy, as we haven't controlled for any number of variables. The goal here is only to provide a basic overview of the technology involved and the performance gains.
 

Improving Network Speeds

 
The next big performance upgrade, which we're right in the middle of rolling out, is a migration to Akamai's content-optimized network (Akamai Web Application Accelerator). What this means is that instead of traveling across the public Internet, your WORK[etc] data will find the closest of Akamai's 22,000 international data points and then travel across a private, optimized network.
 
Running over the Akamai network is kinda like pulling out of your driveway in the morning, reaching the end of your street and then turning onto your own Teflon-coated private lane all the way to the office. The longer the distance you have to go, the greater the speed improvement. And if there is bad weather ahead, a new lane magically appears to take you another route.
 
We're almost done testing internally and will be rolling out later this month. I'll publish some cool charts here showing the speed improvement from different origin points around the globe.
 

New UX Update

 
The first batch of beta upgrades to the new UX has just started. We're running one batch on Monday and, if all goes well, another on Wednesday and the final batch on Friday.
 
The biggest issue we're seeing right now is that the interface is a little sluggish in Firefox because of Firefox's JavaScript engine. In the latest versions of Internet Explorer and Chrome, it's super snappy.
 
I've experienced this first-hand, having been a Firefox fan since version 2.0. But the inconvenience of switching to IE is worth it for the better experience, not just in the new WORK[etc] but on other sites as well.