SQL Azure and the Promise of Cloud Availability
In sizing up SQL Azure as an option for my solution, one thing I did have to do was gauge how much downtime I could expect when using SQL Azure. As much as the hype from cloud vendors would have you believe that cloud solutions are always on, that's commonly not the case. So I did some homework on SQL Azure's overall uptime during the last year to get a feel for what I could expect going forward.
Of course, my homework was nothing more than taking a peek at the excellent uptime statistics compiled by www.cloudharmony.com. Its Cloud Status report is something anyone looking to use a cloud service of any type or flavor should bookmark. According to the report's metrics, SQL Azure had an uptime of 99.985 percent during the 365 days prior to my inquiry. By the standard "nines" arithmetic, 99.985 percent uptime allows for roughly 79 minutes of downtime over a year; even the stricter 99.99 percent ("four nines") allows about 53 minutes.
More specifically, in looking at the detailed statistics on cloudharmony.com, it turns out that SQL Azure was actually down a little under 33 minutes for the entire year. Not bad, but not 100 percent uptime either.
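Converting between an uptime percentage and minutes of downtime is simple arithmetic. The following Python sketch, assuming a non-leap 365-day year of 525,600 minutes, does the conversion in both directions; the function names are illustrative only.

```python
# Assumption: a non-leap 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Minutes of downtime per year implied by an uptime percentage."""
    return MINUTES_PER_YEAR * (100.0 - uptime_percent) / 100.0


def uptime_percent_from_downtime(downtime_minutes: float) -> float:
    """Uptime percentage implied by a yearly downtime in minutes."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for uptime in (99.9, 99.985, 99.99, 99.999):
        print(f"{uptime}% uptime -> {allowed_downtime_minutes(uptime):.1f} min of downtime per year")
    print(f"33 min of downtime -> {uptime_percent_from_downtime(33):.4f}% uptime")
```

For 99.985 percent this works out to roughly 79 minutes a year, and for 99.99 percent about 53 minutes. Run in reverse, 33 minutes of downtime corresponds to roughly 99.994 percent uptime.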
Because the highly distributed application I'm building uses SQL Server persistence only during the startup of each node and to periodically write tiny amounts of data that can be eventually consistent, I decided that even if SQL Azure were to double its failure rate in the next year, a SQL Azure database paired with a mirrored SQL Server database hosted somewhere else would be good enough for me to work with. So I settled on SQL Azure as my primary data storage mechanism, because its pricing and uptime are great, and then set up a secondary mirrored database for redundancy.
This approach obviously wouldn't work for many database-centric applications, but this solution will probably never grow beyond about 2GB, reads only about 2MB of data when an individual node starts up, and makes only periodic writes. Consequently, instead of using a plain data repository, I created a redundant data repository that pushes all writes to both databases (using an Amazon Simple Queue Service persistence mechanism to queue writes against either database if it's down) and reads from the failover, or secondary, database when the primary database doesn't respond quickly enough during node startup.
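A redundant repository along these lines might look something like the following Python sketch. It is not the author's implementation: the Store interface, the class and method names, and the in-memory deque standing in for the Amazon SQS queue are all hypothetical, and a real version would talk to SQL Azure, the mirror database, and SQS through their own client libraries.

```python
import concurrent.futures
import time
from collections import deque
from typing import Deque, Dict, Optional, Protocol, Tuple


class Store(Protocol):
    """Hypothetical minimal interface for each database (SQL Azure or the mirror)."""

    def write(self, key: str, value: str) -> None: ...

    def read(self, key: str) -> Optional[str]: ...


class RedundantRepository:
    """Writes to both stores, queues writes for a store that is down, and
    falls back to the secondary when a primary read fails or is too slow."""

    def __init__(self, primary: Store, secondary: Store, read_timeout_s: float = 2.0) -> None:
        self.stores: Dict[str, Store] = {"primary": primary, "secondary": secondary}
        self.read_timeout_s = read_timeout_s
        # Assumption: an in-memory stand-in for the SQS queue of (store name, key, value).
        self.pending: Deque[Tuple[str, str, str]] = deque()
        # Worker threads let us put a deadline on primary reads.
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def write(self, key: str, value: str) -> None:
        """Push the write to both stores; queue it for any store that fails."""
        for name, store in self.stores.items():
            try:
                store.write(key, value)
            except Exception:
                self.pending.append((name, key, value))

    def replay_pending(self) -> None:
        """Retry each queued write once, in order; failures stay queued."""
        for _ in range(len(self.pending)):
            name, key, value = self.pending.popleft()
            try:
                self.stores[name].write(key, value)
            except Exception:
                self.pending.append((name, key, value))

    def read(self, key: str) -> Optional[str]:
        """Read from the primary, falling back to the secondary on error or timeout."""
        future = self._pool.submit(self.stores["primary"].read, key)
        try:
            return future.result(timeout=self.read_timeout_s)
        except Exception:
            # Covers both a timeout and an error raised by the primary.
            # A timed-out primary call keeps running in its worker thread.
            return self.stores["secondary"].read(key)


class InMemoryStore:
    """Toy store for the demo; `up` and `delay_s` simulate outages and slowness."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.up = True
        self.delay_s = 0.0

    def write(self, key: str, value: str) -> None:
        if not self.up:
            raise ConnectionError("store is down")
        self.data[key] = value

    def read(self, key: str) -> Optional[str]:
        time.sleep(self.delay_s)
        if not self.up:
            raise ConnectionError("store is down")
        return self.data.get(key)


if __name__ == "__main__":
    primary, secondary = InMemoryStore(), InMemoryStore()
    repo = RedundantRepository(primary, secondary, read_timeout_s=0.1)

    secondary.up = False               # mirror is down during the write
    repo.write("node-settings", "v1")
    print(len(repo.pending))           # one write queued for the secondary
    secondary.up = True
    repo.replay_pending()
    print(secondary.data)              # the queued write has reached the mirror

    primary.delay_s = 0.5              # primary now answers too slowly
    print(repo.read("node-settings"))  # served by the secondary
```

Putting a deadline on the primary read keeps a slow primary from stalling node startup; the trade-off is that the abandoned call keeps running in the background until it returns on its own.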
So far this approach is working well. Creating a redundant repository did add some complexity to my application, but that complexity paled in comparison to what I would have had to do had I not written my application against a semi-permanent centralized data store. Building on a central store meant that I could avoid coding up true peer-to-peer semantics, which would have made my simple solution a nightmare.
So I think I might actually have a shot at hitting that coveted 100 percent uptime, even if 100 percent uptime is theoretically impossible and crazy to pursue.