NoSQL is increasingly being hailed within development circles as a next-generation database that fixes all the performance, scalability, and complexity problems that many organizations encounter when using relational databases. Facebook, Google, and now Twitter have started using NoSQL, so it's easy to see how many businesses might think that NoSQL is the solution to very real and costly problems associated with relational databases. But NoSQL isn't magical, nor is it without limitations.
While NoSQL delivers powerful capabilities, it requires a number of very serious compromises that can be detrimental for overall business use. Most of the scalability, performance, and complexity problems associated with relational databases are rooted in a lack of understanding, and NoSQL's approach, essentially throwing the baby out with the bathwater, means that organizations looking to NoSQL for painless solutions will be sadly mistaken. In this article, I will identify some of the core trade-offs and limitations that businesses must address when considering NoSQL.
How NoSQL Works
The key to understanding NoSQL is to realize that it isn't a product. It's a paradigm, or an approach to storing data. Currently, over 20 different NoSQL implementations are available. When most people talk about NoSQL, they're usually discussing the more common implementations, such as Cassandra, Hadoop, CouchDB, MemcacheDB, MongoDB, Google's BigTable, Voldemort, and others. Consequently, any discussion of how NoSQL works has to be prefaced by the caveat that implementation details and architectural considerations vary widely from one option to the next. For the same reason, comparing all NoSQL implementations to relational databases is difficult, because NoSQL solutions take many different approaches to data storage.
At a high level, NoSQL implementations share the goal of increased performance and scalability through jettisoning what some consider unwanted and unnecessary capabilities found in today's relational databases. The problem, of course, is that jettisoning those features comes at a high cost, at least for normal business considerations.
For example, a key tenet of most NoSQL databases is that they throw out atomicity, consistency, isolation, durability (ACID) in favor of Basically Available data with Soft state that becomes Eventually consistent (BASE). On the plus side, developers are freed from issues of managing locking and blocking. However, on the negative side, consistency and durability issues risk causing problems with end-user interactions. Many IT professionals with real-world experience almost completely dismiss the use of NoSQL because of its inability to manage complex business needs, workflows, or interactions (more on this shortly).
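To make the consistency trade-off concrete, here is a minimal sketch of eventual consistency. It is a toy model, not the behavior of any particular NoSQL product: the class `EventuallyConsistentStore`, its `propagate` method, and the key `balance:42` are all hypothetical names invented for illustration. A write is acknowledged as soon as one replica has it, and the other replicas catch up only later.

```python
class EventuallyConsistentStore:
    """Toy model (hypothetical): writes land on one replica and reach the rest later."""

    def __init__(self, replica_count=2):
        # Each replica is just an in-memory dict.
        self.replicas = [{} for _ in range(replica_count)]
        # Writes that were acknowledged but not yet copied to the other replicas.
        self.pending = []

    def write(self, key, value):
        # Acknowledge immediately after the first replica accepts the write.
        self.replicas[0][key] = value
        self.pending.append((key, value))

    def read(self, key, replica_index):
        # A client may be routed to any replica.
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Stand-in for background replication: apply pending writes in order.
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("balance:42", 100)
store.propagate()                     # both replicas now agree on 100

store.write("balance:42", 70)         # a withdrawal, acknowledged right away
stale = store.read("balance:42", replica_index=1)   # replica 1 has not caught up
fresh = store.read("balance:42", replica_index=0)

store.propagate()
converged = store.read("balance:42", replica_index=1)
```

Between the second write and the next `propagate` call, a user routed to replica 1 still sees the old balance. No locks were taken and nothing blocked, but the application now has to decide what to do about a reader acting on stale data.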
Likewise, NoSQL gets rid of schemas and works directly with data through APIs that developers typically find easier to use because they don't have to write SQL for Create, Read, Update, Delete (CRUD) operations. NoSQL also gets rid of SQL JOINs, and storing data in the shape the application uses helps developers eliminate impedance mismatch. These benefits mean that developers can typically create solutions in less time because there is less complexity to manage. The downside, though, is that applications typically lose complex filtering options and aggregates along with ad hoc and analytical reporting capabilities. This means that NoSQL solutions really don't fit the bill for applications where businesses need to regularly analyze data.
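The following sketch illustrates both sides of that trade. It uses plain Python dictionaries as stand-ins for schema-less collections; the collection names, fields, and the helper `revenue_by_region` are hypothetical and do not correspond to any real NoSQL client API. Adding a field to a record requires no schema change, but a report that a relational database would answer with one JOIN and GROUP BY has to be written by hand in application code.

```python
from collections import defaultdict

# Hypothetical schema-less "collections": records are looked up by key only.
customers = {
    "c1": {"name": "Acme", "region": "West"},
    "c2": {"name": "Globex", "region": "East"},
}
orders = {
    "o1": {"customer_id": "c1", "total": 120.0},
    "o2": {"customer_id": "c2", "total": 80.0},
    # An extra field appears on one record; no schema migration is needed.
    "o3": {"customer_id": "c1", "total": 40.0, "coupon": "SPRING"},
}


def revenue_by_region(orders, customers):
    """Hand-written join plus aggregate, done entirely in the application."""
    totals = defaultdict(float)
    for order in orders.values():
        # The "join": one key lookup per order.
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue  # nothing enforces referential integrity, so check here
        # The "GROUP BY region, SUM(total)".
        totals[customer["region"]] += order["total"]
    return dict(totals)


report = revenue_by_region(orders, customers)
```

Every new reporting question means another function like this one, which is why ad hoc analysis tends to suffer once SQL is off the table.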
Finally, a key to NoSQL's performance and scalability is that many NoSQL implementations serve data primarily from RAM. By spreading that RAM across multiple servers, NoSQL gains easy scale-out operations while also benefiting from increased redundancy and higher availability through fault tolerance. NoSQL's scale-out strategy also brings cost benefits because most implementations are open source, so licensing costs are significantly lower than those of relational databases attempting scale-out architectures. The problem is that when businesses focus only on the benefits of scalability and performance, they can miss just how expensive those benefits really are. Let's start by addressing performance and scalability, and then we'll look at the other considerations.
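Before moving on, the scale-out idea can be made concrete with a small sketch. This is an assumption-laden toy, not how any specific product places data: the `Cluster` class and the node names are hypothetical, and simple hash-modulo placement is used only because it is the shortest thing that works. Each key is hashed to pick the server whose memory holds it.

```python
import hashlib


class Cluster:
    """Toy key-value cluster (hypothetical): each node is an in-memory dict."""

    def __init__(self, node_names):
        self.names = list(node_names)
        self.nodes = {name: {} for name in self.names}

    def node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.names)
        return self.names[index]

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self.node_for(key)].get(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(9):
    cluster.put(f"user:{i}", {"id": i})

# Every record lives on exactly one node; the total is spread across all of them.
stored = sum(len(data) for data in cluster.nodes.values())
```

Adding capacity means adding another entry to the node list. With plain modulo placement that would move most existing keys to a different node, so this sketch should be read as the idea only, not as a production placement scheme.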
Performance and Scalability: RDBMS vs. NoSQL
I've spent over a decade largely focused on performance-related issues for relational databases and working with a wide variety of platforms, including MySQL, Oracle, and SQL Server. I know how critical performance is to business, and I'm not oblivious to the performance problems that relational databases encounter. However, except in an extremely narrow set of highly specialized circumstances, NoSQL is not the answer to those problems. Rather, NoSQL's performance and scalability strengths come at too high a cost for NoSQL to be considered for general business use.