First you need to define the type of data you are dealing with. Do you have multiple long and complex transactions? Do you need scalability? Is a traditional RDBMS product the perfect solution? What do you know about NoSQL databases?
RDBMS : The solution to all storage problems ?
We are used to work with RDBMS, its fundamental features are:
- Table based
- Relations between distinct table entities and rows
- Referential integrity
- ACID transactions
- Arbitrary queries and joins
For example you can find among the most famous RDBMS: MySQL, PostgreSQL, Oracle, Microsoft SQL Server, InterBase, DB2.
But RDBMS logic may reduces the performance of your system. If you need a high-responsive and available system, RDBMS might not be your best choice.
RDBMS use a table-based normalization approach to data, and that’s a limited model. Certain data structures cannot be represented without tampering with the data, programs, or both.
They allow versioning or operations like CRUD (Create, Read, Update and Delete). For some databases, updates should never be allowed, because they destroy information. Rather, when data changes, the database should just add another record and not delete the previous value for that record.
Performance falls off as RDBMS normalize data. Because normalization requires more tables, table joins, keys and indexes and thus more internal database operations for implement queries. Pretty soon, the database starts to grow into the terabytes, and that’s when things slow down.
Scale Up, Scale Out
If you are looking for high performance, you should ask yourself if you prefer to scale up or scale out. The difference is very simple, when you do vertical scaling you add more RAM, more disk storage, more CPU, but as years move and your business grows, if your system cannot manage the growing load, you may need to buy a bigger hardware and dump your current system.
This operation can be expensive.
When you do horizontal scaling, you add multiple small systems. Load is shared across systems. With this option you can certainly add one or more systems more easily. It will help you to grow faster with less cost. But there is a difficulty with this option, your application should have logic to share the load and retrieve the correct information from corresponding server and storage location.
Origins of the NoSQL approach
NoSQL means “Not only SQL”, it differs from the classical model of RDBMS in some significant ways:
- They do not use SQL as primary query language
- They do not require fixed table schemas
- They usually do not support join operations
- They may not give full ACID guarantees
- They typically scale horizontally.
Today it is used by major internet companies, such as Google, Amazon, Twitter and Facebook which had problems dealing with data that the traditional RDBMS solutions could not cope with.
NoSQL existing solutions
There are six major categories of NoSQL:
- Key-values Stores: it offers functions for storing and retrieving values associated with unique keys. It is very effective for content caching and logging. The model is the most simple. Examples: Voldemort, Redis, Dynomite, Oracle BDB
- Column Family Stores: these were created to store and process very large amounts of data distributed over many machines. There are still keys but they point to multiple columns. The columns are arranged by column family. Examples: Hadoop, Cassandra, Hypertable
- Document database: the model is basically versioned documents that are collections of other key-value collections. The documents are stored in formats like JSON. Document databases support querying more efficiently. Examples: MongoDb, CouchDB
- Graph Databases: it uses a flexible graph model and it can scale across multiple machines. Very effective with social networking and recommendations. Examples: Neo4J, InfoGrid, Infinite Graph
- Object Databases like Versant, Objectivity
- XML Databases like eXist, BaseX, EMC, Sedna
Generally, the best places to use NoSQL technology is where the data model is simple. Where flexibility is more important than strict control over defined data structures. Where high performance is a must, where strict data consistency is not required and where it is easy to map complex values to known keys. However if you are in the case of a merchant website, such as a tourism website, you will need to perform multiple long and complex transactions that only RDBMS can provide. For example if you want to travel, it will combine the purchase of your flight, your hotel, a possible rental car, and many other conditions.
For example, the database of Amadeus (Booking of airline tickets for European companies) runs under MS SQL Server and TGV.com under Oracle.
How to make your decision:
- You have complex data structures and queries
- You need ACID transactions
- You already have DBA and experts used to work with RDBMS
- Your applications are initially created and optimized for RDBMS
- You want to work with a data warehouse or business intelligence
- You are looking for flexibility, NoSQL doesn’t require fixed table schemas
- You want simplicity, you don’t need a relational database
- You want storage that is horizontally scalable
- You have to deal with very high load and large data needs