Choosing the right database for your business

First you need to define the type of data you are dealing with. Do you have multiple long and complex transactions? Do you need scalability? Is a traditional RDBMS product the perfect solution? What do you know about NoSQL databases?

RDBMS: The solution to all storage problems?

We are used to working with RDBMS. Their fundamental features are:

  • Table based
  • Relations between distinct table entities and rows
  • Referential integrity
  • ACID transactions
  • Arbitrary queries and joins

Among the best-known RDBMS are MySQL, PostgreSQL, Oracle, Microsoft SQL Server, InterBase and DB2.
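The features listed above can be seen in miniature with SQLite, which ships with Python. This is only a sketch: the table and column names (customer, booking) are made up for illustration, and SQLite stands in for whichever RDBMS you actually use.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE booking ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " item TEXT NOT NULL)"
)

# Transaction: both inserts are committed together when the block exits.
with conn:
    conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Alice')")
    conn.execute("INSERT INTO booking (customer_id, item) VALUES (1, 'flight')")

# Referential integrity: a booking for an unknown customer is rejected,
# and the whole transaction (including the valid 'hotel' row) is rolled back.
try:
    with conn:
        conn.execute("INSERT INTO booking (customer_id, item) VALUES (1, 'hotel')")
        conn.execute("INSERT INTO booking (customer_id, item) VALUES (99, 'car')")
except sqlite3.IntegrityError:
    rejected = True
else:
    rejected = False

# Join across the two tables.
rows = conn.execute(
    "SELECT c.name, b.item FROM customer c JOIN booking b ON b.customer_id = c.id"
).fetchall()
print(rejected, rows)
```

After the failed transaction only the first booking remains, which is the all-or-nothing behaviour that ACID transactions promise.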

But the logic of an RDBMS may reduce the performance of your system. If you need a highly responsive and highly available system, an RDBMS might not be your best choice.

RDBMS use a table-based normalization approach to data, and that’s a limited model. Certain data structures cannot be represented without tampering with the data, programs, or both.

They are built around CRUD operations (Create, Read, Update and Delete). For some databases, however, updates should never be allowed, because they destroy information. Instead, when data changes, the database should add another record and keep the previous value for that record.
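The append-only idea can be sketched in a few lines. The example below uses SQLite purely for convenience; the price_history table and the set_price and current_price helpers are hypothetical names, not part of any product mentioned here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE price_history ("
    " version INTEGER PRIMARY KEY AUTOINCREMENT,"
    " product TEXT NOT NULL,"
    " price REAL NOT NULL)"
)

def set_price(product, price):
    """Record a change by appending a row; earlier rows are never modified."""
    with conn:
        conn.execute(
            "INSERT INTO price_history (product, price) VALUES (?, ?)",
            (product, price),
        )

def current_price(product):
    """The current value is the most recent version for that product."""
    row = conn.execute(
        "SELECT price FROM price_history WHERE product = ?"
        " ORDER BY version DESC LIMIT 1",
        (product,),
    ).fetchone()
    return row[0] if row else None

set_price("ticket", 100.0)
set_price("ticket", 120.0)

# Full history, oldest first: nothing was overwritten.
history = [
    r[0]
    for r in conn.execute(
        "SELECT price FROM price_history WHERE product = 'ticket' ORDER BY version"
    )
]
print(current_price("ticket"), history)
```

The trade-off is storage: every change costs a new row, in exchange for never losing the earlier value.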

Performance falls off as an RDBMS normalizes data, because normalization requires more tables, table joins, keys and indexes, and thus more internal database operations to execute queries. Once the database grows into the terabytes, things slow down.

Scale Up, Scale Out

If you are looking for high performance, ask yourself whether you prefer to scale up or scale out. The difference is simple. With vertical scaling (scaling up) you add more RAM, more disk storage and more CPU to a single system. But as the years pass and your business grows, that system may no longer handle the load, and you may need to buy bigger hardware and discard your current system.

This operation can be expensive.

With horizontal scaling (scaling out), you add multiple small systems and the load is shared across them. With this option you can add one or more systems more easily, which helps you grow faster at lower cost. The difficulty is that your application needs logic to share the load and to retrieve the correct information from the corresponding server and storage location.
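As a rough illustration of that routing logic, here is a minimal hash-based sketch. The server names in SHARDS and the shard_for helper are invented for the example; real systems usually rely on the database's own partitioning or on consistent hashing.

```python
import hashlib

# Hypothetical pool of small database servers.
SHARDS = [
    "db-0.example.internal",
    "db-1.example.internal",
    "db-2.example.internal",
]

def shard_for(key, shards=SHARDS):
    """Pick the server responsible for a key.

    A stable hash (not Python's built-in hash(), which varies between runs
    for strings) guarantees that the same key always lands on the same server.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

for key in ("user:1", "user:2", "user:3"):
    print(key, "->", shard_for(key))
```

Note that with plain modulo routing, adding a server changes the target of most keys, so data has to be moved. That is part of the difficulty mentioned above.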

Origins of the NoSQL approach

NoSQL means “Not only SQL”. NoSQL databases differ from the classical RDBMS model in some significant ways:

  • They do not use SQL as primary query language
  • They do not require fixed table schemas
  • They usually do not support join operations
  • They may not give full ACID guarantees
  • They typically scale horizontally

Today it is used by major internet companies such as Google, Amazon, Twitter and Facebook, which had to deal with data that traditional RDBMS solutions could not cope with.

NoSQL existing solutions

There are six major categories of NoSQL:

  • Key-value stores: they offer functions for storing and retrieving values associated with unique keys. They are very effective for content caching and logging. This is the simplest model. Examples: Voldemort, Redis, Dynomite, Oracle BDB
  • Column family stores: these were created to store and process very large amounts of data distributed over many machines. There are still keys, but they point to multiple columns. The columns are arranged by column family. Examples: HBase (on Hadoop), Cassandra, Hypertable
  • Document databases: the model is basically versioned documents that are collections of other key-value collections. The documents are stored in formats like JSON. Document databases support querying more efficiently than plain key-value stores. Examples: MongoDB, CouchDB
  • Graph databases: they use a flexible graph model and can scale across multiple machines. They are very effective for social networking and recommendations. Examples: Neo4j, InfoGrid, InfiniteGraph
  • Object databases, such as Versant and Objectivity
  • XML databases, such as eXist, BaseX, EMC and Sedna
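To make the difference between the first and third categories concrete, here is a toy sketch using plain Python structures. It does not use any of the products listed above; kv_store, documents and find are hypothetical names, and a real store would add persistence, indexing and distribution.

```python
import json

# Key-value model: the store only knows the key; the value is an opaque blob.
kv_store = {}
kv_store["session:abc"] = json.dumps({"user": "alice", "cart": ["flight"]})
session = json.loads(kv_store["session:abc"])  # lookup is possible by key only

# Document model: the structure is visible, so fields can be queried.
documents = [
    {"_id": 1, "user": "alice", "city": "Paris"},
    {"_id": 2, "user": "bob", "city": "Lyon"},
    # An extra field needs no schema change.
    {"_id": 3, "user": "carol", "city": "Paris", "tags": ["vip"]},
]

def find(docs, **criteria):
    """Return the documents whose fields equal all the given criteria."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

paris_ids = [d["_id"] for d in find(documents, city="Paris")]
print(session["user"], paris_ids)
```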

In Conclusion

Generally, the best places to use NoSQL technology are where the data model is simple, where flexibility is more important than strict control over defined data structures, where high performance is a must, where strict data consistency is not required, and where it is easy to map complex values to known keys. However, if you run a merchant website, such as a travel website, you will need to perform multiple long and complex transactions that only an RDBMS can provide. For example, booking a trip combines the purchase of your flight, your hotel, a possible rental car, and many other conditions.

For example, the database of Amadeus (booking of airline tickets for European companies) runs on MS SQL Server, and TGV.com runs on Oracle.

How to make your decision:

  • RDBMS:
    • You have complex data structures and queries
    • You need ACID transactions
    • You already have DBAs and experts used to working with RDBMS
    • Your applications are initially created and optimized for RDBMS
    • You want to work with a data warehouse or business intelligence
  • NoSQL:
    • You are looking for flexibility: NoSQL doesn’t require fixed table schemas
    • You want simplicity and you don’t need a relational database
    • You want storage that is horizontally scalable
    • You have to deal with very high load and large data needs

Setting up an Nginx Reverse Proxy on Debian

What is Nginx?

Nginx (pronounced “Engine-X”) is an open source Web server and a reverse proxy server for HTTP, SMTP, POP3 and IMAP protocols, with a strong focus on high concurrency, performance and low memory usage. In this example we are going to use Nginx as a Reverse Proxy.

What are the benefits of a reverse proxy like Nginx?

  • Distribute the load across several servers
  • Reduce load with caching, or by compressing the content
  • Hide the existence and characteristics of the origin server(s)
  • Protect against common web-based attacks
  • A/B testing
  • Use a single public IP address to access multiple web servers

How to install Nginx?

First edit the file /etc/apt/sources.list and add the following lines:

deb http://nginx.org/packages/debian/ squeeze nginx
deb-src http://nginx.org/packages/debian/ squeeze nginx

Now you can install it:

apt-get update
apt-get install nginx

Edit your Nginx config file /etc/nginx/nginx.conf:

user www-data;
worker_processes 6;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';

    access_log  /var/log/nginx/access.log  main;

    sendfile        on;
    keepalive_timeout  10;

    #Compression Settings
    gzip on;
    gzip_http_version 1.0;
    gzip_comp_level 2;
    gzip_proxied any;
    gzip_min_length  1100;
    gzip_buffers 16 8k;
    gzip_types text/plain text/css application/x-javascript text/xml application/xml application/xml+rss text/javascript;
    gzip_vary on;

    include /etc/nginx/conf.d/*.conf;
}

A worker process is a single-threaded process.

If Nginx is doing CPU-intensive work such as SSL or gzipping and you have 2 or more CPUs/cores, then you may set worker_processes to be equal to the number of CPUs or cores.

If you are serving a lot of static files and the total size of the files is bigger than the available memory, then you may increase worker_processes to fully utilize disk bandwidth.

The worker_processes directive from the main section and the worker_connections directive from the events section allow you to calculate the maximum number of clients you can handle:

max clients = worker_processes * worker_connections
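Applied to the nginx.conf shown earlier (6 worker processes, 1024 connections each), the formula works out as follows. The connections_per_client parameter is my own addition, not part of the article's formula: it reflects the assumption that, when Nginx acts as a reverse proxy, each client also occupies a connection to the backend, because worker_connections counts every connection a worker holds.

```python
def max_clients(worker_processes, worker_connections, connections_per_client=1):
    """Upper bound on simultaneous clients, following the formula above.

    connections_per_client is a hypothetical knob: use 2 to account for the
    extra backend connection held for each proxied client.
    """
    return (worker_processes * worker_connections) // connections_per_client

direct = max_clients(6, 1024)                              # 6 * 1024 = 6144
proxied = max_clients(6, 1024, connections_per_client=2)   # 6144 // 2 = 3072
print(direct, proxied)
```

Treat both numbers as ceilings rather than targets; file descriptor limits and backend capacity usually bite first.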

Then edit /etc/nginx/conf.d/proxy.conf. This file defines our server.

server {
    listen 80;

    access_log off;
    # error_log has no "off" value ("off" would be treated as a file name),
    # so send the log to /dev/null and keep only critical messages
    error_log /dev/null crit;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_max_temp_file_size 0;
        proxy_connect_timeout 90;
        proxy_send_timeout 90;
        proxy_read_timeout 90;
        proxy_buffer_size 4k;
        proxy_buffers 4 32k;
        proxy_busy_buffers_size 64k;
        proxy_temp_file_write_size 64k;
    }

    # This block will catch static file requests, such as images, css, js
    # The ?: prefix is a 'non-capturing' mark, meaning we do not require
    # the pattern to be captured into $1 which should help improve performance
    location ~* \.(?:ico|css|js|gif|jpe?g|png)$ {
        # Some basic cache-control for static files to be sent to the browser
        expires max;
        add_header Pragma public;
        add_header Cache-Control "public, must-revalidate, proxy-revalidate";
    }

    # this prevents hidden files (beginning with a period) from being served
    location ~ /\.          { access_log off; log_not_found off; deny all; }
}
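To see which requests each location block would catch, the two regular expressions from the config can be tried out in Python. This is only an approximation: Nginx uses PCRE and matches against the normalized URI without the query string, and the classify helper and its labels are hypothetical. It mirrors the rule that regex locations are tested in the order they appear and the first match wins, with location / as the fallback.

```python
import re

# Same patterns as in the server block; "~*" means case-insensitive.
STATIC = re.compile(r"\.(?:ico|css|js|gif|jpe?g|png)$", re.IGNORECASE)
HIDDEN = re.compile(r"/\.")  # "~" is case-sensitive

def classify(path):
    """Return which location block would handle the given URI path."""
    if STATIC.search(path):
        return "static"
    if HIDDEN.search(path):
        return "denied"
    return "proxied"

for path in ("/index.php", "/img/logo.PNG", "/.htaccess", "/.git/logo.png"):
    print(path, "->", classify(path))
```

The last path is worth a look: because the static pattern comes first, a .png inside a hidden directory matches the static block, not the deny rule. If that matters for your site, consider placing the hidden-files location before the static one.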

As you can see Nginx will listen on port 80.

The directive proxy_pass sets the address of the proxied server and the URI to which the location will be mapped. Here it is our local Apache server, which must be listening on port 8080. Edit /etc/apache2/ports.conf and your vhosts so that they listen on that port:

NameVirtualHost *:8080
Listen 8080

Now check your configuration with:

service nginx configtest
service apache2 configtest

Finally restart Apache and start Nginx:

service apache2 restart
service nginx start