5 min read

Types of Databases

1 Introduction

Herewith I start a new blog series.

After covering all kinds of topics about Data Science on my R page and Python page, I now want to get into the topic of databases and SQL.

The first question is which databases are available and which one is the right one for my project.

2 SQL vs. NoSQL

Roughly, databases can be divided into two categories:

  • SQL databases
  • No SQL databases

2.1 Data Structure

The primary factor that determines the choice of SQL or No SQL database is the way the data is structured. If your data is primarily structured, a SQL database is likely the right choice.

An SQL database is excellent if each row is a specific entity and each column is an attribute describing that entity. Because of these distinct, structured relationships between rows and columns in a table, SQL databases are best when you need ACID compliance.

If the data requirements are not clear or if there is unstructured data, a NoSQL database may be the best choice. Data stored in a NoSQL database does not require a predefined schema, as would be the case with an SQL database.

Rather, the data can be stored as follows:

With NoSQL, you can:

  • Create documents without carefully defining their structure upfront
  • Add fields to the database without changing the fields of existing documents
  • Store documents that have their own unique structure
  • Have multiple databases with different structures and syntax

A NoSQL database is much better suited to store data that does not fit in a table. NoSQL databases are designed with flexibility and scalability in mind and follow the BASE consistency model.

Here is another good description regarding the trade-off between ACID and BASE.

2.2 Ability to query Data

Another factor that should be considered when selecting a database is how often the data should be queried. It also plays a role how fast these queries have to be executed and who should be responsible for these queries in the future.

If the data is well structured and organized, it is very efficient to query this data with an SQL database. SQL is a very popular and widely used programming language that can efficiently execute queries. Furthermore, it is easy to learn and can therefore be executed by a large group of people such as business analysts or managers.

A NoSQL database offers great flexibility in the types of data that should be stored, but because of the potentially large differences in data structures, querying is not as efficient as with an SQL database. When executing NoSQL queries, additional preprocessing of the data must be performed. Which preprocessing steps are necessary depends on the respective NoSQL database used. Many developers build the query functionality into the application layer instead of taking care of it in the database layer. Querying NoSQL databases usually requires developers or data scientists. This is expected to be somewhat more costly and less efficient.

2.3 Scaling

Scaling turns out differently for SQL and NoSQL databases. Therefore, if you plan ahead, this factor also plays an important role when choosing a database. The question to be asked at this point is how far the database is expected to grow in the future.

SQL databases scale vertically, so the capacity of a single server must be increased to allow the database to scale. Since SQL databases are designed to run on a single server (to ensure data integrity) they are not easy to scale.

In contrast, NoSQL databases scale horizontally, which means that more servers can be added to keep the growing database running. Since NoSQL databases require much less structure than SQL databases, each stored object is virtually self-contained and independent, so these objects can be easily stored on multiple servers without having to be linked. With SQL databases, each table row and table column must be related.

3 Database Options

Regardless of whether you choose an SQL or NoSQL database (or both), there are a variety of options to choose from. Below I’ll mention a few, but there are many more.

3.1 On-premise SQL Databases

3.2 SQL Databases on Cloud-Service Platforms

3.3 On-premise NoSQL Databases

3.4 NoSQL Databases on Cloud-Service Platforms

4 Conclusion

In this post I have listed which different database types exist and what advantages and disadvantages each database has.

Furthermore, I have reported on which preliminary considerations should be made in order to be able to decide on a suitable database.

References:

Here are my references for the creation of this article: