NOSQL
Not all data is relational. For those situations, NoSQL can be helpful.
That said, NoSQL stands for "Not Only SQL". It isn't intended to knock SQL or supplant it.
SQL has several very big advantages:
- A strong mathematical basis.
- Declarative syntax.
- A well-known, standardized language: Structured Query Language (SQL).
Those advantages haven't gone away.
It's a mistake to think about this as an either/or argument. NoSQL is simply an alternative to consider when it fits the problem.
Documents can be stored in non-relational databases, like CouchDB.
NoSQL has become popular over the last few years mainly because a relational database is no longer easy to use once it outgrows a single server. In other words, relational databases don't scale out well across a distributed system. Large sites such as Google, Yahoo, Facebook and Amazon have lots of data and store it in distributed systems for several reasons: the data may not fit on one server, or there may be requirements for high availability.
NoSQL is a very broad term, and the label itself is falling out of favor in the non-RDBMS community.
NoSQL databases have few characteristics in common. They can be roughly divided into a few categories:
- key/value stores
- document databases
- graph databases
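Of the categories listed above, the key/value store is the simplest to picture: an opaque value is looked up by a unique key, with no query language beyond get, put and delete. The sketch below is a hypothetical in-memory toy. The `KeyValueStore` class and its methods are illustrative and do not belong to any product's API. Real stores add persistence, replication and expiry on top of this idea.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Hypothetical in-memory key/value store: values are opaque bytes looked up by key."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Overwrites any existing value; the store never inspects the value's contents.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Lookup is by exact key only; there is no way to query by value.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("session:42", b'{"user": "alice"}')
print(store.get("session:42"))  # b'{"user": "alice"}'
print(store.get("session:99"))  # None
```

Because the store never looks inside a value, any structure (here a JSON string) is entirely the application's concern.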
Schema
A SQL database has a predefined schema for storing structured data.
A NoSQL database has no predefined schema. The schema is dynamic and follows from the data elements themselves.
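To make the contrast concrete, the sketch below uses Python's built-in `sqlite3` module for the predefined-schema side. It uses a list of plain dicts as a stand-in for schemaless documents. That stand-in is an assumption for illustration only, since a real document store would persist and index the documents. The table and field names are hypothetical.

```python
import sqlite3

# Relational side: the table's columns are fixed before any data is inserted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT NOT NULL, email TEXT)")
conn.execute("INSERT INTO people (name, email) VALUES (?, ?)", ("Ada", "ada@example.com"))

try:
    # A column that is not part of the schema is rejected by the database.
    conn.execute("INSERT INTO people (name, phone) VALUES (?, ?)", ("Bob", "555-0100"))
except sqlite3.OperationalError as exc:
    print("rejected:", exc)

# Schemaless side: each document carries its own set of fields.
documents = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Bob", "phone": "555-0100"},  # different fields, no migration needed
    {"name": "Cy", "tags": ["admin"], "address": {"city": "Oslo"}},  # nested values
]

# The "schema" is only whatever fields happen to appear across the documents.
fields_seen = sorted({key for doc in documents for key in doc})
print(fields_seen)  # ['address', 'email', 'name', 'phone', 'tags']
```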
Scalability
SQL databases are vertically scalable: to scale a SQL-based database, we upgrade the hardware of the server on which the DBMS is installed. That single machine is where scalability eventually hits its limit.
NoSQL databases are horizontally scalable: to scale one, we add more nodes and distribute the data across them according to our needs and the capacity required. Spreading the work over many nodes is how they reduce the load on any single database server.
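Horizontal scaling depends on deciding which node holds which piece of data. One simple approach is to hash each key and take the result modulo the number of nodes. The sketch below assumes that scheme and uses hypothetical node names. It is not a description of how any particular NoSQL product partitions data.

```python
import hashlib
from collections import Counter
from typing import List


def node_for_key(key: str, nodes: List[str]) -> str:
    """Pick the node that owns `key` by hashing it (simple modulo partitioning)."""
    # Use a stable hash (not Python's per-process randomized hash()) so every
    # client computes the same placement for the same key.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
placement = Counter(node_for_key(f"user:{i}", nodes) for i in range(3000))
print(placement)  # keys per node; a good hash spreads them roughly evenly
```

With plain modulo hashing, adding a fourth node changes the owner of most keys. That is why many distributed stores use consistent hashing instead. This sketch deliberately keeps the simpler scheme.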
Data Retrieval
In SQL-based databases, we define and manipulate data with SQL (Structured Query Language), which is very powerful.
In NoSQL databases, queries focus on collections and documents. The approach is sometimes called UnQL (Unstructured Query Language). It is still evolving, so it varies from one NoSQL vendor to another.
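The difference in query style can be sketched side by side. The SQL half uses Python's built-in `sqlite3` module. The document half is a hypothetical `find` helper. It is loosely modeled on the query-by-example style some document stores use, but it only supports exact equality. Actual query syntax differs from vendor to vendor.

```python
import sqlite3
from typing import Any, Dict, List

# SQL: a declarative query states *what* rows are wanted; the engine decides how to fetch them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, year INTEGER)")
conn.executemany("INSERT INTO books VALUES (?, ?)", [("Dune", 1965), ("Neuromancer", 1984)])
sql_rows = conn.execute("SELECT title FROM books WHERE year > ?", (1970,)).fetchall()
print(sql_rows)  # [('Neuromancer',)]


def find(collection: List[Dict[str, Any]], query: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Hypothetical document query: return documents whose fields equal every value in `query`."""
    # A document that lacks a queried field simply does not match.
    return [doc for doc in collection
            if all(k in doc and doc[k] == v for k, v in query.items())]


books = [
    {"title": "Dune", "year": 1965, "genre": "sf"},
    {"title": "Emma", "year": 1815},  # no "genre" field at all
]
print(find(books, {"genre": "sf"}))  # [{'title': 'Dune', 'year': 1965, 'genre': 'sf'}]
```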
Document-Based Databases
Rather than the tables underlying a relational database, document-based databases organize data into “documents,” which sit in the grey area between a web page and a traditional table found in a relational database. A document can take many forms (a business card, an encyclopedia entry, a web page, an annual report, an entire book). In general, it can hold any sort of data.
Advantages: Allows for different kinds of data to be stored easily — you don’t have to make every document the same. Most document-based databases allow for very quick searching of text. The design of the database does not need to be set when you deploy it, and new types of information can easily be added. You don’t need to assign meaning to all the data you enter.
Disadvantages: You don’t need to assign meaning to all the data you enter (yes, that’s both an advantage and a disadvantage). Often slower than a relational database and often requires more storage space. Errors in how data is described can be easily introduced. Similar data is not necessarily treated as such. Not as many protections against duplication of data.
Popular products: MongoDB, CouchDB, MarkLogic
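Two of the points above can be seen together in a small sketch. Embedding related data in one document avoids joins. It also duplicates that data, and the copies can drift apart. Plain dicts stand in for documents here, and the customer and order fields are hypothetical.

```python
import copy
import json

# Two order documents each embed their own copy of the customer's details (denormalized).
customer = {"id": 7, "name": "Ada Lovelace", "city": "London"}
orders = [
    {"order_id": 1, "customer": copy.deepcopy(customer), "items": ["pen"]},
    {"order_id": 2, "customer": copy.deepcopy(customer), "items": ["ink", "paper"]},
]

# Reading an order needs no join: everything about it sits inside one document.
print(json.dumps(orders[0], indent=2))

# The cost: the customer's city is now stored twice. Updating only one copy leaves
# the data inconsistent, and there is no foreign key here to keep the copies in step.
orders[0]["customer"]["city"] = "Paris"
cities = {order["customer"]["city"] for order in orders}
print(sorted(cities))  # ['London', 'Paris']
```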
Graph Databases
“Graph” does not mean “chart.” A graph is a mathematical system that can be described in terms of chunks of information (called “nodes”) and the relationships between these chunks of information (“edges”). Think of a social network: Individuals (nodes) are linked together by friendships (edges). Or a highway system: Towns (nodes) are linked together by roads (edges).
Different kinds of nodes and edges can be used in the same database to add many layers of meaning. Think of a corporate structure: employees are nodes, and the edges between two people are their relationships (teammate of, supervisor of, subordinate of). An employee can have many different relationships with fellow employees. Projects can also be nodes, with edges to people (team member of, project lead of) and edges to other projects (dependent on, replaced by). Many kinds of data are well represented by graphs, but modeling them this way requires a very different way of thinking about information.
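The corporate example can be sketched as an adjacency list whose edges carry a relationship label. This minimal in-memory illustration assumes a plain Python dict is an acceptable stand-in for a graph store. The node names and the `add_edge`/`neighbors` helpers are hypothetical. Graph products such as Neo4j provide their own query languages instead.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# Adjacency list keyed by source node; each edge is (relationship label, target node).
graph: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)


def add_edge(source: str, label: str, target: str) -> None:
    graph[source].append((label, target))


# Hypothetical data mixing two kinds of nodes: people and projects.
add_edge("alice", "supervisor of", "bob")
add_edge("alice", "teammate of", "carol")
add_edge("bob", "team member of", "project-x")
add_edge("carol", "project lead of", "project-x")
add_edge("project-x", "dependent on", "project-y")


def neighbors(node: str, label: str) -> List[str]:
    """Follow only the edges of one relationship type out of `node`."""
    # .get avoids inserting an empty entry for nodes that have no outgoing edges.
    return [target for edge_label, target in graph.get(node, []) if edge_label == label]


print(neighbors("alice", "supervisor of"))     # ['bob']
print(neighbors("project-x", "dependent on"))  # ['project-y']
```

Adding a new kind of relationship is just a new label, with no table or schema change.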
Advantages: Makes it easier to express many kinds of data that require significant kludging to fit in a relational database. Certain kinds of searches that are very difficult in a relational database (i.e., any search where relationships between different kinds of data are important) are very quick and easy. Easily allows for new kinds of data. Very well suited to the irregular, complex data involved in mapping the “real world.”
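A relationship-centric search such as "how many friendships separate these two people?" is a short traversal over a graph. The sketch below runs a breadth-first search over a hypothetical social network held in a Python dict. Each step only follows edges out of the current node. In a relational database, the same question typically needs repeated self-joins or a recursive query.

```python
from collections import deque
from typing import Dict, List, Optional

# Undirected friendship graph as an adjacency list (hypothetical data).
friends: Dict[str, List[str]] = {
    "ann": ["ben", "cat"],
    "ben": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["ben", "eve"],
    "eve": ["dan"],
}


def degrees_of_separation(start: str, goal: str) -> Optional[int]:
    """Breadth-first search: number of friendship hops from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in friends.get(person, []):
            if friend == goal:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None


print(degrees_of_separation("cat", "eve"))  # 4 (cat -> ann -> ben -> dan -> eve)
```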
Disadvantages: Operations on large amounts of data can be very slow. Can use a lot of space. Not widely used in business environments (yet). Very easy to describe data inconsistently, which can quickly reduce the usefulness of the database. Generally requires all data to exist explicitly in relation to other data. Can be conceptually difficult to understand at first.
Popular products: Neo4j, Titan, FlockDB