Categories: Software Architecture

What is Sharding in Database Partitioning?

In this post, we will see what is Sharding in a database.

Sharding in Database

Sharding is a database partitioning technique that involves breaking up a large database into smaller, more manageable parts called shards.

Each shard contains a subset of the data, and together, they make up the complete dataset.

The idea behind sharding is to distribute the data across multiple machines or servers, to improve scalability, performance, and availability.

By partitioning the data and distributing it across multiple machines, queries and transactions can be processed in parallel, allowing for faster processing times.

Sharding can be done in several ways, such as range-based sharding, hash-based sharding, and list-based sharding.

Range Based Sharding

In range-based sharding, data is partitioned based on a specific range of values, such as a date range or a geographic region.

Hash Based Sharding

In hash-based sharding, data is distributed across shards based on a hash function, which maps data to a specific shard.

List Based Sharding

In list-based sharding, data is partitioned based on a specific list of values, such as a list of customer IDs.

Challenges in Sharding

However, sharding also comes with some challenges, including increased complexity in managing the shards, the potential for data consistency issues, and increased latency due to network communication between the shards.

Examples of Sharding

Here below some examples of sharding in action,

Social Media Platform: A social media platform like Twitter or Instagram may use sharding to partition user data based on geographical location. For example, all users in Europe might be stored on one set of shards, while users in North America are stored on another set of shards. This allows the platform to handle large amounts of data while ensuring that queries for specific regions can be processed quickly.

Online Retailer: An online retailer like Amazon may use sharding to partition customer data based on their purchase history or order volume. For example, high-volume customers might be stored on one set of shards, while low-volume customers are stored on another set of shards. This allows the retailer to personalize recommendations and promotions for individual customers while ensuring that queries can be processed quickly.

Gaming Platform: A gaming platform like Steam or Xbox Live may use sharding to partition game data based on player activity. For example, players who are currently playing a certain game might be stored on one set of shards, while players who are not actively playing the game are stored on another set of shards. This allows the platform to handle large numbers of concurrent players while ensuring that queries for specific games can be processed quickly.

These are just a few examples, but sharding can be used in many different applications where large amounts of data need to be processed quickly and efficiently.

Here is a simple example of sharding a database using MongoDB, a popular NoSQL database,

First, let’s assume we have a database of user information that’s getting too big to handle on a single server. So we decide to shard the database across multiple servers based on the user’s location.

1. Set up a MongoDB cluster with multiple servers:

mongod --shardsvr --port 27018 --dbpath /data/db/shard1
mongod --shardsvr --port 27019 --dbpath /data/db/shard2
mongod --configsvr --port 27017 --dbpath /data/configdb
mongos --configdb localhost:27017

2. Create a sharded collection for the user data:

sh.enableSharding("test")
db.createCollection("users")
sh.shardCollection("test.users", {"location": 1})

3. Insert data into the collection:

db.users.insert({
  "name": "John Doe",
  "email": "john.doe@example.com",
  "location": "USA"
})

4. Query the data:

db.users.find({"location": "USA"})

When we run the query, MongoDB will automatically route the request to the appropriate shard based on the value of the “location” field. If the value is “USA”, the request will be routed to the shard that contains data for users in the USA.

The above example will give you a basic idea of how sharding works in MongoDB. In practical applications sharding implementation can be much more complex, involving multiple shards and more advanced configuration options.

Rajeev

Recent Posts

OWIN Authentication in .NET Core

OWIN (Open Web Interface for .NET) is an interface between web servers and web applications…

2 years ago

Serializing and Deserializing JSON using Jsonconvertor in C#

JSON (JavaScript Object Notation) is a commonly used data exchange format that facilitates data exchange…

2 years ago

What is CAP Theorem? | What is Brewer’s Theorem?

The CAP theorem is also known as Brewer's theorem. What is CAP Theorem? CAP theorem…

2 years ago

SOLID -Basic Software Design Principles

Some of the Key factors that need to consider while architecting or designing a software…

2 years ago

What is Interface Segregation Principle (ISP) in SOLID Design Principles?

The Interface Segregation Principle (ISP) is one of the SOLID principles of object-oriented design. The…

2 years ago

What is Single Responsibility Principle (SRP) in SOLID Design Priciples?

The Single Responsibility Principle (SRP), also known as the Singularity Principle, is a software design…

2 years ago