Sharding

Q: What are the different Sharding strategies?

1. The Lookup strategy
2. The Range strategy
3. The Hash strategy

What is Sharding?

Sharding is a technique for splitting a single large logical dataset into multiple smaller databases so that the data is easier to store and manage. The word shard means "a small part of a whole", and these smaller parts of a large database are called data shards. Distributing data across multiple machines lets enterprises build database clusters that are easier to manage and faster to query.

Sharding is essential when a dataset is too large to be stored in a single database or on a single machine. Many sharding strategies also allow additional machines to be added to the system to distribute the load, so a database cluster can scale out as its data and traffic grow.

Sharding is also a database partitioning technique widely used by blockchain companies to improve scalability and efficiency and to process more transactions per second. Each shard holds its own data, which makes it distinct from and independent of the other shards in the system.

Sharding can help reduce processing time on a network because it splits a blockchain network into separate, smaller shards. Each shard is managed by an individual server. Depending on the replication scheme, a shard may also be replicated on other servers, which can complicate the system because the copies must be kept consistent.

What is Horizontal and Vertical Sharding?

When each new table or dataset has the same schema but unique rows, it is called horizontal sharding. On the other hand, when each new table has a schema that is a proper subset of the original table's schema, it is known as vertical sharding. The terms horizontal and vertical come from the traditional tabular view of a database: horizontal sharding splits the table along its rows, and vertical sharding splits it along its columns.

In horizontal sharding, more machines are added to an existing stack to spread out the load, partition the data, increase processing speed, and support more traffic. This method is most effective when queries return a subset of rows that are often grouped by a common characteristic. Vertical sharding is effective when queries usually return only a subset of the columns.

A database can be split vertically, storing different tables and columns in separate databases, or horizontally, storing rows of the same table across multiple database nodes. Vertical partitioning is very domain-specific: you draw a logical split within your application data and store each part in a different database inside the system.

Original Dataset

Patient ID	Name	Age	Department	Doctor
1221	Keith	23	Cardiology	Dr Moses
1222	Simon	37	General	Dr Solomon
1223	Jhon	43	Orthopaedic	Dr Jacob
1224	Rebecca	26	Gynecology	Dr Issac
1225	Catherine	51	Oncology	Dr Noah

Horizontal shards

Shard 1

Patient ID	Name	Age	Department	Doctor
1221	Keith	23	Cardiology	Dr Moses
1222	Simon	37	General	Dr Solomon

Shard 2

Patient ID	Name	Age	Department	Doctor
1223	Jhon	43	Orthopaedic	Dr Jacob
1224	Rebecca	26	Gynecology	Dr Issac
1225	Catherine	51	Oncology	Dr Noah

Vertical Shards

Shard 1

Patient ID	Name	Age
1221	Keith	23
1222	Simon	37
1223	Jhon	43
1224	Rebecca	26
1225	Catherine	51

Shard 2

Patient ID	Department
1221	Cardiology
1222	General
1223	Orthopaedic
1224	Gynecology
1225	Oncology

Shard 3

Patient ID	Doctor
1221	Dr Moses
1222	Dr Solomon
1223	Dr Jacob
1224	Dr Issac
1225	Dr Noah
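
The two kinds of split can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the patient table is held in memory as a list of dictionaries, and the function names and the boundary ID used for the horizontal split are made up for the example rather than taken from any particular database.

```python
# Illustrative only: the patient table as an in-memory list of dicts.
patients = [
    {"patient_id": 1221, "name": "Keith", "age": 23, "department": "Cardiology", "doctor": "Dr Moses"},
    {"patient_id": 1222, "name": "Simon", "age": 37, "department": "General", "doctor": "Dr Solomon"},
    {"patient_id": 1223, "name": "Jhon", "age": 43, "department": "Orthopaedic", "doctor": "Dr Jacob"},
    {"patient_id": 1224, "name": "Rebecca", "age": 26, "department": "Gynecology", "doctor": "Dr Issac"},
    {"patient_id": 1225, "name": "Catherine", "age": 51, "department": "Oncology", "doctor": "Dr Noah"},
]


def horizontal_shards(rows, boundary_id):
    """Split by row: both shards keep the full schema but hold different rows."""
    low = [row for row in rows if row["patient_id"] < boundary_id]
    high = [row for row in rows if row["patient_id"] >= boundary_id]
    return low, high


def vertical_shard(rows, columns, key="patient_id"):
    """Split by column: keep only some columns, plus the key so rows can be rejoined."""
    return [{key: row[key], **{col: row[col] for col in columns}} for row in rows]


# Horizontal: same columns, different rows (boundary ID 1223 is an arbitrary choice).
shard_1, shard_2 = horizontal_shards(patients, boundary_id=1223)

# Vertical: same rows, different columns; the patient ID appears in every shard.
demographics = vertical_shard(patients, ["name", "age"])
departments = vertical_shard(patients, ["department"])
doctors = vertical_shard(patients, ["doctor"])
```

Repeating the patient ID in every vertical shard is what makes it possible to join the pieces back into the original rows.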

What are the different Sharding strategies?

The Lookup strategy

In the lookup strategy, the sharding logic implements a map that routes a request for data to the shard that holds it, using a unique shard key. In the simplest form, the system designers map each shard key directly to a physical partition. Another option under the lookup strategy is to distribute the shards virtually: shard keys map to virtual shards, and several virtual shards can share one physical partition, which reduces the number of physical shards the database needs. The application locates data using a shard key that refers to a virtual shard, and the system transparently maps virtual shards to physical partitions.
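
A minimal sketch of the two-level lookup described above, assuming the shard key is a patient ID. The number of virtual shards, the server names, and the function names are all hypothetical; a real system would keep these maps in a shared, durable store rather than in module-level dictionaries.

```python
# Hypothetical setup: 8 virtual shards spread over 2 made-up physical servers.
NUM_VIRTUAL_SHARDS = 8

# Lookup table 1: virtual shard number -> physical shard.
virtual_to_physical = {v: f"db-server-{v % 2}" for v in range(NUM_VIRTUAL_SHARDS)}

# Lookup table 2: shard key (here a patient ID) -> virtual shard number.
key_to_virtual = {1221: 0, 1222: 1, 1223: 5}


def locate(shard_key):
    """Resolve a shard key to the physical shard that holds its data."""
    virtual = key_to_virtual[shard_key]
    return virtual_to_physical[virtual]


def move_virtual_shard(virtual, new_physical):
    """Rebalance by repointing one virtual shard; the key map is left untouched."""
    virtual_to_physical[virtual] = new_physical
```

The point of the extra level is visible in `move_virtual_shard`: data can be rebalanced onto a new server by changing one entry, without reassigning any shard keys.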

The Range strategy

Under the range strategy, related items are grouped in the same shard and ordered by shard key. This strategy is useful for applications and services that frequently retrieve sets of items with range queries, that is, queries that return every item whose shard key falls between two values. For example, a hospital application may regularly need the list of patients for a given month. In that case it is advisable to use the range strategy and store each month's patients in the same shard, in date and time order. If each patient record were stored in a different shard, the records would have to be fetched individually with a large number of point queries, which could be time-consuming. With the range strategy, the hospital can retrieve a month's data from a single shard using a common composite shard key.
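
The hospital example might be routed roughly as follows. This sketch assumes the shard key is the admission date and that each shard covers one calendar month; the shard names, the boundary dates, and both function names are invented for illustration.

```python
import bisect
from datetime import date

# Hypothetical layout: each shard holds admissions from its start date up to
# (but not including) the next shard's start date. The last shard is open-ended.
SHARD_STARTS = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)]
SHARD_NAMES = ["shard-2024-01", "shard-2024-02", "shard-2024-03"]


def shard_for(admitted_on):
    """Point lookup: find the single shard whose range contains the date."""
    index = bisect.bisect_right(SHARD_STARTS, admitted_on) - 1
    if index < 0:
        raise ValueError(f"no shard covers {admitted_on}")
    return SHARD_NAMES[index]


def shards_for_range(first_day, last_day):
    """Range query: list only the shards that overlap [first_day, last_day]."""
    low = bisect.bisect_right(SHARD_STARTS, first_day) - 1
    high = bisect.bisect_right(SHARD_STARTS, last_day) - 1
    return SHARD_NAMES[max(low, 0):high + 1]
```

Because the boundaries are sorted, a query for one month touches one shard, and a query spanning several months touches only the contiguous shards in between.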

The Hash strategy

The hash strategy applies a hash function to the shard key to decide which shard stores each item. It resembles the consistent hashing used in load balancing, where the load on the servers is divided evenly with the help of virtual nodes. In database management, the benefit of this strategy is a reduced chance of hotspots, that is, shards that receive a disproportionate share of the data or load. The system distributes the data across the shards in a way that balances the size of each shard and the average load that each shard will encounter. Because the hash introduces a pseudo-random element into the computation, items spread roughly evenly across the shards.
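
One simple way to realize this is to hash the shard key and take the result modulo the number of shards. The sketch below assumes four shards and patient IDs as shard keys; the constant and function name are illustrative. It uses `hashlib` rather than Python's built-in `hash()`, because the built-in is randomized per process for strings and so would not route the same key to the same shard across restarts.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed shard count for this example


def shard_for(shard_key):
    """Map a shard key to a shard number using a stable hash."""
    digest = hashlib.sha256(str(shard_key).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce it
    # to a shard number in the range 0..NUM_SHARDS-1.
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


# Sequential patient IDs, which would all pile up in one shard under a
# range strategy, end up scattered across all four shards.
counts = Counter(shard_for(patient_id) for patient_id in range(1221, 1221 + 10_000))
```

The trade-off of plain modulo hashing is that changing `NUM_SHARDS` reassigns most keys to a different shard; consistent hashing with virtual nodes is the usual way to limit that movement.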