If you have chosen data to be stored in MySQL and if your data size is snowballing or if you are looking for faster access based on your predicates, then partitioning could be one solution that you want to understand better. In Engati, we deal with millions of users and conversations between users across 14 different channels. Querying this data for a particular customer’s user and expecting result in a few milliseconds is a business need for our flows. That’s where we adopt partitioning of data.
Imagine walking into a library where there are only 100 books. Here you won’t need to spend too much time organizing the books. Although it will still take you some time to find books there, you will still manage. Over time, you will remember where you kept a particular book. But imagine that you were in an enormous library, you surely don’t want to spend hours looking for one book. Here racks are arranged alphabetically so that you can find the book that you need easily.
Very similarly, datastores build caches to reply to queries when the dataset is small. But when the data in your datastore is in the order of hundreds of thousands or millions or billions of records, then datastores provide an option for you to create physical and/or logical segregations to find the data easily. MySQL only supports logical segregation of data called the partitions.
Just like books in a Library or Book Store can be arranged alphabetically by the name of the book or the name of the author or the year they were published in, MySQL supports various types of partitioning strategies that can be used based on the business objective.
Types of MySQL partitions
There are many types of partitions that MySQL supports. Some of the most commonly understandable ones are:
As the name suggests, here, the data in the MySQL table is partitioned based on a hashing function of the column of your interest. This ensures more uniform and equal-sized partitions, and all you need to do is specify the number of partitions you need (which people generally people to be a power of 2). For example, partitioning by a name column in a User table
CREATE TABLE user (
id INT NOT NULL,
name VARCHAR(255) NOT NULL,
age INT NOT NULL
)
PARTITION BY HASH (name) PARTITIONS 16;
Again the name suggests here the data in the MySQL table is partitioned based on a (non-overlapping) range of data stored in a column. For example, if you want to partition based on the value of a column age in a User table.
CREATE TABLE user (
id INT NOT NULL,
name VARCHAR(255) NOT NULL,
age INT NOT NULL
)
PARTITION BY RANGE (age) (
PARTITION p0 VALUES LESS THAN (20),
PARTITION p1 VALUES LESS THAN (40),
PARTITION p2 VALUES LESS THAN (60),
PARTITION p3 VALUES LESS THAN (80),
PARTITION p4 VALUES LESS THAN MAXVALUE
);
In this case, the table definition needs to provide a list of values for a column that would go in any partition. This partition type can be chosen when there is a list of different values of data going into a column like ENUM. A point to note here is that a list partition fails the insert operation if the value of the column is not defined in any of the partition definition (unlike range partition). For instance, if you want to partition based on continent column in a User table.
CREATE TABLE user (
id INT NOT NULL,
name VARCHAR(255) NOT NULL,
age INT NOT NULL,
continent VARCHAR(20) NOT NULL
)
PARTITION BY LIST COLUMNS (continent) (
PARTITION p0 VALUES IN (“ASIA”),
PARTITION p1 VALUES IN (“AFRICA”),
PARTITION p2 VALUES IN (“EUROPE”, “ANTARCTICA”),
PARTITION p3 VALUES IN (“NORTH AMERICA”, “SOUTH AMERICA”, “OCEANIA”)
);
As you can see, MySQL supports partitioning data of a column of data type integer or varchar with list partitions. The same is supported in range partitions too, by mapping a non-integer column value to an integer value. Hash partition has another variant called Linear Hash Partition, which uses a power of 2 algorithms to decide which partition the data gets inserted into v/s the regular modulus of a hash function value.
Partitioning data can come in handy if your table’s size grows in millions and the data lookup is slow despite adding different types of indexes on a table. In a follow-up blog we will talk about partitioning existing tables in MySQL, so stay tuned