Your app is getting better. It has more features, more active users, and every day it collects more data. Your database is now causing the rest of your application to slow down. Database sharding might be the answer to your problems, but many people don't know what it is and, most importantly, when to use it. In this article, we'll talk about what database sharding is, how it works, and the best ways to use it.

Before we get into that question, it's essential to understand why we shard data stores and the various options you have before you embark on sharding.

When tables get to a specific size, people often feel sharding is a magical solution to all scaling issues. However, I have had tables with billions of rows and didn't see a compelling reason to shard since my usage pattern lent itself well to a single table and didn't see any strong reasons (other than management of such a large table, which in some cases is reason enough) to shard the table.

What is it to shard a database?

Simply put, sharding is a method for distributing data across multiple machines. Sharding becomes especially handy when no single machine can handle the expected workload.

Sharding is an example of horizontal scaling, while vertical scaling is an example of just getting larger and larger machines to support the new workload.

Horizontal Scaling
Horizontal Scaling

Engineers often get caught up in doing things the most involved way, but keeping things simple early on makes challenging things later on as your application evolves much easier. So if your problem goes away by getting machines with more resources, 9/10, that's the correct answer.

Now that we have discussed the potential server architectures, let us talk about data layout.

You can also partition data in a few ways and move specific tables to their databases, much akin to what you see in microservice architectures, where a particular aspect of your application has its database server. The application is aware of where to look for each. Alternatively, you can store rows of the same table across multiple database nodes, which brings in ideas like sharding keys; more on that later.

Partition Strategies
Partition Strategies

More modern databases like Cassandra and others abstract that away from the application logic and its maintained at the database level.

This post is for paying subscribers only

Sign up now and upgrade your account to read the post and get access to the full library of posts for paying subscribers only.

Sign up now Already have an account? Sign in