Serverless as a concept has been around for some time now. It has mostly been applied to applications, especially stateless ones. This has allowed applications to scale to handle peak load while paying only for the time computing resources are actually in use. It is natural to want the same serverless model for databases too.
Most production databases are the keystone of a successfully running business, and they have been the technology component where investment is high just to keep things running smoothly.
As more applications run in the cloud and serve millions of users, both the challenge of scaling databases and the cost of operating them in the cloud are becoming significant.
Most monolithic databases in the cloud, e.g. MySQL and PostgreSQL, are a compromise between getting true cloud scale in a pay-as-you-use model and managing, maintaining and operating the databases in on-premises data centers.
AWS RDS is a good example of a managed database service in the cloud, where the administrative and operational tasks are taken care of. However, scaling such databases is costly: they can easily be scaled vertically, but scaling them horizontally needs an architecture change in most cases. Vertical scaling increases cost, especially when the provisioned computing and storage resources are not in use.
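To make the cost argument concrete, here is a rough back-of-the-envelope sketch. The hourly rate and the number of busy hours are hypothetical figures chosen for illustration, not real AWS prices, and the sketch assumes the same rate applies in both billing models, which is a simplification.

```python
def provisioned_cost(hourly_rate: float, hours: float) -> float:
    """Cost of an instance that is billed for every hour, busy or idle."""
    return hourly_rate * hours


def usage_based_cost(hourly_rate: float, busy_hours: float) -> float:
    """Cost when only the hours spent serving load are billed."""
    return hourly_rate * busy_hours


# Hypothetical figures, for illustration only.
HOURLY_RATE = 2.0      # price per hour of a vertically scaled instance
HOURS_IN_MONTH = 720   # 30 days * 24 hours
BUSY_HOURS = 180       # hours per month with real load

always_on = provisioned_cost(HOURLY_RATE, HOURS_IN_MONTH)
pay_per_use = usage_based_cost(HOURLY_RATE, BUSY_HOURS)

# Share of the provisioned bill that pays for idle capacity.
idle_share = 1 - pay_per_use / always_on

print(f"provisioned: {always_on:.2f}")
print(f"usage-based: {pay_per_use:.2f}")
print(f"idle share:  {idle_share:.0%}")
```

With these assumed numbers, three quarters of the provisioned bill pays for capacity that serves no queries. In practice usage-based pricing tends to carry a different unit rate, so the real gap depends on the workload.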
Can databases operate in true serverless mode?
That question needs to be answered in order to run a cost-effective business in the cloud without wasting valuable computing resources by reserving them and then not using them.
Here are some of the key requirements to operate a database in serverless mode:
- Auto scale
- Pay as you use model
- Simple usage
- Resilient to failures
- High performance
- Multi zone high availability
In looking for a serverless database that can fulfill the above requirements, we looked closely at the architecture of CockroachDB serverless, which represents a forward-looking architecture for an ACID-compliant RDBMS running in serverless mode in the cloud.
CockroachDB serverless
In any RDBMS, the components that use the most compute are the query execution engine and the storage engine.
Since most RDBMSs persist data on disk and only a small percentage of it can be cached in server memory at any point in time, there are constant IO operations to read data blocks from disk.
Once the data is loaded into the query engine, processing it and serving it at high concurrency requires CPU-intensive operations and a lot of memory.
This indicates that the SQL execution and storage layers in a serverless database architecture have to scale horizontally and need fault tolerance mechanisms. Also, since multiple tenants operate on shared infrastructure in the cloud, it is important to have physically isolated services for each tenant in the SQL execution and storage layers.
Here is the current serverless architecture of CockroachDB:
Each SQL Pod is owned by a single tenant at execution time and multiple such SQL Pods can be dedicated to a tenant to handle high concurrency. Network rules ensure each tenant’s SQL pods can only communicate with other SQL pods belonging to the same tenant. This allows the SQL layer to scale up and down as the query execution requirements increase or decrease.
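The two ideas in this paragraph, same-tenant-only communication and scaling the number of SQL pods with demand, can be sketched in a few lines. This is a minimal illustration, not CockroachDB's implementation: the `SqlPod` type, the `pods_needed` function, the capacity figures and the choice to scale to zero pods when there is no load are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlPod:
    """Hypothetical model of a SQL pod owned by exactly one tenant."""
    name: str
    tenant_id: int


def may_communicate(a: SqlPod, b: SqlPod) -> bool:
    """Network rule: pods may talk only if they belong to the same tenant."""
    return a.tenant_id == b.tenant_id


def pods_needed(current_load: float, capacity_per_pod: float, max_pods: int) -> int:
    """Number of SQL pods a tenant needs for its current load.

    Load and capacity use the same (assumed) unit, e.g. queries per second.
    """
    if current_load <= 0:
        return 0  # assumption: an idle tenant needs no SQL pods
    return min(max_pods, math.ceil(current_load / capacity_per_pod))


pod_a = SqlPod("sql-1", tenant_id=7)
pod_b = SqlPod("sql-2", tenant_id=7)
pod_c = SqlPod("sql-3", tenant_id=9)

print(may_communicate(pod_a, pod_b))   # same tenant
print(may_communicate(pod_a, pod_c))   # different tenants
print(pods_needed(250, capacity_per_pod=100, max_pods=10))
```

Rounding up ensures the tenant never has less capacity than its load requires, and the `max_pods` cap stands in for whatever per-tenant limit an operator would set.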
The SQL pods communicate with the storage pods, which, unlike the SQL pods, are shared across tenants. The storage pods abstract data storage on block storage systems such as AWS EBS or GCP PD, and they ensure that the data for each tenant is stored and managed separately.
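One common way to keep each tenant's data separate inside a storage service is to qualify every key with a tenant identifier, so that a request made on behalf of one tenant can never address another tenant's keys. The sketch below illustrates that general technique only; the `TenantStore` class and its methods are hypothetical and are not taken from CockroachDB.

```python
from typing import Dict, Optional, Tuple


class TenantStore:
    """Hypothetical in-memory store that namespaces every key by tenant."""

    def __init__(self) -> None:
        # The tenant id is part of the key, so tenants cannot collide.
        self._data: Dict[Tuple[int, str], str] = {}

    def put(self, tenant_id: int, key: str, value: str) -> None:
        self._data[(tenant_id, key)] = value

    def get(self, tenant_id: int, key: str) -> Optional[str]:
        return self._data.get((tenant_id, key))

    def scan(self, tenant_id: int) -> Dict[str, str]:
        """Return only the keys that belong to the given tenant."""
        return {k: v for (t, k), v in self._data.items() if t == tenant_id}


store = TenantStore()
store.put(1, "user:alice", "tenant 1 row")
store.put(2, "user:alice", "tenant 2 row")

print(store.get(1, "user:alice"))
print(store.scan(2))
```

Both tenants use the same logical key here, yet each one reads back only its own value.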
Kubernetes is used to manage the SQL and Storage clusters including shared storage nodes and per tenant SQL nodes. Each node runs in its own K8s pod, which is not much more than a Docker container with a virtualized network and a bounded CPU and memory capacity. A Linux cgroup reliably limits the CPU and memory consumption for the processes.
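As an illustration of a pod with bounded CPU and memory, the sketch below builds a Kubernetes Pod manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The `resources.requests` and `resources.limits` fields are part of the standard Pod spec. The pod name, the label, the image and the sizes are hypothetical placeholders, not values used by CockroachDB.

```python
import json


def sql_pod_manifest(tenant_id: str, cpu: str, memory: str) -> dict:
    """Build a Pod manifest whose single container has fixed CPU/memory bounds."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"sql-{tenant_id}",          # hypothetical naming scheme
            "labels": {"tenant": tenant_id},     # hypothetical label
        },
        "spec": {
            "containers": [
                {
                    "name": "sql",
                    "image": "example.com/sql-pod:latest",  # placeholder image
                    "resources": {
                        # Limits are enforced on the node through Linux cgroups.
                        "requests": {"cpu": cpu, "memory": memory},
                        "limits": {"cpu": cpu, "memory": memory},
                    },
                }
            ]
        },
    }


manifest = sql_pod_manifest("tenant-42", cpu="2", memory="4Gi")
print(json.dumps(manifest, indent=2))
```

Setting the requests equal to the limits is one possible choice; it gives the pod a fixed, predictable slice of the node rather than letting it burst.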
This means each Serverless cluster can be created in seconds, since only a new pod has to be created on an existing virtual machine. Operating in this mode reduces the overall compute and storage costs.
Conclusion
Operating databases in serverless mode is a new concept and has to see wide adoption before it becomes a mainstream option to manage huge amounts of data at scale in a cloud environment. The promise to truly offer a pay as you use mode for databases is exciting and should encourage wide adoption of cloud computing for databases.
AWS has released a serverless option for Aurora called Amazon Aurora Serverless v2 in preview mode.
AWS has also released a serverless option for Kafka called Amazon MSK Serverless.
AWS has outlined an approach to deploying and running machine learning models in serverless mode.
This goes to show the importance of operating highly critical components in serverless mode, both to reduce overall cost and to get true cloud scale and resilience.
At Engati, we evaluate new cloud computing technologies as they mature and adopt them if they fit into our overall architecture and help us scale and raise the service levels we offer our customers.