Understanding the huge importance of the response time is crucial. The purpose of a software application isn’t just to do what it is meant to do, but also to do it fast and to scale by keeping an eye on the response time and, implicitly, on the throughput.

There are studies that show that 47% of consumers expect a web page to load in 2 seconds or less and 40% abandon a website that takes more than 3 seconds to load but also that reducing the load time from 8 to 2 seconds results in a 74% increase in conversions. This data is extremely important, so designing a software that scales is a must.

Don’t Forget the Throughput

Let’s say that you’ve built a fast application and you run it on servers with very powerful hardware, and your code is optimal and respects all the best practices. Having all of these allows your app to have a very good response time and to be performant. However, the true test comes in live conditions. Making it fast and having a very good response time is not enough if it doesn’t scale. We should also have in mind the income load of the application. So, besides the response time to a single request, another important measure is the throughput.

We have two main units of measure for the software performance:

  • the response time
  • the throughput

The response time represents the amount of time necessary to complete a single action and to respond to the request. The throughput is the number of transactions completed within a time interval. If the income load is increasing, this will imply a high number of requests in an amount of time, so for keeping the same response time, our software must be capable to respond to more requests in the same amount of time, meaning increasing its throughput. Scaling actually means that your software capacity should increase as the incoming load is rising for keeping the same response time, making the throughput the measure of scalability.

In this article, I want to focus on some scaling methods and to see how Amazon can help us implement them more easily.

How Do We Scale?

Traditionally, there are two types of scaling:

  • Vertical Scaling
  • Horizontal Scaling

Vertical Scaling

Vertical scaling means increasing the hardware capacity of the machine on which the software runs. This is the traditional approach and the most common one. The hardware capacity has followed the Moore’s Law, and every time the response time decreased due to the income load, improving the hardware capacity seemed to be a sufficient approach, and it was a successful approach as the hardware capacity increased over time at a good price.

Even if the scaling does not depend on the cloud computing, I want to mention how the cloud computing, and more exactly AWS, helps us with it. Buying and maintaining your own hardware infrastructure and increasing its capacity over time isn’t a cheap operation. It is more convenient to benefit from more CPU power, storage, memory, security, etc. instead of increasing your hardware capacity every time the throughput decreases. This is just one benefit from using cloud computing.

For example, hosting your database in the AWS cloud as a Amazon RDS Instance makes the vertical scaling just one click away. You could choose from 18 instance sizes when resizing your RDS MySQL, PostgreSQL, MariaDB, Oracle, or Microsoft SQL Server instance. You just need to select what instance size your database needs, and that’s it. AWS will handle the rest. There is no need to buy new hardware components, install them, update the security, etc. as AWS will do all of this for you at a more convenient price.

Horizontal Scaling

Using just one single machine, as powerful as it may be, still has its limitations. It is cheaper to use more, let’s say, average machines than a single one that is super performant to obtain the same computing power. That’s also available for AWS instances. Even if Amazon gives you extremely powerful instances, from the point above, you cannot expand vertically anymore. The performance of a single machine is limited, but the number of machines that you can use isn’t. So theoretically, by using more instances, you can get, with the afferent costs, an unlimited amount of computing power. Distributing the income load to more than one machine is called horizontal scaling.

Everything sounds good until now, but the horizontal scaling implies a little more effort than using just one single instance. A distributed system always raises extra difficulties because you have to ensure the data consistency no matter how many machines are running in parallel. However, as always, there are some best practices that make this process simpler.

Master-Slave Replication

If you have more database instances with the same data, do all the read requests go to one single instance? Well, no! If all the instances contain the same data, the read requests can be equally distributed to all of them, this way, we increase the throughput and, implicitly, the response time.

But what about the write requests? Well, if your system’s read/write ratio is high, then you can use one single instance for writing, making it a single source of truth, and more read-only instances that replicates the data from it. This is called master-slave replication. In this type of architecture, the data is replicated asynchronously from the master to the other databases, making the writings and readings non-blocking.

master-slave-replica-1.png

Amazon does this type of replication with RDS read-replicas. Besides increasing the read capacity, read replicas are also useful for increasing availability also. All data is replicated asynchronously to the read-replicas instances, and makes them good candidates to be promoted as a master in case the master instance fails. So, even if this isn’t their main role, they aren’t 100% consistent due to the asynchronous replication, and they provide a complimentary availability mechanism to the Multi-AZdeployments.

Multi-AZ deployments provide a complete availability and durability solution for your RDS instances. The main difference between them and the read-replicas is that the data is replicated synchronous, so the Multi-AZ instance is 100% consistent with the master. In the case of a failover of the master instance, AWS promotes the Multi-AZ database as a master automatically, while the read-replica is promoted manually.

Multi-Master Replication

What should we do if the read/write ratio is not too high and one single database instance isn’t enough for all the writes? It would be better if we could write on more than one instance, isn’t it? There is a type of architecture called multi-master replication. Here, all the instances are masters and accept write requests. Distributing the writes clearly sounds pretty good as there could be overall more write connections. Another advantage is that there is no single point of failure, and if one instance fails, you can write to another instance.

However, not everything is as good as it sounds. We don’t have a single point of failure, but we don’t have a single source of truth either. Using more instances for writing brings a real challenge for data consistency. The data should be replicated to all instances in such a manner that all writing conflicts should be avoided. If two users try to change the same piece of data on two different databases, how will the system solve this conflict? Having this type of architecture raises some hard synchronicity challenges.

There are some solutions here like the two-phase commit protocol, but this doesn’t help us improve the throughput because this algorithm has to synchronize all the writes from all the instances. This is not much better than having just one single master instance.

Sharding

All these problems are related to the number of write and read requests addressed to a database instance. If there isn’t a big number of writings or readings, then there will be no problem with the throughput or with the response time. Could we do something like this? The answer is yes. We can split the data through multiple database instances.

Let’s say that we have a user table. Could we split the users through multiple databases while using the same schema? Well, if we have a way to find which instance has located the data about a specific user, then why not?

The sharding has two key points:

  • To store together data that is used together (like having a smaller single consistent database)
  • To identify the instance where some data is stored

The data could be split by different criteria. For example, we can split the data according to geographic region. Now, all the users from a specific region have the necessary data on one single instance. This could be useful if some regions are busier than others. In that case, we can use a more performant machine for that busy region, and on the other regions, we can use cheaper instances and reduce the costs.

For an even splitting, we can use the hash function. All the data with the same hash sum will be stored on a single instance (exactly like a hash map). The hash could be computed based on some identifier, so if you want to search for some information, you will know which instance to look on by computing the identifier hash sum.

The communication between different instances isn’t entirely forbidden. We should not have synchronous communication between two different instances, but we can have some common small tables with data that is not frequently changed. Those tables could be asynchronously replicated to all the instances, ensuring the consistency between all databases.

sharding

Scale the Shards

Sharding the data between multiple instances will result in multiple smaller independent instances, reducing the read/write income load and increasing the throughput. However, we don’t have to stop here. Instead of one huge instance, now we have multiple smaller ones. So, why not scale them too? We can use the master-slave replication on the smaller databases to improve even more of the scalability.

Sharding gives us multiple different instances. Being separated has its advantages because we can use different machines for each shard, and we can also continue to scale a more busy shard. If we shard accordingly to the geographic areas, like in a busy area, we can invest more on that shard and scale it. We can vertically scale it by buying a more expensive hardware for that shard and horizontally scale it by using a master-slave replication architecture.

Amazon Does It

Amazon does it, but not directly. It isn’t something they offer you directly, but given the fact that using AWS RDS makes the database administration more easy, it is also pretty easy to create shards.

We can start from a read-replica. The read-replicas contain all the data that is in the master database and we saw that the read replica is promoted as a master in the case of a failover, but this is not the only time when the read replica could be promoted as a master. In fact, we can promote the read replica as a master while keeping the current master up and then cut from it the data that belongs to the other shard. Then we would have two shards. This is the basic idea. Having shards could imply some more configurations, but that depends on your app specifications.

Virtual Shards

I have always been a fan of loosely coupling, and there is a method to couple a server app to multiple shards in a loose way. Hibernate introduced the concept of virtual shards. The idea is simple, and it focuses on decoupling the server app from the physical shards.

Maybe, at first, we don’t need too many shards. If the app hasn’t a big load in the present, why should we invest in multiple physical shards when we don’t need to? Maybe we can use just one database instance, but we should have in mind the future growth by designing the software as it is using more shards. With hibernate, we actually use more shards, but virtually. This way, in the future, when we decide that one physical instance isn’t enough anymore, it is easier to add more shards because the physical shards are decoupled from the app and we need to change just some hibernate config files.

Explaining how it works is a bit out of the scope of this article, but here and here you can find more details about virtual sharding. The basic idea is that your app is connecting to some virtual DB instances and those virtual DB instances are further connected by hibernate to some physical shards.

Conclusion

People are very reactive to the response time and this is one of the most important points of the decision of using, or not using, a software application. When we design and implement a software component, we should always have in mind the future income load to the application. It is not enough for your app to do its job if it doesn’t scale up when necessary and has a large response time.

Decreasing the response time could be achieved by scaling vertically (increasing the hardware capacity) and horizontally (distributing the load to multiple instances). Whatever option you choose sounds pretty expensive, but using the cloud computing services can significantly reduce the costs and the work effort because the scaling responsibility is handled by the cloud services and you can focus more on your business logic.