There is no shortage of material discussing cloud-based computing and praising its benefits. Unfortunately, as the term “cloud” becomes ever more ubiquitous, it’s easy to get the impression that there is nothing you can’t do on the cloud. Over the past two years, hardware and software vendors have been tripping over themselves to reposition products as closely aligned with the cloud. In fact, I recently read about a manufacturer of Fibre Channel host bus adapters that was calling its latest product a “cloud ready” device. (Hint: it’s far more complicated than that.)
While it may be true that anything can run on the cloud, you need to consider whether running something on the cloud would actually be a net benefit to your company, because sometimes it isn’t.
So, while I still believe in using the cloud and will undoubtedly continue to hear of novel uses for it, I think now is a good time to highlight an application that would not immediately benefit from cloud-based computing.
Large, high-volume relational databases
Much has been written lately about the benefits of Hadoop and NoSQL databases. These technologies are beginning to have a major impact on many enterprises as people look for new ways to analyze the huge amounts of data that “big data” proponents continue to amass. However, while Hadoop and NoSQL were developed to run on cloud-based infrastructure, their predecessor, the traditional relational database, was not. The issues are many, but for illustrative purposes let’s consider the following.
I/O / capacity imbalance: Technologists have spent the last 20+ years fine-tuning relational databases to support ever-increasing transaction volumes. This includes subtle changes to the way databases store data, as well as more brute-force measures such as allocating disk drives purely for their I/O operations per second (IOPS). As a result, these drives have very low capacity utilization: each one reaches its IOPS limit long before it is full, so drives are added to gain throughput rather than space. The net effect is a large amount of stranded storage capacity. It is important to understand that with a typical cloud provider today, this unused capacity is still considered utilized by the customer and is a chargeable item.
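To make the stranded-capacity arithmetic concrete, here is a minimal sketch that sizes a drive pool which must satisfy both a throughput target and a capacity target. The workload (20,000 IOPS over 2 TB of data) and the per-drive figures (about 180 IOPS and 600 GB per drive) are hypothetical numbers chosen for illustration, and the helper name provision is likewise hypothetical. RAID overhead, controller caching and hot spares are deliberately ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class Provisioning:
    drives: int
    raw_capacity_gb: float
    utilization: float  # fraction of raw capacity actually holding data
    stranded_gb: float  # capacity paid for only to obtain IOPS


def provision(required_iops: float, data_gb: float,
              drive_iops: float, drive_gb: float) -> Provisioning:
    """Hypothetical sizing helper: buy enough drives for BOTH IOPS and capacity.

    Simplification: ignores RAID overhead, caching and hot spares.
    """
    drives_for_iops = math.ceil(required_iops / drive_iops)
    drives_for_capacity = math.ceil(data_gb / drive_gb)
    drives = max(drives_for_iops, drives_for_capacity)  # the larger need wins
    raw = drives * drive_gb
    return Provisioning(drives, raw, data_gb / raw, raw - data_gb)


# Hypothetical OLTP database: 20,000 IOPS over 2 TB of data,
# on drives assumed to deliver ~180 IOPS and 600 GB each.
p = provision(required_iops=20_000, data_gb=2_000, drive_iops=180, drive_gb=600)
print(f"{p.drives} drives, {p.utilization:.1%} utilized, {p.stranded_gb:,.0f} GB stranded")
# prints: 112 drives, 3.0% utilized, 65,200 GB stranded
```

Throughput, not capacity, determines the drive count here: 4 drives would hold the data, but 112 are needed to reach the IOPS target, so roughly 97% of the raw capacity sits idle while still being billed.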
We are beginning to see the arrival of large-scale solid-state storage with incredible performance, but it will not be widely adopted until the technology has been proven and the current platform refresh cycle has run its course. Furthermore, it will be some time before solid-state storage becomes a standard component of the typical cloud infrastructure service provider’s offering.
Multi-tenancy load imbalance: True cloud-based infrastructure is predicated on the idea that multiple applications (tenants) coexist on shared infrastructure in a way that isolates their workloads without requiring dedicated assets. In general this concept works brilliantly for the majority of cloud workloads, because there is a natural balance between storage and processing, and in aggregate no application competes unfairly with the others for the resources available in the cloud infrastructure. However, relational databases can generate very high rates of I/O that compete with the other tenants. This competition can significantly degrade the overall performance of the cloud infrastructure, because the database’s stream of reads and writes displaces other tenants’ data from shared caches, lowering cache effectiveness for every tenant application.
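The cache effect can be illustrated with a small simulation. This is a minimal sketch, assuming a single shared cache with plain least-recently-used (LRU) eviction, which is a simplification; real storage arrays use more sophisticated policies. The workloads are hypothetical: one application tenant loops over a 50-block hot set, while a database tenant streams blocks it never rereads through the same 100-entry cache.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used key."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: OrderedDict = OrderedDict()

    def access(self, key) -> bool:
        """Return True on a hit; on a miss, insert the key and evict if over capacity."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used key
        return False


def app_hit_rate(cache_size: int, rounds: int, db_reads_per_app_read: int) -> float:
    """Hit rate of an app tenant looping over a 50-block hot set, with
    db_reads_per_app_read never-repeated database blocks interleaved per app read."""
    cache = LRUCache(cache_size)
    hot_set = [("app", i) for i in range(50)]
    hits = total = 0
    next_db_block = 0
    for _ in range(rounds):
        for key in hot_set:
            hits += cache.access(key)
            total += 1
            for _ in range(db_reads_per_app_read):
                cache.access(("db", next_db_block))  # each block is read once
                next_db_block += 1
    return hits / total


alone = app_hit_rate(cache_size=100, rounds=20, db_reads_per_app_read=0)
shared = app_hit_rate(cache_size=100, rounds=20, db_reads_per_app_read=2)
print(f"alone: {alone:.0%}, sharing with database: {shared:.0%}")
# prints: alone: 95%, sharing with database: 0%
```

With the cache to itself, the application misses only on its first pass. With two database reads interleaved per application read, 149 other distinct blocks pass through the 100-entry cache between reuses of any hot block, so every hot block is evicted before it is needed again.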
I/O fees: Many cloud service providers today include service charges for I/O between virtual machines, and these charges can be excessive for traditional relational databases because of the sheer volume of storage I/O operations needed to complete a typical database query.
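A rough estimate shows how per-I/O charges scale with query volume. This is a minimal sketch under stated assumptions: the price of $0.10 per million I/O requests is a hypothetical rate, not any specific provider’s pricing, and the workload figures (500 queries per second, 40 storage I/Os per database query versus 2 for a lighter workload) are illustrative. The function name monthly_io_fee is hypothetical; Decimal is used so that the currency arithmetic stays exact.

```python
from decimal import Decimal

SECONDS_PER_30_DAY_MONTH = 30 * 24 * 3600


def monthly_io_fee(queries_per_sec: int, ios_per_query: int,
                   price_per_million_ios: Decimal) -> Decimal:
    """Hypothetical fee model: a flat price per million I/O requests."""
    total_ios = queries_per_sec * ios_per_query * SECONDS_PER_30_DAY_MONTH
    return Decimal(total_ios) / Decimal(1_000_000) * price_per_million_ios


price = Decimal("0.10")  # assumed rate, for illustration only
db = monthly_io_fee(queries_per_sec=500, ios_per_query=40, price_per_million_ios=price)
light = monthly_io_fee(queries_per_sec=500, ios_per_query=2, price_per_million_ios=price)
print(f"database: ${db:,.2f}/month, light workload: ${light:,.2f}/month")
# prints: database: $5,184.00/month, light workload: $259.20/month
```

Because the fee is linear in I/Os per query, a workload that needs 20 times as many storage operations per request pays 20 times as much, regardless of how much useful work each query does.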
The more likely outcome is that applications relying on standard relational databases will be rewritten to take advantage of new database technologies, rather than moved as-is onto cloud-based infrastructure.