Rationalism? The Reality of Multi-Cloud
Can René Descartes help you with your multi-cloud strategy?
The French philosopher René Descartes (1596–1650) set himself the task of determining how certainty could be established. He believed that we should:
Systematically doubt everything that could be doubted, including the most common beliefs.
To illustrate his approach, Descartes used the analogy of a basket full of apples, some of which might be rotten. As you know, the rot from one bad apple can easily spread, so it’s important to get rid of the bad ones in order to preserve the health of the rest. The way to do it is not to look at each apple in turn, but rather to empty the basket and then return only the apples you have no doubts about.
There is a strong buzz around multi-cloud. Articles are written, podcasts are recorded, and vendors are pumping the hype while trying to sell us sophisticated tools for building, deploying, and managing our applications across multiple clouds. Their message is:
Multi-cloud is the next normal. Everyone is going multi-cloud. You don’t want to stay behind, stuck with your single cloud provider.
Right or wrong? In this article, I will try applying Descartes’s method to the multi-cloud hype. We are going to empty the basket that holds the common reasons for going multi-cloud and check which ones are fresh and which ones are rotten.
Types of Multi-Cloud Implementations
Multi-Cloud means different things to different people. For me, as head of engineering in a fast-growing SaaS company (WalkMe), multi-cloud is all about:
using services offered by the different public cloud providers for building and operating our product in the most reliable, performant, and cost-effective way.
When it comes to cloud strategy, I find that there are 4 strategies that can be applied to a product like ours:
Strategy 1: Single Cloud For All Jobs
The entire product (all services) is built & deployed on a single cloud (AWS).
Strategy 2: Right Cloud For The Job
Some product services are deployed on AWS, some other services on GCP, some other services on Azure, etc. The key idea here is that while we are running on multiple clouds, every service we have is always mapped to the same single cloud provider.
Strategy 3: Some Jobs By Multi Clouds
The entire product (all services) is deployed on AWS, but some services also run on GCP and/or Azure. For the services that run on multiple clouds, there is a workload distribution mechanism that determines which incoming traffic and which workload is handled by which cloud.
Strategy 4: All Jobs By Multi Clouds
The entire product (all services) is deployed on AWS, as well as on GCP and/or Azure. In this scenario, we have completely separate deployments of the entire product on multiple clouds, and every customer can theoretically choose whether they want to be served by the AWS, GCP, or Azure farm.
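The "workload distribution mechanism" mentioned under Strategy 3 can take many forms. As a minimal sketch only, assuming a simple weighted split keyed on a customer ID (the `CLOUD_WEIGHTS` values and the `pick_cloud` function are hypothetical, not something described in this article), it could look like this:

```python
import hashlib

# Hypothetical traffic split for one service; the weights are illustrative only.
CLOUD_WEIGHTS = {"aws": 70, "gcp": 20, "azure": 10}


def pick_cloud(customer_id: str, weights: dict[str, int] = CLOUD_WEIGHTS) -> str:
    """Map a customer to a cloud deterministically, in proportion to the weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Hash the customer ID so the same customer always lands on the same cloud.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total
    upper = 0
    for cloud, weight in weights.items():
        upper += weight
        if bucket < upper:
            return cloud
    raise ValueError("weights must be non-negative")


if __name__ == "__main__":
    for customer in ("acme", "globex", "initech"):
        print(customer, "->", pick_cloud(customer))
```

Hashing keeps the assignment stable across requests, which matters when a customer's data lives on one cloud. A real mechanism would also need health checks and failover, which this sketch leaves out.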
Rationale vs. Reality
Now that we have set the context, we are ready to examine the apples. Let’s pick up the 6 most common reasons for going multi-cloud. For each reason, let’s compare and rate the rationale versus the reality, doubting everything that can be doubted.
1. Cloud Outage Risk
- Rationale: Public clouds have outages. We have zero control over how frequent these outages are going to be and over how long it will take the cloud provider to bring the service back up. If we want to ensure our business continuity, we can’t rely on a single cloud provider.
- Reality: Public cloud providers have a concept of regions and availability zones (AZs). Within a region, each AZ is isolated in the sense that it has dedicated network connections and power backups. By deploying cross-AZ clusters of our microservices, we significantly reduce the probability of being impacted by a cloud outage. I am not claiming that cross-AZ and even cross-region outages never happen. Although rare, these outages still occur. Theoretically, we would want our system to keep functioning even during these rare events. The reason I used the word “theoretically” is that in order to achieve this kind of continuity, our system needs to be designed for it. We would actually need to build what we earlier called “All Jobs By Multi Clouds”. Moreover, we would need to design for full workload portability so that once AWS is down (for example) we would be able to immediately route all traffic and workloads to Azure. The problem is that building real products that support cross-cloud workload portability is practically unrealistic.
- Fresh 2 Rotten Rating: 🍎 🍎 💩 💩 💩
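The cross-AZ argument above can be illustrated with back-of-the-envelope arithmetic. The sketch below assumes that AZ failures are independent and uses a made-up per-AZ unavailability figure; both are assumptions for illustration, not provider data. Real outages are sometimes correlated across AZs, which is exactly why the rare cross-AZ events mentioned above still happen.

```python
HOURS_PER_YEAR = 8760


def prob_all_zones_down(p_zone_down: float, zones: int) -> float:
    """Probability that every AZ is down at once, assuming independent failures."""
    if not 0.0 <= p_zone_down <= 1.0:
        raise ValueError("p_zone_down must be a probability")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    return p_zone_down ** zones


def expected_downtime_hours(p_down: float) -> float:
    """Expected hours per year spent fully down for a given outage probability."""
    return p_down * HOURS_PER_YEAR


if __name__ == "__main__":
    p = 0.001  # hypothetical: each AZ is unavailable 0.1% of the time
    for zones in (1, 2, 3):
        p_down = prob_all_zones_down(p, zones)
        print(f"{zones} AZ(s): {expected_downtime_hours(p_down):.6f} hours/year")
```

Under these assumptions, each extra AZ cuts the expected full-outage time by a factor of a thousand, so the independent model is best read as an optimistic bound rather than a forecast.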
2. Global Presence
- Rationale: Our servers and our data need to be as close as possible to our end users. This is required for performance reasons, as network latency impacts the end-user experience, as well as for compliance reasons, as some customers can’t have their sensitive data stored outside of their country. We have customers around the globe, so a single cloud provider won’t be able to provide us with all the physical locations that we need.
- Reality: The major cloud providers now have pretty much the same global presence. AWS has 24 regions with 77 AZs, GCP has 24 regions with 73 AZs, and Azure has 54 regions (but many of them with a single AZ). That doesn’t take into account the content distribution network (CDN), which helps to address potential latency issues for end users in countries that are remote from the main regions. The cloud providers are aware of the potential market size as well as the compliance regulations in different countries (specifically data residency regulations). This means that in most cases, you will find that all 3 major providers have a region in countries with significant market size and with specific regulations. The only situation in which I remember considering a different cloud provider because of global presence was early 2015, when I had a large Canadian customer not willing to have their data stored in the US (after the Edward Snowden thing). We were on AWS and they didn’t have a region in Canada. Microsoft didn’t have an “official” region either, but they did have some kind of partner-operated Azure data center. We started planning how to deploy and operate our service, only to find out that some of the major Azure services were not supported in that region. Six months later AWS started their “Canada Central” region, and right now all 3 major vendors have an official region in Canada.
- Fresh 2 Rotten Rating: 🍎 💩 💩 💩 💩
3. Vendor Lock-In
- Rationale: We can’t put all our eggs in one basket! Being completely dependent on a single vendor that provides a service that is critical to our business is dangerous or even irresponsible. What happens if the vendor we are locked to goes out of business? Or worse, what happens if the vendor we are locked to acquires our top competitor?
- Reality: AWS, Azure, and GCP are all great businesses. It’s very unlikely that Amazon, Microsoft, and Google will shut them down. It’s also very unlikely that a cloud provider will take any kind of a cheap shot at you, even in the case that they become your competitor. The best example I can think of is Netflix, running smoothly on AWS although Amazon Prime Video is now a direct competitor. More pragmatically: avoiding vendor lock-in is a nice statement, but it comes with significant technical restrictions. It practically means that your product can’t rely on a certain cloud service unless it is offered by all cloud vendors. You are simply swapping being locked to a vendor with being locked to the lowest common denominator of multiple vendors. Not a great idea in terms of your ability to innovate and move fast.
- Fresh 2 Rotten Rating: 🍎 💩 💩 💩 💩
4. Cost Optimization
- Rationale: If we can only run our service on a single cloud, there is no reason for the cloud provider to give us any discount. Multi-cloud will allow us to continuously move our workloads to the cloud provider who offers us the best price. This will become leverage for negotiating better prices with all vendors.
- Reality: It simply doesn’t work like that. The idea of cloud price arbitrage works only in theory. In reality, the most significant cost optimization actions you can take depend on your ability to predict your future usage on a certain cloud and reserve capacity accordingly. From a discount negotiation perspective, the level of discount you can get from a cloud vendor mostly depends on your volume (total spend) and on your ability to pre-commit (minimum spend) for at least 1 year. By splitting your overall consumption among multiple cloud vendors you are naturally decreasing volume per vendor, as well as decreasing your ability to predict future use and commit to a minimum spend with each cloud vendor. In reality, it’s not only that you aren’t going to be able to cut your total cloud hosting costs, you are actually going to pay more. Much more. As an additional punishment, you are also going to get multiple cloud bills every month. I know how much you enjoy reading them, so why just have this great fun only once if you can get more of it?
- Fresh 2 Rotten Rating: 💩 💩 💩 💩 💩
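To see why splitting spend tends to hurt, here is a toy model of volume-based discounts. The tiers in `DISCOUNT_TIERS` and the spend figures are entirely hypothetical and do not reflect any vendor's actual pricing; the sketch only illustrates the mechanism described above.

```python
# Hypothetical tiers: (minimum annual commitment in USD, discount rate).
# Ordered from the highest threshold to the lowest.
DISCOUNT_TIERS = [
    (5_000_000, 0.20),
    (1_000_000, 0.10),
    (0, 0.00),
]


def discount_rate(annual_commit: float) -> float:
    """Return the discount of the highest tier the commitment qualifies for."""
    for minimum, rate in DISCOUNT_TIERS:
        if annual_commit >= minimum:
            return rate
    return 0.0


def total_net_cost(spend_per_vendor: list[float]) -> float:
    """Sum the discounted cost across vendors, each discounted on its own volume."""
    return sum(spend * (1 - discount_rate(spend)) for spend in spend_per_vendor)


if __name__ == "__main__":
    single = total_net_cost([6_000_000])
    split = total_net_cost([2_000_000, 2_000_000, 2_000_000])
    print(f"single cloud: ${single:,.0f}")
    print(f"three clouds: ${split:,.0f}")
```

With these made-up tiers, the same gross spend costs more when split three ways, because each vendor sees a smaller commitment and offers a smaller discount.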
5. Best of Breed
- Rationale: Some cloud providers offer unique services that don’t exist on other clouds. We strongly believe in “use the best tool for the job”, and by using these services we would be able to improve the reliability and performance of specific areas in our product.
- Reality: It’s 100% true, but then there is the complexity tradeoff. Indeed, there are specific services that are only available in one cloud. For some use cases, being able to take advantage of these services can be a game-changer. An example that comes to mind is Google’s BigQuery, which is in a different league from the data warehouse solutions offered by the other cloud providers. But as always, there is a tradeoff. Running different microservices on different clouds introduces two types of overhead: networking overhead and workflow overhead. Let’s start with networking: communication between services running on different clouds is slower and more expensive than communication between services running on the same cloud. Network latency and data egress costs are so high that they might be a show stopper for some use cases. As for the workflow overhead: running different parts of your product on different clouds means you need to build and maintain different workflows. There are a bunch of vendors (Red Hat, VMware, HashiCorp, etc.) and technologies (containers, Kubernetes, Anthos, Terraform, etc.) that may help, but at the end of the day, there is no way to completely hide the differences among clouds in areas like IAM, networking, security, deployment, monitoring, elasticity, compliance, and many others. Your DevOps and SRE people will have to build expertise in multiple clouds. Some of them might not love it.
- Fresh 2 Rotten Rating: 🍎 🍎 🍎 🍎 💩
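The networking overhead described above is easy to estimate roughly before committing to a cross-cloud design. The sketch below uses placeholder inputs (the per-GB price, daily volume, and round-trip time are hypothetical, so substitute your own measurements and your provider's published rates) and assumes the cross-cloud calls in a request happen sequentially.

```python
def monthly_egress_cost(gb_per_day: float, usd_per_gb: float, days: int = 30) -> float:
    """Estimated monthly cost of data leaving one cloud for another."""
    return gb_per_day * usd_per_gb * days


def added_latency_ms(cross_cloud_calls: int, extra_rtt_ms: float) -> float:
    """Extra latency per request when cross-cloud calls are made one after another."""
    return cross_cloud_calls * extra_rtt_ms


if __name__ == "__main__":
    # Hypothetical inputs for a service on one cloud reading from another.
    cost = monthly_egress_cost(gb_per_day=1000, usd_per_gb=0.10)
    latency = added_latency_ms(cross_cloud_calls=5, extra_rtt_ms=20.0)
    print(f"egress: ${cost:,.2f} per month")
    print(f"latency: +{latency:.0f} ms per request")
```

If either number looks like a show stopper for a given use case, that service is probably a poor candidate for living on a different cloud from the data it depends on.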
6. Commercial Incentives
- Rationale: The major cloud providers are among the largest and fastest-growing companies in the world. We want them to be our customers, or even better, our partners. They are reluctant to partner with us as long as our product is hosted on their competitor’s cloud. The same logic applies to other large customers that aren’t cloud providers, but consider the cloud provider we are hosted on as their direct competitor (e.g. large retail companies not willing to use products hosted on AWS). For us to be able to win these massive deals and create strategic partnerships, we cannot be limited to running on a single cloud.
- Reality: This is real. I admit that the first time it happened to me I was a bit surprised or even skeptical (“if they really need our product, they shouldn’t care if it’s running on AWS”), but after seeing the exact same thing happening in two different companies, and from different potential customers, I realized it’s 100% real. I now consider this pressure a positive sign: if a company like Microsoft/Google/Amazon/IBM/Oracle insists that we run our services on their cloud, it must mean they believe we have a chance to grow our business and become a large customer. Otherwise, why would they bother? Anyhow, the best way to handle this is to try to push toward strategy #2 (right cloud for the job) and embrace a best of breed approach. This way we are both leveraging the strengths of each provider and spending enough with each vendor to keep everyone satisfied. Another small piece of advice I have is to ask for credits. The cloud vendors have dedicated credit budgets for encouraging hyper-growth SaaS companies to move to their platform.
- Fresh 2 Rotten Rating: 🍎 🍎 🍎 🍎 🍎
Conclusion
Descartes taught us to doubt everything that can be doubted. When it comes to technology hype like multi-cloud, we need to be extra careful. My advice is to:
break down the overly hyped message into discrete value points and examine each value point in terms of its ROI and risk in the context of your product and your team.
I believe that for most products and teams, the result of such analysis would be opting for either strategy #1 (single cloud for all jobs) or strategy #2 (right cloud for the job). If you are one of the special cases where the analysis led you to go with strategy #3 (some jobs by multi-clouds) or, god forbid, strategy #4 (all jobs by multi-clouds), I wish you the best of luck and would be happy to hear how it worked out for you in the real world.