Many documents and recommendations explain how to build an architecture in the cloud, assuming the customer is located in a specific region. But what if we are an organization that provides information or services to customers all over the globe?
Examples of such services include e-Commerce sites, streaming video services, news sites, gaming platforms and IoT services.
In this article, I will try to map some of the considerations for choosing global services, which will enable us to build a common architecture for customers all over the globe.
Background
When designing a multi-region architecture, we need to take into consideration aspects such as deployment to multiple regions, the ability to handle failure (or connectivity issues) in a specific region, the ability to replicate data between remote geographic areas, the ability to write or update data within a specific time interval across multiple geographic regions, and the ability to deploy a new application build or control the scale of an application in a simple manner across multiple remote geographic areas (for example, gradual application upgrades).
In certain scenarios (such as streaming media), we may wish to synchronize the same content to different areas in the world. In other scenarios (such as e-Commerce sites, news sites, etc.), we may wish to build a similar architecture in different regions, while storing the data itself (such as product catalog, customer preferences, language, etc.) in the same geographic region as the customer.
There are several common reasons for building a multi-region architecture:
Network latency between the customer and the cloud service:
- Storing data close to the customer improves the customer experience. An example of such a scenario is streaming media, where we wish to sync the same content to multiple geographic regions
Disaster recovery:
- Active-Active Site — An expensive solution, but it enables quick recovery from disaster, assuming we sync data in real-time (or near real-time) between multiple geographic regions
- Active-Passive Site — Enables us to recover from disaster, but requires a data synchronization mechanism, a manual update of DNS records between sites, and manually switching the database role from replica to primary (master)
Regulation or customer related requirements:
- The need to store customers’ data in a specific geographic region, according to regulatory requirements (such as GDPR). An example of such a scenario is an e-Commerce site, where we build a similar architecture in different geographic regions, but store customer-related data (such as product catalog, customer preferences, etc.) in the same geographic region as the customer
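The data-residency scenario above can be sketched as a simple lookup from the customer's country to the region that holds their data. This is only an illustration: the country-to-region mapping, the region names, the default region and the function names are all hypothetical, and the real mapping depends on the regulations that apply to your organization.

```python
# Hypothetical mapping from customer country (ISO 3166-1 alpha-2 code) to the
# region where that customer's data is stored. Not a statement of what any
# regulation actually requires.
RESIDENCY_RULES = {
    "DE": "eu-central",
    "FR": "eu-west",
    "US": "us-east",
    "JP": "ap-northeast",
}
DEFAULT_REGION = "us-east"  # assumption: fallback for countries not listed


def home_region(country_code: str) -> str:
    """Return the region that should store data for a customer's country."""
    return RESIDENCY_RULES.get(country_code.upper(), DEFAULT_REGION)


def storage_key(country_code: str, customer_id: str) -> str:
    """Build a region-prefixed key so the application knows where a record lives."""
    return f"{home_region(country_code)}/customers/{customer_id}"
```

The same application code can then be deployed in every region, while each customer's records are read from and written to their home region only.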
Example of multi-region architecture
Network related aspects
When designing multi-region architecture, the first aspect we wish to review, from the customer’s point of view (browser, mobile device, IoT device, etc.), is the network aspect.
DNS services - Using these services, the customer accesses our infrastructure (or application) from the Internet.
Below are common DNS services for multi-region architecture:
- Amazon Route 53 — Globally distributed service, which enables us to configure resource name resolution rules based on geo-location
- Azure Traffic Manager — Globally distributed service, which enables us to redirect traffic based on DNS requests
- Google Cloud DNS — Globally distributed service, which enables us to configure resource name resolution rules
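To make the idea of geo-location based name resolution more concrete, here is a minimal sketch in plain Python. It is not the API of any of the services above: the `RegionEndpoint` class, the continent codes, the preference lists and the IP addresses (taken from the documentation ranges) are assumptions made for illustration. Each continent has an ordered list of preferred regions, and the resolver returns the first one that is marked healthy.

```python
from dataclasses import dataclass


@dataclass
class RegionEndpoint:
    name: str
    address: str
    healthy: bool = True


# Hypothetical regional endpoints (documentation-range IP addresses).
ENDPOINTS = {
    "eu": RegionEndpoint("eu", "192.0.2.10"),
    "us": RegionEndpoint("us", "198.51.100.10"),
    "asia": RegionEndpoint("asia", "203.0.113.10"),
}
# Hypothetical geo rules: preferred regions per continent, in order.
GEO_RULES = {"EU": ["eu", "us"], "NA": ["us", "eu"], "AS": ["asia", "us"]}
DEFAULT_ORDER = ["us", "eu", "asia"]


def resolve(continent: str) -> str:
    """Return the address of the first healthy region preferred for a continent."""
    for region in GEO_RULES.get(continent, DEFAULT_ORDER):
        endpoint = ENDPOINTS[region]
        if endpoint.healthy:
            return endpoint.address
    raise RuntimeError("no healthy region available")
```

If the European endpoint is marked unhealthy, a European customer is answered with the next region in the list, which is roughly the behaviour we want from DNS-level failover.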
CDN (Content Delivery Network) services — Globally distributed network infrastructure, which gives our customers fast access to resources (using caching).
Below are common CDN services:
- Amazon CloudFront — Globally distributed CDN infrastructure, based on Edge Locations
- Azure CDN — Globally distributed CDN infrastructure (in certain countries, based on Akamai CDN infrastructure)
- Google Cloud CDN — Globally distributed CDN infrastructure, based on Google global network infrastructure
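The caching behaviour that makes a CDN fast can be illustrated with a toy model of a single edge location. This is a sketch under simplifying assumptions: the `EdgeCache` class, its parameters and the fixed time-to-live (TTL) policy are hypothetical, and real CDN services offer far richer cache-control rules.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of one CDN edge location that caches origin responses."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, bytes]] = {}
        self.origin_hits = 0  # how many requests had to go back to the origin

    def get(self, path: str) -> bytes:
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and cached[0] > now:
            return cached[1]  # cache hit: served from the edge
        # Cache miss or expired entry: fetch from the origin and re-cache.
        self.origin_hits += 1
        body = self._fetch(path)
        self._store[path] = (now + self._ttl, body)
        return body
```

Repeated requests for the same path within the TTL are served from the edge, so only the first request (and the first one after expiry) pays the latency of reaching the origin region.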
Defense against distributed denial of service (DDoS) and application (Layer 7) attacks — Since we are designing an infrastructure accessible and exposed from the Internet, we wish to protect our infrastructure and be able to integrate with other services (such as DNS, Load Balancing, etc.)
Below are common protection services:
- AWS Shield — Managed DDoS protection service, which can be combined with AWS WAF (Web Application Firewall) for protection against application-layer attacks
- Azure Front Door — Globally distributed entry point, with built-in DDoS protection and WAF (Web Application Firewall) capabilities
- Google Cloud Armor — Globally distributed DDoS protection service, with WAF (Web Application Firewall) capabilities
Load-Balancing services — Services that enable us to distribute the network load between different data centers or different geographic regions.
Below are common load-balancing services:
- Amazon Application Load Balancer — Layer 7 (application) load balancing service. Although this is a regional service, we can build a global infrastructure by using Amazon Route 53 to direct DNS queries to our regional ALBs
- AWS Global Accelerator — Global service, which enables us to accelerate network traffic to our application, using static anycast public IP addresses and supporting TCP and UDP traffic (including HTTP/HTTPS)
- Azure Front Door — Global load-balancing service, supporting HTTP/HTTPS traffic
- Google Cloud Load Balancing — Global load-balancing service, which uses a single global public IP address and supports HTTP/HTTPS, TCP/SSL and UDP traffic
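One way to picture global load balancing is "nearest region first, spill over when full". The sketch below is a simplified illustration and not the algorithm of any of the services listed above; the `route` function, the region names and the capacity numbers are hypothetical.

```python
from typing import Dict, List, Optional


def route(preferred_order: List[str], capacity: Dict[str, int],
          in_flight: Dict[str, int]) -> Optional[str]:
    """Send a request to the first preferred region that still has capacity.

    Updates in_flight for the chosen region and returns its name,
    or returns None when every region is full.
    """
    for region in preferred_order:
        if in_flight.get(region, 0) < capacity.get(region, 0):
            in_flight[region] = in_flight.get(region, 0) + 1
            return region
    return None
```

For a European customer, `preferred_order` might be `["eu", "us"]`: requests stay in the closest region until it reaches its capacity, and only then overflow to the next one.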
File storage related aspects
As in most systems, we will most likely want to share static content (files) across multiple regions, whether it is the origin from which we serve content through CDN services, configuration files, backups, etc.
Object storage services — This type of service enables us to store files for read and update.
Below are common object storage services:
- Amazon S3 — Managed object storage service. Although this is a regional service, we can replicate files between S3 buckets located in remote regions, using the Cross-Region Replication feature
- Azure Blob Storage — Managed object storage service. Although this is a regional service, we can replicate files between blob storage located in remote regions, using the Geo-Redundant Storage or Geo-Zone-Redundant Storage features
- Google Cloud Storage — Globally managed object storage service, allowing us to store and replicate files automatically between remote regions
Database storage related aspects
Almost every system that exists today contains various types of databases for storing and querying data.
Relational Databases — Databases for working with structured data and a clearly defined schema.
Common relational database services:
- Amazon Aurora — Managed database service, based on the MySQL or PostgreSQL engine. Using a feature called Amazon Aurora Global Database, we can build a global database spanning remote regions
- Azure SQL Database — Managed database service, based on the Microsoft SQL Server engine. Although this is a regional service, we can build an asynchronous data replication process between remote regions, using the Active Geo-Replication and Automatic Asynchronous Replication features
- Google Cloud Spanner — Globally managed database service, which enables us to replicate data (read/write mode) between remote regions
NoSQL / Non-Relational Databases — Databases for storing large amounts of unstructured data.
Common NoSQL database services:
- Amazon DynamoDB — Managed NoSQL database service. Using a feature called Global Tables, we can build a data replication process (read/write mode) between remote regions
- Azure Cosmos DB — Globally managed NoSQL database service, which enables us to build a data replication process (read/write mode) between remote regions
- Google Cloud Bigtable — Globally managed NoSQL database service, which enables us to build a data replication process (read/write mode) between remote regions
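Read/write replication between regions raises the question of what happens when two regions update the same item at about the same time. One common strategy is "last writer wins", where the most recent write replaces older ones. The sketch below illustrates that idea only; the `Version` record and `merge` function are hypothetical, clocks are assumed to be reasonably synchronized, and each service listed above has its own conflict-handling rules.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # write time; assumes reasonably synchronized clocks
    region: str       # tie-breaker when two timestamps are equal


def merge(local: Dict[str, Version], remote: Dict[str, Version]) -> Dict[str, Version]:
    """Merge two replicas, keeping the most recent write for each key."""
    merged = dict(local)
    for key, incoming in remote.items():
        current = merged.get(key)
        if current is None or (
            (incoming.timestamp, incoming.region)
            > (current.timestamp, current.region)
        ):
            merged[key] = incoming
    return merged
```

Because the comparison is the same in every region, all replicas converge to the same value regardless of the order in which they exchange updates. The price is that the older of two concurrent writes is silently discarded, which is worth keeping in mind when choosing a service for data such as shopping carts or account balances.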
Cost aspects
When designing multi-region architecture, we need to consider cost aspects, such as:
- Service cost — In many of the scenarios mentioned in this article, a global solution requires an expensive premium license
- Egress (outbound) traffic cost — Cross-region replication and inter-region traffic have their own cost model with each cloud provider
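A rough, back-of-the-envelope estimate of replication traffic cost can help when comparing designs. The function below is a sketch: the formula ignores request fees, storage and tiered pricing, and the price used in the example is a placeholder, not a real price from any provider.

```python
def monthly_replication_cost(gb_written_per_day: float, replica_regions: int,
                             price_per_gb: float, days: int = 30) -> float:
    """Estimate the monthly inter-region transfer cost of replicating new data.

    Every GB written is assumed to be copied once to each replica region.
    """
    return gb_written_per_day * replica_regions * price_per_gb * days


# Hypothetical example: 50 GB of new data per day, replicated to 2 other
# regions, at a placeholder price of 0.02 per GB transferred.
estimate = monthly_replication_cost(50, 2, 0.02)
```

Even with a small per-GB price, the cost grows linearly with both the amount of data written and the number of regions, so adding a region is not only a deployment decision but also a recurring cost decision.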
Operational aspects
When designing multi-region architecture, we need to consider operational aspects, such as:
- DevOps, application deployment and upgrades — Ability to perform gradual application deployment or upgrades over multiple remote regions
- Source / Configuration registry — The need to build a central configuration / container registry for storing configuration, container images and any other type of data that must be synced between multiple remote regions around the globe
- Monitoring — The requirement for constant monitoring of multiple services (availability, resource usage, scale, etc.) across multiple remote regions
- Data integrity / consistency — The ability to make sure data is synced and stored consistently between multiple remote regions
- Availability — The ability to monitor service availability over multiple remote regions
- Disaster recovery — The ability to conduct disaster recovery drills between remote regions, including failover and redirection of customer traffic between regions
- Security and Governance — The ability to enforce access policies and security configurations over multiple regions
- Regulation compliance — The ability to maintain global infrastructure, while complying with local regulation and privacy in certain parts of the world (such as the GDPR)
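The gradual deployment mentioned in the list above can be sketched as a loop that upgrades one region at a time and stops at the first failed health check. The `rolling_deploy` function and its callback parameters are hypothetical placeholders for whatever deployment and monitoring tooling is actually in use.

```python
from typing import Callable, List


def rolling_deploy(regions: List[str],
                   deploy: Callable[[str], None],
                   is_healthy: Callable[[str], bool],
                   rollback: Callable[[str], None]) -> List[str]:
    """Deploy region by region; stop and roll back on the first failed check.

    Returns the regions left running the new build (empty after a rollback).
    """
    updated: List[str] = []
    for region in regions:
        deploy(region)
        updated.append(region)
        if not is_healthy(region):
            # Undo every region touched so far, newest first.
            for done in reversed(updated):
                rollback(done)
            return []
    return updated
```

Ordering the regions from the smallest customer base to the largest limits the number of customers exposed to a bad build before the health check catches it.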
Summary
In this article, I have reviewed many aspects and consequences of designing and building a multi-region architecture, which enables organizations to scale and to provide better service to customers based on their location.
It is important to remember that there is no instant architecture that fits all organizations and all types of systems. For each scenario we need to make the proper adjustments and choose the most appropriate service (whether it is a managed service or not). The list of services mentioned in this article will allow you to review your alternatives.
We need to take into consideration that a multi-region (or global) architecture is just a means and not the goal itself. Designing and building a global infrastructure is expensive and requires considerations beyond the technical side, such as new monitoring services, different monitoring capabilities and effects on the development process.