Multi-AZ vs Multi-Region: AWS High Availability and Disaster Recovery Guide

Compare Multi-AZ and Multi-Region architectures on AWS. Learn when to use Pilot Light, Warm Standby, or Active-Active DR strategies based on RTO/RPO requirements.

PHILOLAMB · Updated: January 31, 2026
Multi-AZ, Multi-Region, High Availability, Disaster Recovery, DR, Pilot Light, Warm Standby

Related Exam Domains

  • Domain 2: Design Resilient Architectures

Key Takeaway

Multi-AZ is for High Availability (HA); Multi-Region is for Disaster Recovery (DR). Use Multi-AZ to survive AZ failures; use Multi-Region to survive regional failures or to meet geographic requirements.

Quick Comparison

Aspect        | Multi-AZ           | Multi-Region
Purpose       | High Availability  | Disaster Recovery
Protection    | AZ failure         | Region failure, natural disasters
Failover Time | Automatic, 1-2 min | Manual/Auto, minutes to hours
Replication   | Synchronous        | Asynchronous
Latency       | Milliseconds       | Tens to hundreds of ms
Cost          | ~2x baseline       | 2x+ baseline
Complexity    | Low                | High

Exam Tip

Exam Essential: "AZ failure protection" = Multi-AZ. "Region failure" or "natural disaster recovery" = Multi-Region. Consider cost and complexity trade-offs.

Multi-AZ Architecture

What is Multi-AZ?

Multi-AZ distributes resources across multiple Availability Zones within a single Region.

┌─────────────────────────────────────────────────────┐
│                    Seoul Region                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    AZ-a     │  │    AZ-b     │  │    AZ-c     │  │
│  │  ┌───────┐  │  │  ┌───────┐  │  │  ┌───────┐  │  │
│  │  │  EC2  │  │  │  │  EC2  │  │  │  │  EC2  │  │  │
│  │  └───────┘  │  │  └───────┘  │  │  └───────┘  │  │
│  │  ┌───────┐  │  │  ┌───────┐  │  │             │  │
│  │  │  RDS  │←─┼──┼─→│Standby│  │  │             │  │
│  │  │Primary│  │  │  │  RDS  │  │  │             │  │
│  │  └───────┘  │  │  └───────┘  │  │             │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  │
└─────────────────────────────────────────────────────┘

Multi-AZ Features

Feature                 | Description
Synchronous replication | Real-time data sync between Primary and Standby
Automatic failover      | Automatic switch to Standby on failure detection
Single endpoint         | DNS name unchanged, only IP changes
Same region             | Low latency, simple architecture
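
Because the endpoint's DNS name stays the same and only the IP behind it changes, a client mainly needs to drop its broken connection and reconnect by hostname. A minimal sketch of that retry loop, assuming a hypothetical `connect` callable that opens a fresh connection to the endpoint hostname (so each attempt performs a new DNS lookup); the attempt count and delays are illustrative, not AWS recommendations:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds, backing off between attempts.

    `connect` is a hypothetical callable supplied by the application. It should
    connect by hostname rather than by a cached IP address.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                # Exponential backoff: base_delay, 2x, 4x, ...
                sleep(base_delay * 2 ** attempt)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

The `sleep` parameter exists only so the backoff can be replaced in tests; in an application the default `time.sleep` would normally be used.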

Multi-AZ Supported Services

  • ✅ RDS (automatic failover)
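  • ✅ ElastiCache for Redis (Multi-AZ with automatic failover)
  • ✅ EFS (Multi-AZ by default for Regional file systems; One Zone file systems use a single AZ)
  • ✅ Aurora (storage replicated across three AZs by default; add a replica in another AZ for fast instance failover)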
  • ✅ OpenSearch (Multi-AZ deployment)
  • ✅ MSK (Multi-AZ recommended)

Multi-Region Architecture

What is Multi-Region?

Multi-Region distributes infrastructure across multiple AWS Regions.

┌──────────────────────┐         ┌──────────────────────┐
│    Seoul Region      │         │    Tokyo Region      │
│   (ap-northeast-2)   │         │   (ap-northeast-1)   │
│  ┌────────────────┐  │         │  ┌────────────────┐  │
│  │   Primary DB   │──┼─ Repl ──┼─→│   Replica DB   │  │
│  │   (Aurora)     │  │         │  │   (Aurora)     │  │
│  └────────────────┘  │         │  └────────────────┘  │
│  ┌────────────────┐  │         │  ┌────────────────┐  │
│  │  Application   │  │         │  │  Application   │  │
│  │    Servers     │  │         │  │   (Standby)    │  │
│  └────────────────┘  │         │  └────────────────┘  │
└──────────────────────┘         └──────────────────────┘
              │                            │
              └──────────┬─────────────────┘
                         │
                  ┌──────┴──────┐
                  │  Route 53   │
                  │  (Failover) │
                  └─────────────┘
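
The Route 53 failover box in the diagram corresponds to a pair of records that share one name, with one marked PRIMARY and the other SECONDARY. The sketch below only builds the change batch as plain Python data; the domain name, IP addresses (documentation ranges), and health check ID are placeholders, not values from this article.

```python
def failover_record(name, ip, role, health_check_id=None, ttl=60):
    """Build one UPSERT change for a Route 53 failover routing record."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{role.lower()}-endpoint",  # must be unique per record
        "Failover": role,
        "TTL": ttl,  # a short TTL lets clients pick up the failover sooner
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# Placeholder values: Seoul is primary, Tokyo is the standby.
change_batch = {
    "Comment": "Seoul primary, Tokyo secondary",
    "Changes": [
        failover_record("app.example.com", "203.0.113.10", "PRIMARY",
                        health_check_id="hypothetical-health-check-id"),
        failover_record("app.example.com", "198.51.100.10", "SECONDARY"),
    ],
}
```

With boto3, a structure like this would be passed as the `ChangeBatch` argument of the Route 53 client's `change_resource_record_sets` call, together with the hosted zone ID. That call is omitted here so the sketch runs without AWS credentials.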

Multi-Region Supported Services

  • ✅ Aurora Global Database (replication lag typically under 1 second)
  • ✅ DynamoDB Global Tables (changes typically replicated within a second)
  • ✅ S3 Cross-Region Replication (CRR)
  • ✅ Route 53 (Global DNS)
  • ✅ CloudFront (Global CDN)
  • ✅ Global Accelerator (Global network)

DR Strategy Comparison

Understanding RTO and RPO

Metric | Definition               | Question
RPO    | Recovery Point Objective | "How much data can we afford to lose?"
RTO    | Recovery Time Objective  | "How quickly must we recover?"

                Disaster Occurs
                      │
                      ▼
──────┬───────────────┼────────────────────┬──────→ Time
      │               │                    │
      │◄─────────────►│◄──────────────────►│
      │      RPO      │        RTO         │
      │               │                    │
    Last           Disaster             Service
    Backup                              Restored
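
As a worked example of the two metrics, the sketch below computes the data-loss window and the downtime for one incident and compares them with the objectives. It measures the data-loss window back from the disaster to the last recovery point, and the downtime from the disaster until service is restored. The timestamps and objectives are invented for illustration.

```python
from datetime import datetime, timedelta


def achieved_recovery(last_recovery_point, disaster, service_restored):
    """Return (data-loss window, downtime) for a single incident."""
    if not (last_recovery_point <= disaster <= service_restored):
        raise ValueError("expected last_recovery_point <= disaster <= service_restored")
    data_loss = disaster - last_recovery_point
    downtime = service_restored - disaster
    return data_loss, downtime


# Illustrative incident: nightly backup at 02:00, outage at 05:30, back up at 09:30.
data_loss, downtime = achieved_recovery(
    last_recovery_point=datetime(2026, 1, 31, 2, 0),
    disaster=datetime(2026, 1, 31, 5, 30),
    service_restored=datetime(2026, 1, 31, 9, 30),
)
print(data_loss, downtime)  # 3:30:00 4:00:00

# Compare against example objectives: RPO of 4 hours, RTO of 2 hours.
meets_rpo = data_loss <= timedelta(hours=4)
meets_rto = downtime <= timedelta(hours=2)
print(meets_rpo, meets_rto)  # True False
```

In this example the backup schedule satisfies the RPO, but the restore took too long for the RTO, which is the kind of gap that pushes a design from Backup and Restore toward Pilot Light or Warm Standby.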

Exam Tip

If RPO must be 0, you need synchronous replication, because asynchronous replication always leaves some lag. If RTO must be near 0, you need Hot Standby or Active-Active.

The Four DR Strategies

1. Backup and Restore

The most basic DR strategy: periodically back up data and restore it after a disaster.

Metric     | Value
RTO        | Hours to 24 hours
RPO        | Hours (since last backup)
Cost       | 💰 (Lowest)
Complexity | ⭐ (Simplest)

Best for: Non-critical systems, dev/test environments, cost-sensitive workloads

2. Pilot Light

Keep only the core systems (typically the database) running at minimal capacity in the DR Region; provision the rest when a disaster occurs.

Metric     | Value
RTO        | Minutes to hours
RPO        | Minutes (async replication lag)
Cost       | 💰💰 (Low-Medium)
Complexity | ⭐⭐ (Medium)

Best for: Core business systems, balanced cost vs recovery time

3. Warm Standby

Run a scaled-down but fully functional version of the system in the DR Region, and scale it up on failover.

Metric     | Value
RTO        | Minutes
RPO        | Seconds to minutes
Cost       | 💰💰💰 (Medium-High)
Complexity | ⭐⭐⭐ (High)

Best for: Critical systems requiring fast recovery, some downtime acceptable

4. Multi-Site Active-Active

Both regions actively handle traffic simultaneously.

Metric     | Value
RTO        | Near-zero
RPO        | Near-zero
Cost       | 💰💰💰💰 (Highest)
Complexity | ⭐⭐⭐⭐ (Highest)

Best for: Mission-critical systems, zero downtime tolerance, global users

DR Strategy Summary

Strategy       | RTO               | RPO                | Cost | Automation
Backup/Restore | Hours to 24 hours | Hours              | $    | Manual
Pilot Light    | Minutes to hours  | Minutes            | $$   | Semi-auto
Warm Standby   | Minutes           | Seconds to minutes | $$$  | Auto
Active-Active  | ~0                | ~0                 | $$$$ | Fully auto

AWS Services by DR Strategy

Strategy       | Database                     | Compute                                | Networking
Backup/Restore | RDS Snapshots + S3 CRR       | AMI Copy                               | Route 53 manual
Pilot Light    | Aurora Global (replica only) | EC2 AMIs ready (instances not running) | Route 53 Failover
Warm Standby   | Aurora Global (scaled down)  | EC2 minimal running                    | Route 53 Failover
Active-Active  | DynamoDB Global Tables       | Full scale both sides                  | Route 53 Latency/Weighted

Exam Tip

Exam Essential: When given RTO/RPO requirements, match them to the appropriate DR strategy. "Minimize downtime" = Active-Active. "Minimize cost" = Backup & Restore.
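
Matching requirements to a strategy can be sketched as "pick the cheapest strategy whose typical recovery figures fit inside the requirement". The thresholds below are illustrative assumptions chosen to sit within the ranges in the tables above; they are not official AWS numbers, and real values depend on the workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    typical_rto_min: float  # assumed upper bound on recovery time, in minutes
    typical_rpo_min: float  # assumed upper bound on data loss, in minutes


# Ordered from cheapest to most expensive. All numbers are assumptions.
STRATEGIES = [
    Strategy("Backup and Restore", 24 * 60, 24 * 60),
    Strategy("Pilot Light", 4 * 60, 15),
    Strategy("Warm Standby", 30, 5),
    Strategy("Multi-Site Active-Active", 1, 0.1),
]


def cheapest_strategy(rto_min: float, rpo_min: float) -> str:
    """Return the first (cheapest) strategy that fits both objectives."""
    for strategy in STRATEGIES:
        if strategy.typical_rto_min <= rto_min and strategy.typical_rpo_min <= rpo_min:
            return strategy.name
    # Nothing fits: fall back to the strategy with the tightest figures.
    return STRATEGIES[-1].name


print(cheapest_strategy(rto_min=60, rpo_min=5))             # Warm Standby
print(cheapest_strategy(rto_min=48 * 60, rpo_min=24 * 60))  # Backup and Restore
```

Ordering the list by cost is what encodes the trade-off: the loop never picks a more expensive strategy when a cheaper one already satisfies both RTO and RPO.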

SAA-C03 Exam Focus Points

  1. RTO/RPO-based DR selection: Match requirements to strategy
  2. Cost vs Availability: Lower RTO/RPO = Higher cost
  3. Service capabilities: RDS Multi-AZ = sync replication; Aurora Global = async
  4. Route 53 role: Health Check + Failover for DR; Latency routing for Active-Active
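
The sync versus async distinction in point 3 is what determines RPO. The toy model below is an illustration only (it is not how RDS or Aurora are implemented): in synchronous mode a write reaches the standby before it is acknowledged, while in asynchronous mode acknowledged writes wait in a queue and are lost if the primary fails before they ship.

```python
class ReplicatedStore:
    """Toy primary with one standby, for illustrating RPO only."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.acknowledged = []  # writes the client was told succeeded
        self.standby = []       # writes durable on the standby
        self._in_flight = []    # async only: acknowledged but not yet shipped

    def write(self, item: str) -> None:
        if self.synchronous:
            self.standby.append(item)     # standby persists before the ack
        else:
            self._in_flight.append(item)  # shipped later by replicate()
        self.acknowledged.append(item)

    def replicate(self, count: int) -> None:
        """Ship up to `count` queued writes to the standby (async mode)."""
        shipped = self._in_flight[:count]
        self._in_flight = self._in_flight[count:]
        self.standby.extend(shipped)

    def lost_on_failover(self) -> list:
        """Acknowledged writes the standby lacks if the primary fails now."""
        return list(self._in_flight)


sync_db, async_db = ReplicatedStore(True), ReplicatedStore(False)
for i in range(5):
    sync_db.write(f"order-{i}")
    async_db.write(f"order-{i}")
async_db.replicate(3)  # replication lag: two writes still queued

print(sync_db.lost_on_failover())   # []
print(async_db.lost_on_failover())  # ['order-3', 'order-4']
```

The empty list corresponds to RPO = 0 for synchronous Multi-AZ replication; the two queued orders correspond to the RPO > 0 of asynchronous cross-Region replication.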

Key Memorization

Keyword                   | Association
AZ failure protection     | Multi-AZ
Region failure protection | Multi-Region
Synchronous replication   | Multi-AZ, RPO=0
Asynchronous replication  | Multi-Region, RPO>0
Lowest cost DR            | Backup and Restore
Lowest RTO                | Active-Active
Core only running         | Pilot Light
Scaled-down operation     | Warm Standby

Exam Tip

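Sample Exam Question: "A company requires an RPO of 5 minutes and an RTO of 1 hour for its mission-critical application. Which DR strategy should it implement?" → Answer: Warm Standby. Asynchronous replication keeps RPO within minutes, and the already-running, scaled-down infrastructure keeps RTO within minutes. Pilot Light also replicates data continuously, but its RTO can stretch to hours while the remaining resources are provisioned.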

Frequently Asked Questions

Q: Is Multi-AZ enough for disaster recovery?

Multi-AZ protects against AZ failures but not regional failures (natural disasters, widespread outages). For business continuity, consider Multi-Region.

Q: How do you handle data conflicts in Active-Active?

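DynamoDB Global Tables use "last writer wins" conflict resolution: when the same item is updated in two Regions at nearly the same time, the most recent write overwrites the other. For finer control, implement application-level conflict resolution logic.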
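
A minimal sketch of the "last writer wins" idea, followed by one possible application-level alternative. The `Version` type and the tie-break on Region name are assumptions made for illustration; this is not DynamoDB's internal implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: frozenset   # e.g. the items in a shopping cart
    timestamp_ms: int  # when the write happened
    region: str        # Region that accepted the write


def last_writer_wins(a: Version, b: Version) -> Version:
    """Keep the newer write; break exact ties by Region name (assumed rule)."""
    return max(a, b, key=lambda v: (v.timestamp_ms, v.region))


def merge_carts(a: Version, b: Version) -> Version:
    """Application-level alternative: union both carts instead of dropping one."""
    newer = last_writer_wins(a, b)
    return Version(a.value | b.value, newer.timestamp_ms, newer.region)


seoul = Version(frozenset({"book"}), 1_000, "ap-northeast-2")
tokyo = Version(frozenset({"pen"}), 1_005, "ap-northeast-1")

print(sorted(last_writer_wins(seoul, tokyo).value))  # ['pen']
print(sorted(merge_carts(seoul, tokyo).value))       # ['book', 'pen']
```

With last writer wins, the Seoul write is silently discarded; the merge function keeps both, which is the kind of behavior that has to be built into the application when it matters.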

Q: How often should you test DR?

AWS recommends testing DR regularly. A common cadence is at least quarterly (for example, as Game Days), with monthly tests for critical systems. Document all test results.

Q: What's the practical difference between Pilot Light and Warm Standby?

  • Pilot Light: Only database running, app servers stopped
  • Warm Standby: Full stack running but scaled down

Warm Standby provides faster recovery but costs more.
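
At failover time, both strategies end with the same step: scale the DR Region's compute up to production capacity. The difference is the starting point (zero instances for Pilot Light, a few for Warm Standby). The sketch below only builds the parameters for an Auto Scaling group update; the group name and capacities are hypothetical.

```python
def scale_up_request(group_name: str, production_capacity: int, headroom: int = 0) -> dict:
    """Build keyword arguments for an Auto Scaling group capacity update."""
    if production_capacity < 1:
        raise ValueError("production_capacity must be at least 1")
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": production_capacity,
        "DesiredCapacity": production_capacity,
        "MaxSize": production_capacity + headroom,
    }


# Hypothetical DR group: Pilot Light starts at 0 instances, Warm Standby at 2.
starting_capacity = {"Pilot Light": 0, "Warm Standby": 2}
request = scale_up_request("dr-web-asg", production_capacity=6, headroom=2)

for strategy, running in starting_capacity.items():
    to_launch = request["DesiredCapacity"] - running
    print(f"{strategy}: launch {to_launch} more instances")
```

With boto3, these keyword arguments match the Auto Scaling client's `update_auto_scaling_group` call; the call itself is left out so the sketch runs without AWS access. Warm Standby has fewer instances to launch and already serves traffic on the ones that are running, which is where its shorter RTO comes from.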

Q: Aurora Global Database vs DynamoDB Global Tables?

  • Aurora Global: Relational data, SQL needed, ACID transactions
  • DynamoDB Global: NoSQL, replication typically within a second, Active-Active writes
