AWS Elastic Disaster Recovery: Complete Guide to RPO Seconds and RTO Minutes
How to achieve fast recovery with AWS DRS. DR strategy comparison, AWS Backup differences, and SAA-C03 exam essentials explained.
Related Exam Domains
- Domain 2: Design Resilient Architectures
Key Takeaway
AWS Elastic Disaster Recovery (AWS DRS) uses continuous block-level replication to achieve an RPO of seconds and an RTO that is typically 5-20 minutes. It supports disaster recovery for on-premises, other-cloud, and cross-region AWS workloads.
Exam Tip
Exam Essential: "Cost-effective DR + low RPO/RTO" → AWS DRS (Pilot Light), "Scheduled backups + longer recovery acceptable" → AWS Backup, "Near-zero RTO + cost not a concern" → Active-Active
When Should You Use AWS DRS?
Best For
AWS DRS Recommended Scenarios:
├── On-premises → AWS disaster recovery
│   └── Quick failover to AWS during datacenter outage
├── Other cloud → AWS disaster recovery
│   └── Recover Azure/GCP workloads to AWS
├── AWS Region-to-Region DR
│   └── Failover to DR region during primary region outage
├── Low RPO/RTO requirements
│   └── Seconds of data loss, minutes to recover
└── Pilot Light strategy implementation
    └── Cost-effective DR infrastructure
Not Ideal For
Cases Where AWS DRS Isn't the Best Fit:
├── Simple backup/restore is sufficient
│   → Use AWS Backup (lower cost)
├── Near-zero RTO is mandatory
│   → Configure Active-Active architecture
├── Database-only protection
│   → Use RDS Multi-AZ, Aurora Global Database
└── Large initial data migration
    → Use AWS Application Migration Service (servers), AWS DMS (databases), or Snow Family (bulk offline transfer)
How AWS DRS Works
Architecture
┌─────────────────────────────────────────────────────────────┐
│                AWS Elastic Disaster Recovery                │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   [Source Servers]                [AWS Staging Area]        │
│  (On-premises/Cloud)             (Low-cost resources)       │
│          │                                │                 │
│          │     AWS Replication Agent      │                 │
│          │ (Continuous block replication) │                 │
│          ▼                                ▼                 │
│   ┌──────────────┐                 ┌──────────────┐         │
│   │ Source Volume│    ────────→    │ EBS Volume   │         │
│   │ (Production) │    Real-time    │ (Replicated) │         │
│   └──────────────┘                 └──────────────┘         │
│                                           │                 │
│                                           ▼                 │
│                                   [During Disaster]         │
│                                           │                 │
│                                           ▼                 │
│                                    ┌──────────────┐         │
│                                    │   Recovery   │         │
│                                    │   Instance   │         │
│                                    │ (EC2 Launch) │         │
│                                    └──────────────┘         │
│                                           │                 │
│                                    RPO: Seconds             │
│                                    RTO: 5-20 minutes        │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Core Components
| Component | Role |
|---|---|
| AWS Replication Agent | Installed on source servers, detects and transmits block changes |
| Replication Server | EC2 instance in the staging area (t3.small by default) that receives replicated data; handles up to 15 disks |
| Staging Area | Stores replicated data on low-cost EBS |
| Recovery Instance | Runs actual workload during disaster |
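The 15-disk figure in the table gives a quick way to estimate a lower bound on the number of Replication Servers. The sketch below is only an illustration: the function name is hypothetical, and it assumes disks from different source servers pack perfectly onto shared Replication Servers, which the service does not guarantee.

```python
import math

DISKS_PER_REPLICATION_SERVER = 15  # figure quoted in the Core Components table


def min_replication_servers(disks_per_source_server):
    """Lower-bound estimate of Replication Servers for a list of per-server disk counts.

    Assumption: disks from different source servers can share a Replication
    Server and pack perfectly; real placement may need more servers.
    """
    total_disks = sum(disks_per_source_server)
    if total_disks == 0:
        return 0
    return math.ceil(total_disks / DISKS_PER_REPLICATION_SERVER)


# Ten source servers with 4 disks each: 40 disks in total
print(min_replication_servers([4] * 10))  # 3
```

Treat the result as a floor for capacity planning rather than a prediction of what AWS DRS will actually launch.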
DR Strategies Compared: Which One Should You Choose?
Four DR Strategies
DR Strategy Comparison
| Aspect | Backup & Restore | Pilot Light | Warm Standby | Active-Active |
|---|---|---|---|---|
| RPO | Hours | Minutes | Minutes | Near-zero |
| RTO | Hours-Days | Minutes | <5 min | <1 min |
| Cost | $ | $$ | $$$ | $$$$ |
| Complexity | Low | Medium | Med-High | High |
| DR Region Infra | Idle (none) | Minimal Config | Scaled-down | Full Running |
| AWS Services | Backup, S3 CRR | DRS, Aurora Global Database | Auto Scaling | Route 53, Global Accelerator |
Strategy Selection Flow
DR Strategy Selection:
│
▼
Is RTO requirement within minutes?
│
Yes → Is RPO also required in seconds?
│       │
│       Yes → Are there budget constraints?
│       │       │
│       │       Yes → [Pilot Light + AWS DRS]
│       │       │
│       │       No → [Active-Active]
│       │
│       No → [Warm Standby]
│
No
│
▼
Is cost the primary concern?
│
Yes → [Backup & Restore]
│
No → [Pilot Light]
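The selection flow can also be read as a small decision function. This is a minimal sketch that mirrors the branches of the flow one for one; the function and parameter names are illustrative, and real DR planning weighs more factors than four yes/no answers.

```python
def choose_dr_strategy(rto_within_minutes, rpo_in_seconds,
                       budget_constrained, cost_is_primary):
    """Walk the selection flow and return the name of the suggested strategy."""
    if rto_within_minutes:
        if rpo_in_seconds:
            # Tight RPO and RTO: budget decides between the two options
            return "Pilot Light + AWS DRS" if budget_constrained else "Active-Active"
        return "Warm Standby"
    # RTO of hours is acceptable: cost decides
    return "Backup & Restore" if cost_is_primary else "Pilot Light"


print(choose_dr_strategy(rto_within_minutes=True, rpo_in_seconds=True,
                         budget_constrained=True, cost_is_primary=False))
# Pilot Light + AWS DRS
```

The last two arguments are only consulted on the branch where they matter, just as in the flow.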
Exam Tip
Pilot Light vs Warm Standby Key Difference:
- Pilot Light: Data is replicated continuously, but application servers in the DR region are "off" and must be started on failover
- Warm Standby: Scaled-down infrastructure is "always running" in the DR region and only needs to scale up on failover
AWS DRS vs AWS Backup
Comparison Table
| Aspect | AWS DRS | AWS Backup |
|---|---|---|
| Replication | Continuous block-level | Scheduled snapshots |
| RPO | Seconds | Hours |
| RTO | 5-20 minutes | Hours to days |
| Cost | ~$20/month per source server + staging EC2/EBS | Backup storage, plus restore and cross-Region transfer charges |
| Target | Entire server (OS, apps, data) | AWS resources (EBS, RDS, DynamoDB) |
| Use Case | Server-level DR | Data backup/restore |
When to Choose What?
Choose AWS DRS:
├── Need to recover entire servers to AWS
├── RPO of seconds required
├── On-premises/other cloud workload DR
└── Implementing Pilot Light strategy
Choose AWS Backup:
├── Regular backup of AWS resources
├── Long-term retention for compliance
├── Hourly RPO is acceptable
└── Cost optimization is priority
AWS DRS Pricing Structure
Cost Components
| Item | Cost (US East) |
|---|---|
| Per-server replication | $0.028/hour (~$20/month) |
| Replication Server (EC2) | t3.small cost |
| Staging EBS | EBS gp3 cost |
| Recovery Instance | Charged only during failover |
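The "~$20/month" figure follows from the hourly rate multiplied by an average month of 730 hours. The sketch below reproduces that arithmetic; the EC2 and EBS amounts are inputs you would look up yourself, and the values used in the second example are placeholders, not AWS prices.

```python
HOURS_PER_MONTH = 730        # average month: 8760 hours / 12
DRS_HOURLY_RATE_USD = 0.028  # per replicating source server (US East), from the table


def monthly_drs_cost(source_servers, replication_server_usd=0.0, staging_ebs_usd=0.0):
    """Steady-state monthly cost in USD while no failover is in progress.

    replication_server_usd and staging_ebs_usd are the monthly EC2 and EBS
    charges for the staging area, supplied by the caller.
    """
    service_fee = source_servers * DRS_HOURLY_RATE_USD * HOURS_PER_MONTH
    return round(service_fee + replication_server_usd + staging_ebs_usd, 2)


print(monthly_drs_cost(1))  # 20.44

# Ten servers with placeholder staging costs (hypothetical figures)
print(monthly_drs_cost(10, replication_server_usd=30.0, staging_ebs_usd=50.0))
```

Recovery Instance charges are left out on purpose, since they apply only during drills and failover.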
Cost Optimization Tips
Cost Reduction Strategies:
├── Use gp3 for staging EBS (cheaper than gp2)
├── Single replication server handles multiple sources (up to 15 disks)
├── Terminate test instances immediately after DR drills
└── Clean up unnecessary Point-in-Time recovery points
Exam Tip
Exam Point: AWS DRS is cheaper than Warm Standby while achieving similar RPO/RTO, because no full-size compute runs in the DR region until failover. It is the expected answer for "cost-effective + low RPO/RTO" questions.
SAA-C03 Exam Focus Points
Commonly Tested Scenarios
- ✅ Cost-effective DR: "RPO seconds, RTO minutes + minimize cost" → AWS DRS
- ✅ DR Strategy Selection: "Near-zero RTO + cost not a concern" → Active-Active
- ✅ On-premises DR: "Datacenter → AWS disaster recovery" → AWS DRS
- ✅ Backup vs DRS: Distinguish "scheduled backups vs continuous replication"
- ✅ Pilot Light Definition: "Minimal core infrastructure + start servers on failover"
Sample Exam Questions
Exam Tip
Sample Exam Question 1: "A company needs to set up disaster recovery for an on-premises SQL Server to AWS. RPO of 5 minutes and RTO of 30 minutes are required. What is the most cost-effective solution?"
→ Answer: AWS Elastic Disaster Recovery (continuous replication delivers an RPO of seconds, and launching EC2 recovery instances delivers an RTO of minutes, both well within the 5-minute and 30-minute targets)
Exam Tip
Sample Exam Question 2: "Which disaster recovery strategy maintains scaled-down infrastructure always running in the DR region, requiring only scale-up during failover?"
→ Answer: Warm Standby (Pilot Light keeps application servers off until failover)
Exam Tip
Sample Exam Question 3: "A mission-critical application requires near-zero RTO. Which DR strategy should be used?"
→ Answer: Multi-Site Active/Active (both regions serve traffic simultaneously)
Frequently Asked Questions
Q: What's the difference between AWS DRS and CloudEndure Disaster Recovery?
AWS DRS is the successor to CloudEndure DR. CloudEndure DR was discontinued in March 2024, and AWS DRS provides the same capabilities with better AWS integration (IAM, CloudWatch, PrivateLink).
Q: Can AWS DRS recover databases?
Yes, it replicates databases installed on servers (Oracle, SQL Server, MySQL) at the block level. However, for managed databases (RDS, Aurora), use their native DR features (Multi-AZ, Global Database).
Q: How many Replication Servers do I need?
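By default, one Replication Server handles up to 15 staging disks. As you add source servers, AWS DRS launches additional Replication Servers automatically. If bandwidth is a bottleneck, you can assign a dedicated Replication Server to a busy source server or choose a larger instance type in the replication settings.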
Q: Does DR testing affect production?
No. DR drills in AWS DRS are non-disruptive. Drill instances are launched from staging area data while replication continues, so you can test without impacting source servers.
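As a rough illustration of how a drill might be started programmatically, the sketch below wraps the DRS `StartRecovery` operation (`start_recovery` in boto3) with `isDrill=True`. The helper name, the stand-in client, the job ID, and the server IDs are all hypothetical; in real use you would pass `boto3.client("drs")` and should confirm the request and response shapes against the current boto3 documentation.

```python
def start_drill(drs_client, source_server_ids):
    """Launch drill instances for the given source servers and return the job ID.

    drs_client is expected to behave like boto3.client("drs").
    Assumption: the response carries the job under response["job"]["jobID"].
    """
    response = drs_client.start_recovery(
        isDrill=True,  # drill launch, not an actual recovery
        sourceServers=[{"sourceServerID": sid} for sid in source_server_ids],
    )
    return response["job"]["jobID"]


class FakeDrsClient:
    """Stand-in for boto3.client("drs") so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def start_recovery(self, **kwargs):
        self.calls.append(kwargs)
        return {"job": {"jobID": "drsjob-example"}}  # made-up job ID


client = FakeDrsClient()
print(start_drill(client, ["s-1111example", "s-2222example"]))  # drsjob-example
```

Remember to terminate drill instances afterwards, since they are billed like any other EC2 instance while running.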
Q: What is Point-in-Time Recovery?
AWS DRS keeps multiple point-in-time snapshots of the replicated data, so you can recover to a point before a ransomware infection or data corruption occurred. The default policy retains daily snapshots for 7 days, and retention can be configured from 1 to 365 days.
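Choosing a recovery point comes down to picking the newest one taken before the incident. A minimal sketch of that selection follows; the timestamps are made up for illustration, and the helper is not part of any AWS SDK.

```python
from datetime import datetime, timezone


def latest_point_before(recovery_points, incident_time):
    """Return the newest recovery point strictly earlier than incident_time, or None."""
    earlier = [point for point in recovery_points if point < incident_time]
    return max(earlier, default=None)


# Hypothetical recovery points (UTC) and a hypothetical time of infection
points = [
    datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 9, 10, tzinfo=timezone.utc),
    datetime(2024, 5, 1, 9, 20, tzinfo=timezone.utc),
]
infected_at = datetime(2024, 5, 1, 9, 15, tzinfo=timezone.utc)

print(latest_point_before(points, infected_at))  # 2024-05-01 09:10:00+00:00
```

If the function returns None, every retained point is newer than the incident, which is one reason to set the retention period with detection delays in mind.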
Related Posts
- DR Strategies Complete Guide: Pilot Light, Warm Standby, Active-Active
- Understanding RPO and RTO
- AWS Backup Service