RPO and RTO Explained: Essential Disaster Recovery Metrics
RPO measures acceptable data loss; RTO measures acceptable downtime. Master these essential SAA-C03 concepts to select the right DR strategy for any scenario.
Related Exam Domains
- Domain 2: Design Resilient Architectures
Key Takeaway
RPO (Recovery Point Objective) is "how much data can we afford to lose" and RTO (Recovery Time Objective) is "how quickly must we recover." RPO determines backup frequency; RTO determines DR strategy complexity.
Exam Tip
Exam Essential: RPO = backward direction (data loss), RTO = forward direction (time to recovery)
RPO vs RTO at a Glance
| Metric | Meaning | Question | Direction | Impact |
|---|---|---|---|---|
| RPO | Recovery Point Objective | How much data can we lose? | Past ← | Backup frequency |
| RTO | Recovery Time Objective | How fast must we recover? | Future → | DR strategy |
RPO and RTO Timeline:
      │←──── RPO ────→│←──── RTO ────→│
 Last Backup      Disaster        Recovery
      │               │               │
──────●───────┼───────●───────┼───────●────→ Time
      │       │       │       │       │
    10:00   11:00   12:00   13:00   14:00
RPO = 2 hours (last backup → disaster)
→ Data loss: 10:00-12:00
RTO = 2 hours (disaster → recovery)
→ Downtime: 12:00-14:00
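The arithmetic behind the timeline can be sketched in a few lines of Python. The function name `recovery_windows` and the calendar date are illustrative assumptions; only the three clock times come from the timeline above.

```python
from datetime import datetime, timedelta


def recovery_windows(last_backup: datetime, disaster: datetime, recovered: datetime):
    """Return (data_loss_window, downtime) for one incident.

    data_loss_window looks backward from the disaster to the last backup
    (compare against RPO); downtime looks forward from the disaster to
    recovery (compare against RTO).
    """
    if not (last_backup <= disaster <= recovered):
        raise ValueError("expected last_backup <= disaster <= recovered")
    return disaster - last_backup, recovered - disaster


# Arbitrary date (an assumption); the hours match the timeline.
day = datetime(2024, 1, 1)
loss, downtime = recovery_windows(
    last_backup=day.replace(hour=10),
    disaster=day.replace(hour=12),
    recovered=day.replace(hour=14),
)
print(loss, downtime)  # 2:00:00 2:00:00
```

Both values are measurements from one incident; RPO and RTO are the targets those measurements are compared against.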
RPO (Recovery Point Objective)
What is RPO?
RPO is the maximum acceptable data loss, measured in time. It is the largest gap you can tolerate between the last recoverable copy of the data (backup or replica) and the moment of the disaster.
RPO Examples:
RPO = 1 hour:
├── Hourly backups required
├── Maximum 1 hour of data loss acceptable
└── Example: 12:00 backup → 12:45 disaster → 45 min data loss
RPO = 24 hours:
├── Daily backup sufficient
├── Maximum 24 hours of data loss acceptable
└── Example: Yesterday's backup → today's disaster → up to 24h of data loss
Factors Affecting RPO
| Factor | Low RPO | High RPO |
|---|---|---|
| Backup frequency | Minutes | Days |
| Replication type | Synchronous | Asynchronous |
| Backup method | Continuous, log shipping | Full backup |
| Cost | Higher | Lower |
AWS Services for Low RPO
RPO by AWS Solution:
RPO ~0 (near-zero data loss):
├── RDS Multi-AZ (synchronous replication)
├── Aurora Global Database (under 1 second replication)
├── DynamoDB Global Tables (asynchronous, typically about 1 second)
└── S3 versioning (object-level recovery)
RPO minutes:
├── RDS automated backups (point-in-time recovery, about 5 min)
├── RDS Read Replica (async, some lag)
└── S3 Cross-Region Replication (async)
RPO hours:
├── AWS Backup scheduled
└── EBS Snapshot automation
Exam Tip
Exam Point: RPO determines backup frequency and replication method. RPO = 1 hour means backup at least hourly.
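The "backup at least hourly" rule can be checked mechanically. This is a minimal sketch, assuming backups complete instantly and that the worst case is a disaster striking just before the next scheduled backup; the function names are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Worst case: the disaster hits just before the next backup would run,
    so almost one full interval of data is lost (simplifying assumption)."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if backing up every `backup_interval` keeps worst-case loss within `rpo`."""
    return worst_case_data_loss(backup_interval) <= rpo


print(meets_rpo(timedelta(hours=1), timedelta(hours=1)))   # True
print(meets_rpo(timedelta(hours=24), timedelta(hours=1)))  # False
```

In practice a backup also takes time to finish, so a real schedule would need some margin below the RPO.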
RTO (Recovery Time Objective)
What is RTO?
RTO is the target time to restore service after a disaster. It defines the maximum acceptable downtime.
RTO Examples:
RTO = 4 hours:
├── Must recover within 4 hours of disaster
├── 4+ hours downtime impacts business
└── Warm Standby or Pilot Light strategy appropriate
RTO = 0 (zero downtime):
├── No downtime acceptable
├── Active-Active or automatic failover required
└── Use Multi-AZ, Global Tables, etc.
Factors Affecting RTO
| Factor | Short RTO | Long RTO |
|---|---|---|
| Infrastructure state | Always running | Provision on disaster |
| Automation level | Automatic failover | Manual recovery |
| DR strategy | Active-Active | Backup & Restore |
| Cost | Higher | Lower |
AWS Services for Low RTO
RTO by AWS Solution:
RTO ~0 (near-instant recovery):
├── Active-Active deployments (no failover step)
├── Route 53 failover routing (bounded by health-check interval and DNS TTL)
├── Global Accelerator
└── DynamoDB Global Tables
RTO seconds to minutes:
├── Aurora automatic failover (~30 sec)
├── RDS Multi-AZ automatic failover (1-2 min)
├── Warm Standby DR strategy
└── AWS Elastic Disaster Recovery
RTO hours:
├── Pilot Light DR strategy
├── EC2 Auto Recovery
└── CloudFormation automation
RTO 24+ hours:
└── Backup & Restore
Exam Tip
Exam Point: RTO determines the DR strategy. Shorter RTO requires more expensive strategies (Active-Active, Warm Standby).
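As a rough illustration of how RTO drives the choice of strategy, the mapping can be written as a threshold lookup. The cut-off values below are assumptions chosen to match this guide's examples, not official AWS boundaries, and `suggest_dr_strategy` is a hypothetical helper.

```python
from datetime import timedelta


def suggest_dr_strategy(rto: timedelta) -> str:
    """Map an RTO target to a DR strategy using illustrative thresholds."""
    if rto <= timedelta(minutes=1):
        return "Active-Active"      # no time to fail over
    if rto <= timedelta(hours=1):
        return "Warm Standby"       # scaled-down copy already running
    if rto <= timedelta(hours=24):
        return "Pilot Light"        # core pieces ready, rest provisioned on demand
    return "Backup & Restore"       # rebuild from backups


for hours in (0, 1, 8, 72):
    print(hours, suggest_dr_strategy(timedelta(hours=hours)))
```

Real decisions also weigh RPO and budget, and a 4-hour RTO may justify either Warm Standby or Pilot Light, so treat the output as a starting point.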
RPO and RTO Relationship
Cost Trade-off
RPO/RTO vs Cost:
Cost
 ↑
 │                                  ●
 │                            ●
 │                      ●
 │                ●
 │          ●
 │    ●
 └──────────────────────────────────────→ Lower RPO/RTO
     24h   4h    1h    15m   1m     0
As RPO/RTO decreases:
├── More frequent backup/replication (RPO)
├── More infrastructure always running (RTO)
└── Cost rises steeply: each step closer to zero costs more than the last
Independent Objectives
RPO and RTO are set independently:
| Scenario | RPO | RTO | Strategy |
|---|---|---|---|
| Banking core | ~0 | ~0 | Active-Active |
| E-commerce | 5 min | 1 hour | Warm Standby |
| Back-office | 4 hours | 8 hours | Pilot Light |
| Archive | 24 hours | 72 hours | Backup & Restore |
Workload Examples
Typical RPO/RTO by Workload:
Financial Trading System:
├── RPO: ~0 (no transaction loss)
├── RTO: ~0 (instant recovery)
└── Strategy: Active-Active + sync replication
E-commerce:
├── RPO: 5 min (protect recent orders)
├── RTO: 1 hour (minimize revenue loss)
└── Strategy: Warm Standby + async replication
Internal HR System:
├── RPO: 4 hours (daily operations)
├── RTO: 24 hours (next business day)
└── Strategy: Pilot Light + scheduled backup
Log Archive:
├── RPO: 24 hours (analytics purpose)
├── RTO: 72 hours (not urgent)
└── Strategy: Backup & Restore
AWS Service RPO/RTO Capabilities
Database Services
| Service | Configuration | RPO | RTO |
|---|---|---|---|
| RDS Single-AZ | Auto backup | 5 min | Hours |
| RDS Multi-AZ | Sync replication | ~0 | 1-2 min |
| Aurora | 6 copies | ~0 | ~30 sec |
| Aurora Global | Cross-region | 1 sec | 1 min |
| DynamoDB | Global Tables | ~0 | ~0 |
Storage Services
| Service | Feature | RPO | Notes |
|---|---|---|---|
| S3 | Versioning | ~0 | Object-level recovery |
| S3 | CRR | Minutes | Async replication |
| EBS | Snapshots | Hours | Depends on snapshot frequency |
| EFS | Backup | Daily | AWS Backup integration |
SAA-C03 Exam Focus Points
- ✅ RPO definition: Maximum acceptable data loss (in time)
- ✅ RTO definition: Target time to recovery completion
- ✅ Direction: RPO = past (backup point), RTO = future (recovery completion)
- ✅ Impact: RPO → backup frequency, RTO → DR strategy
- ✅ Cost: Lower RPO/RTO = higher cost
- ✅ AWS services: Multi-AZ (low RTO), sync replication (low RPO)
Exam Tip
Sample Exam Question: "A company requires RPO of 5 minutes and RTO of 1 hour for their RDS database. What is the most appropriate configuration?" → Answer: Multi-AZ deployment + automated backups (Multi-AZ: RTO 1-2 min, automated backups: RPO 5 min)
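The reasoning in this question can be sketched as a filter over the database table above. The RPO/RTO figures are the approximate values from that table; treating "Hours" as 4 hours and "~0" as zero are assumptions made only so the values can be compared, and `candidates` is a hypothetical helper.

```python
from datetime import timedelta
from typing import NamedTuple


class DbOption(NamedTuple):
    name: str
    rpo: timedelta
    rto: timedelta


# Approximate figures from the database table; "Hours" -> 4 h is an assumption.
OPTIONS = [
    DbOption("RDS Single-AZ + automated backups", timedelta(minutes=5), timedelta(hours=4)),
    DbOption("RDS Multi-AZ", timedelta(0), timedelta(minutes=2)),
    DbOption("Aurora", timedelta(0), timedelta(seconds=30)),
    DbOption("Aurora Global", timedelta(seconds=1), timedelta(minutes=1)),
    DbOption("DynamoDB Global Tables", timedelta(0), timedelta(0)),
]


def candidates(required_rpo: timedelta, required_rto: timedelta) -> list[str]:
    """Return the options whose RPO and RTO are both within the requirement."""
    return [
        o.name for o in OPTIONS
        if o.rpo <= required_rpo and o.rto <= required_rto
    ]


print(candidates(timedelta(minutes=5), timedelta(hours=1)))
```

Single-AZ drops out on RTO. Among the remaining options, the exam answer is the cheapest one that fits the stated RDS workload, which is Multi-AZ with automated backups.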
Frequently Asked Questions
Q: Which is more important, RPO or RTO?
It depends on workload characteristics. Financial systems may prioritize RPO (no data loss), while e-commerce may prioritize RTO (fast recovery to protect revenue). Typically, both are set according to business requirements.
Q: How do you achieve RPO of zero?
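Strictly zero data loss requires synchronous replication, as in RDS Multi-AZ (sync) and Aurora (6 copies). DynamoDB Global Tables replicate asynchronously by default, typically within about a second, so they provide near-zero rather than zero data loss. Note that sync replication may increase write latency.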
Q: What's the difference between RTO and MTTR?
RTO is a target; MTTR is an actual measurement. RTO (Recovery Time Objective) is the "goal," and MTTR (Mean Time To Recovery) is the "average actual recovery time." MTTR should be shorter than RTO to meet the objective. Because MTTR is an average, a single recovery can still exceed the RTO even when MTTR is below it.
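A small sketch makes the target-versus-measurement distinction concrete. The drill durations and the 1-hour RTO are made-up sample data, and `mttr` is a hypothetical helper.

```python
from datetime import timedelta
from statistics import mean


def mttr(recovery_times: list[timedelta]) -> timedelta:
    """Mean Time To Recovery: the average of observed recovery durations."""
    return timedelta(seconds=mean(t.total_seconds() for t in recovery_times))


# Hypothetical measurements from three DR drills.
drills = [timedelta(minutes=30), timedelta(minutes=45), timedelta(minutes=75)]
rto = timedelta(hours=1)

print(mttr(drills))        # 0:50:00
print(mttr(drills) < rto)  # True: the average is inside the target
print(max(drills) <= rto)  # False: the slowest drill still missed it
```

Comparing the slowest recovery against the RTO, not just the mean, gives a more conservative picture.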
Q: How do you test RPO/RTO?
Conduct regular DR drills. Simulate actual failures and measure recovery time (RTO) and data loss (RPO). Test at least annually; critical systems should test more frequently.
Q: Should all systems have the same RPO/RTO?
No. Set them based on system criticality. Tier 1 (critical) gets low RPO/RTO; Tier 3 (non-critical) gets higher values. This is called "tiered DR" and optimizes costs.