EC2 Spot Instance Strategy: Save Up to 90% on Costs
Learn how to save up to 90% on EC2 costs with Spot Instances while designing interruption-safe architectures.
Related Exam Domains
- Domain 4: Design Cost-Optimized Architectures
Key Takeaway
Spot Instances are up to 90% cheaper than On-Demand but can be interrupted with a 2-minute notice. Use them for fault-tolerant workloads, and keep them stable with instance type diversification, capacity-optimized allocation, and Auto Scaling capacity rebalancing.
Exam Tip
Exam Essential: "Spot = up to 90% savings, 2-minute interruption notice, suitable for fault-tolerant workloads"
What is a Spot Instance?
Spot Instances run on AWS's spare EC2 capacity at a steep discount from On-Demand prices. In exchange, AWS can reclaim that capacity when it needs it back.
| Aspect | On-Demand | Spot Instance |
|---|---|---|
| Price | Full price | 60-90% discount |
| Availability | Always available | Varies with spare capacity |
| Interruption | None | 2-minute notice before interruption |
| Commitment | None | None |
How is Spot pricing determined?
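The Spot price is set by AWS based on long-term supply and demand for spare capacity:
- Low demand → Price drops (up to 90% off On-Demand)
- High demand → Price rises (discount shrinks)
Interruption conditions:
1. The Spot price rises above your maximum price (which defaults to the On-Demand price)
2. AWS needs the capacity back (capacity shortage)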
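The two interruption conditions can be written as a simple check. This is a study model, not AWS's actual interruption logic; the function name `interruption_reasons` is hypothetical and the prices are illustrative m5.large figures.

```python
from typing import List, Optional

def interruption_reasons(
    spot_price: float,
    on_demand_price: float,
    max_price: Optional[float] = None,
    capacity_available: bool = True,
) -> List[str]:
    """Return why an instance would be interrupted in this simplified model (empty = safe)."""
    # With no explicit max price, the effective maximum is the On-Demand price
    effective_max = on_demand_price if max_price is None else max_price
    reasons = []
    if spot_price > effective_max:
        reasons.append("price")
    if not capacity_available:
        reasons.append("capacity")
    return reasons

# Default max price, Spot price well below On-Demand, capacity available
print(interruption_reasons(0.029, 0.096))  # []
# A low custom max price plus a capacity shortage triggers both conditions
print(interruption_reasons(0.035, 0.096, max_price=0.03, capacity_available=False))  # ['price', 'capacity']
```

Leaving the max price at its default means only condition 2 normally applies, because the Spot price rarely exceeds the On-Demand price.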
Exam Tip
Spot prices change gradually, driven by long-term supply and demand. AWS no longer uses the old bidding (auction) system.
What Workloads Are Suitable?
Workloads Suitable for Spot
- Batch processing: Big data analytics, ETL jobs
- CI/CD builds: Jenkins build agents, GitHub Actions self-hosted runners
- Containers: ECS, EKS worker nodes
- Big data: EMR, Spark clusters
- ML training: SageMaker training jobs
- Web servers: Stateless web servers
- HPC: Scientific simulations, rendering
Workloads Not Suitable for Spot
- Databases: RDS, self-managed DB servers
- Single instances: Servers that become single points of failure
- Stateful applications: Apps that must preserve local state
- SLA-critical: Services with significant business impact if interrupted
Spot Allocation Strategies
You can choose how Spot Instances are allocated in Auto Scaling groups.
Capacity-Optimized - Recommended to minimize interruptions
Allocates instances from the pools with the most available capacity
→ Minimizes the probability of interruption
| Strategy | Interruptions (Skyscanner test) |
|---|---|
| Lowest Price | 200-300 |
| Capacity-Optimized | 10-15 |
Exam Tip
Exam Point: Capacity-optimized strategy is the recommended strategy to minimize Spot interruptions.
Lowest Price
- Allocates from cheapest pools
- Higher interruption probability
- Only suitable for highly fault-tolerant workloads
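Price-Capacity Optimized
- Allocates from the lowest-priced pools among those with the most available capacity
- Balances cost and stability
- AWS now recommends it as the default choice for most workloads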
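To make the difference between the strategies concrete, here is a toy model of how each might pick among Spot capacity pools. The pool prices and capacity numbers are invented, and the `top_n` cutoff in `price_capacity_optimized` is an assumption; AWS does not publish its exact selection logic. In an Auto Scaling group the strategies are chosen through the `SpotAllocationStrategy` values `lowest-price`, `capacity-optimized`, and `price-capacity-optimized`.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Pool:
    instance_type: str
    az: str
    price: float         # current Spot price, $/hour (hypothetical)
    spare_capacity: int  # relative spare capacity (hypothetical units)

def lowest_price(pools: List[Pool]) -> Pool:
    # Cheapest pool, regardless of how much spare capacity it has
    return min(pools, key=lambda p: p.price)

def capacity_optimized(pools: List[Pool]) -> Pool:
    # Deepest pool, regardless of price
    return max(pools, key=lambda p: p.spare_capacity)

def price_capacity_optimized(pools: List[Pool], top_n: int = 2) -> Pool:
    # Keep only the deepest pools, then pick the cheapest among them
    deepest = sorted(pools, key=lambda p: p.spare_capacity, reverse=True)[:top_n]
    return min(deepest, key=lambda p: p.price)

pools = [
    Pool("m5.large", "us-east-1a", 0.031, 40),
    Pool("m5a.large", "us-east-1a", 0.027, 5),
    Pool("m5d.large", "us-east-1b", 0.034, 90),
    Pool("m4.large", "us-east-1b", 0.030, 70),
]
print(lowest_price(pools).instance_type)              # m5a.large (cheapest, but shallow)
print(capacity_optimized(pools).instance_type)        # m5d.large (deepest pool)
print(price_capacity_optimized(pools).instance_type)  # m4.large (cheapest of the two deepest)
```

Lowest price lands in the shallowest pool, which is exactly the pool most likely to be reclaimed.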
Instance Type Diversification
The most important strategy for Spot stability.
Bad example: Using only m5.large
→ Every instance is exposed when that single pool runs short of capacity
Good example: Multiple type + size combinations
→ m5.large, m5.xlarge, m4.large, m5a.large, m5d.large
→ If one pool is depleted, capacity can come from another
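A Spot capacity pool is a set of unused instances with the same instance type in the same Availability Zone, so each added type or AZ multiplies the pools a group can draw from. A quick sketch, assuming the group spans three AZs (the AZ names are illustrative):

```python
from itertools import product

instance_types = ["m5.large", "m5.xlarge", "m4.large", "m5a.large", "m5d.large"]
azs = ["us-east-1a", "us-east-1b", "us-east-1c"]  # illustrative AZ choice

# Every (instance type, AZ) combination is a separate Spot capacity pool
pools = list(product(instance_types, azs))
print(len(pools))  # 15

# Using m5.large alone leaves only 3 pools to fall back on
print(len([p for p in pools if p[0] == "m5.large"]))  # 3
```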
Mixed Instances Policy
Mix On-Demand and Spot instances in Auto Scaling groups.
Mixed Instances Policy Example:
```
┌────────────────────────────────────────┐
│ Auto Scaling Group                     │
│                                        │
│ On-Demand: 20% (guaranteed baseline)   │
│ ┌──────┐ ┌──────┐                      │
│ │m5.lg │ │m5.lg │                      │
│ └──────┘ └──────┘                      │
│                                        │
│ Spot: 80% (cost savings)               │
│ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐    │
│ │m5.lg │ │m5a.lg│ │m4.lg │ │m5d.lg│    │
│ └──────┘ └──────┘ └──────┘ └──────┘    │
│                                        │
│ Instance types: 4+ (diversified)       │
└────────────────────────────────────────┘
```
Exam Tip
Best Practice: On-Demand baseline (20-30%) + Spot (70-80%) combination achieves both availability and cost savings
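A 20/80 split like the one in the diagram maps onto an Auto Scaling group's `MixedInstancesPolicy`. Below is a sketch of the parameters for boto3's `create_auto_scaling_group`, built as a plain dict so it can be inspected without calling AWS. The group, launch template, and subnet names are hypothetical placeholders, and `OnDemandBaseCapacity=0` means the 20% applies to the whole group.

```python
def build_mixed_asg_params(name: str, launch_template: str, subnets: str) -> dict:
    # Hypothetical sizes; adjust MinSize/MaxSize/DesiredCapacity to the workload
    return {
        "AutoScalingGroupName": name,
        "MinSize": 0,
        "MaxSize": 20,
        "DesiredCapacity": 10,
        "VPCZoneIdentifier": subnets,  # comma-separated subnet IDs, ideally one per AZ
        "CapacityRebalance": True,     # launch replacements on rebalance recommendations
        "MixedInstancesPolicy": {
            "LaunchTemplate": {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template,
                    "Version": "$Latest",
                },
                # Diversify across several similar instance types
                "Overrides": [
                    {"InstanceType": t}
                    for t in ["m5.large", "m5a.large", "m4.large", "m5d.large"]
                ],
            },
            "InstancesDistribution": {
                "OnDemandBaseCapacity": 0,
                "OnDemandPercentageAboveBaseCapacity": 20,  # 20% On-Demand, 80% Spot
                "SpotAllocationStrategy": "capacity-optimized",  # or "price-capacity-optimized"
            },
        },
    }

params = build_mixed_asg_params("web-asg", "web-template", "subnet-aaaa,subnet-bbbb")
# To apply (requires AWS credentials and real resource names):
# import boto3
# boto3.client("autoscaling").create_auto_scaling_group(**params)
```

Raising `OnDemandBaseCapacity` instead guarantees a fixed number of On-Demand instances before the percentage split kicks in.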
Interruption Handling
2-Minute Notice
AWS issues a 2-minute notice before it interrupts a Spot Instance.
Ways to receive the interruption notice:
1. Instance Metadata Service (IMDS)
→ http://169.254.169.254/latest/meta-data/spot/instance-action (returns 404 until a notice is issued)
2. EventBridge event
→ EC2 Spot Instance Interruption Warning
3. CloudWatch Events (the former name of EventBridge; existing rules keep working)
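An EventBridge rule for interruption warnings matches on the event's `source` and `detail-type`. The sketch shows that pattern as a Python dict (its `json.dumps` form is what a rule's `EventPattern` expects) plus a deliberately simplified `matches` helper, which is hypothetical: real EventBridge matching supports many more operators. The sample event's instance ID is a placeholder.

```python
import json

# Event pattern for an EventBridge rule
pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}

# Sample event in the shape EC2 emits (instance ID is a placeholder)
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "detail": {"instance-id": "i-0123456789abcdef0", "instance-action": "terminate"},
}

def matches(pattern: dict, event: dict) -> bool:
    # Simplified: each top-level pattern key must list the event's value
    return all(event.get(key) in allowed for key, allowed in pattern.items())

print(json.dumps(pattern))
print(matches(pattern, sample_event))  # True
```

The rule's target (for example a Lambda function or an SQS queue) then receives the event and can drain the instance named in `detail["instance-id"]`.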
Graceful Shutdown Implementation
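```python
# Poll for a Spot interruption notice via IMDSv2 (AWS recommends checking every 5 seconds)
import time

import requests

IMDS = "http://169.254.169.254/latest"

def check_spot_interruption():
    """Return True once an interruption notice exists (IMDS answers 404 until then)."""
    try:
        # IMDSv2: request a session token, then read the instance-action key with it
        token = requests.put(
            f"{IMDS}/api/token",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
            timeout=1,
        ).text
        response = requests.get(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
            timeout=1,
        )
        return response.status_code == 200
    except requests.RequestException:
        return False  # IMDS unreachable: treat as no notice and retry on the next poll

def graceful_shutdown():
    # 1. Stop accepting new requests
    # 2. Complete in-progress work or save a checkpoint
    # 3. Deregister from the ELB target group
    # 4. Flush logs
    pass

if __name__ == "__main__":
    while not check_spot_interruption():
        time.sleep(5)
    graceful_shutdown()
```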
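When the notice is present, `instance-action` returns a small JSON document with the action (`stop`, `terminate`, or `hibernate`) and the UTC time it will happen. A sketch of turning that into a time budget for `graceful_shutdown`; the helper name `seconds_until_action` is hypothetical and the sample body mirrors the documented response shape.

```python
import json
from datetime import datetime, timezone

def seconds_until_action(body: str, now: datetime) -> float:
    """Parse an instance-action body and return the seconds left before the action."""
    notice = json.loads(body)
    # The time field is ISO 8601 in UTC, e.g. "2017-09-18T08:22:00Z"
    action_time = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return (action_time - now).total_seconds()

body = '{"action": "terminate", "time": "2017-09-18T08:22:00Z"}'
now = datetime(2017, 9, 18, 8, 20, 30, tzinfo=timezone.utc)
print(seconds_until_action(body, now))  # 90.0
```

A shutdown routine can use this number to decide whether to finish the current task or just write a checkpoint.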
Capacity Rebalancing
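The Auto Scaling group's capacity rebalancing feature launches a replacement instance as soon as EC2 sends a rebalance recommendation (a signal that an instance is at elevated risk of interruption), which can arrive before the 2-minute notice.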
Timeline:
```
[Rebalancing Signal] ──── [Start Replacement] ──── [2-min Notice] ── [Interrupt]
│                                                  │
└── Proactive response                             └── Traditional approach
```
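With `CapacityRebalance` enabled on the Auto Scaling group, the group performs the replacement itself. If you run your own handler instead, the two signals on the timeline call for different responses. A hypothetical dispatcher keyed on the EventBridge `detail-type` values EC2 uses for each signal (the returned response names are illustrative):

```python
from typing import Optional

# EventBridge detail-types for the two signals on the timeline
REBALANCE = "EC2 Instance Rebalance Recommendation"
INTERRUPTION = "EC2 Spot Instance Interruption Warning"

def plan_response(event: dict) -> Optional[str]:
    """Map a Spot signal to a (hypothetical) response name; None for unrelated events."""
    detail_type = event.get("detail-type")
    if detail_type == REBALANCE:
        # Proactive: launch a replacement while the old instance still serves traffic
        return "launch_replacement_then_drain"
    if detail_type == INTERRUPTION:
        # Reactive: about 2 minutes remain, so shut down gracefully right away
        return "graceful_shutdown"
    return None

print(plan_response({"detail-type": REBALANCE}))     # launch_replacement_then_drain
print(plan_response({"detail-type": INTERRUPTION}))  # graceful_shutdown
```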
AWS Services with Spot Integration
| Service | Spot Integration |
|---|---|
| EC2 Auto Scaling | Mixed Instances Policy |
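| ECS | Capacity Providers (EC2 Auto Scaling groups or Fargate Spot) |
| EKS | Managed Node Groups (Spot capacity type) |
| EMR | Spot for task nodes (core nodes possible, but they hold HDFS data) |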
| SageMaker | Managed Spot Training (up to 90% savings) |
| AWS Batch | Spot compute environments |
Cost Calculation Example
Running 10 m5.large instances for one month (730 hours) in us-east-1:
On-Demand:
$0.096/hour × 730 hours × 10 instances = $700.80/month
Spot (assuming $0.029/hour, about 70% off; actual Spot prices vary):
$0.029/hour × 730 hours × 10 instances = $211.70/month
Savings: $489.10/month (70% reduction)
Mixed (20% OD + 80% Spot):
OD: $0.096 × 730 × 2 = $140.16
Spot: $0.029 × 730 × 8 = $169.36
Total: $309.52/month (56% reduction)
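The same arithmetic in a few lines of Python, so other prices or On-Demand/Spot splits can be plugged in. `SPOT_PRICE` is the same assumed $0.029/hour used above; real Spot prices fluctuate.

```python
HOURS_PER_MONTH = 730
OD_PRICE = 0.096    # m5.large On-Demand, us-east-1, $/hour
SPOT_PRICE = 0.029  # assumed Spot price (~70% off); real Spot prices fluctuate

def monthly_cost(od_count: int, spot_count: int) -> float:
    """Monthly cost in dollars for a given number of On-Demand and Spot instances."""
    return round((od_count * OD_PRICE + spot_count * SPOT_PRICE) * HOURS_PER_MONTH, 2)

baseline = monthly_cost(10, 0)
for label, od, spot in [("On-Demand", 10, 0), ("Spot", 0, 10), ("Mixed 20/80", 2, 8)]:
    cost = monthly_cost(od, spot)
    print(f"{label}: ${cost:,.2f}/month ({1 - cost / baseline:.0%} reduction)")
```

This reproduces the $700.80, $211.70, and $309.52 figures and the 70% and 56% reductions.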
SAA-C03 Exam Focus Points
- ✅ Cost savings: "Spot is up to 90% cheaper than On-Demand"
- ✅ Interruption notice: "2-minute notice via instance metadata or EventBridge"
- ✅ Suitable workloads: "Fault-tolerant, flexible batch/CI/CD/big data"
- ✅ Allocation strategy: "Capacity-optimized recommended"
- ✅ Mixed Instances: "On-Demand + Spot mix for availability"
Exam Tip
Sample Exam Question: "What's the most cost-effective way to run batch processing workloads? The jobs can be restarted after interruption." → Answer: Spot Instance (fault-tolerant + maximum cost savings)
Frequently Asked Questions
Q: Is data lost when a Spot Instance is interrupted?
EBS volumes are preserved when an instance is stopped or hibernated. When it is terminated, any volume with DeleteOnTermination enabled (the default for the root volume) is deleted. Instance Store data is always lost. Store important data in S3 or EFS.
Q: What happens if a Spot request isn't fulfilled?
The request stays open until capacity becomes available, at which point instances launch automatically (unless the request expires or is cancelled first). Diversifying instance types and Availability Zones increases the chance of fulfillment.
Q: Can I run databases on Spot Instances?
Technically possible, but not recommended because of the risk of data loss and service interruption. Use On-Demand or Reserved Instances for databases.
Q: Is Spot Block (defined duration) still available?
No. Spot Blocks stopped being available to new customers on July 1, 2021, and AWS ended support for existing users on December 31, 2022.
Q: Can I use Spot with Reserved Instances together?
Yes. A common pattern is Reserved Instances for baseline capacity and Spot for additional scaling capacity.
Related Posts
- EC2 Pricing Comparison (On-Demand, Reserved, Spot, Savings Plans)
- Auto Scaling Group Setup and Policies
- EC2 Instance Type Selection Guide