
EC2 Spot Instance Strategy: Save Up to 90% on Costs

Learn how to save up to 90% on EC2 costs with Spot Instances while designing interruption-safe architectures.

By PHILOLAMB. Updated: January 31, 2026
Tags: Spot Instance, EC2, Cost Optimization, Auto Scaling, Interruption Handling

Related Exam Domains

  • Domain 4: Design Cost-Optimized Architectures

Key Takeaway

Spot Instances are up to 90% cheaper than On-Demand but can be interrupted with a 2-minute notice. Use them for fault-tolerant workloads, and keep the fleet stable by combining instance type diversification, the capacity-optimized allocation strategy, and Auto Scaling Capacity Rebalancing.

Exam Tip

Exam Essential: "Spot = up to 90% savings, 2-minute interruption notice, suitable for fault-tolerant workloads"

What is a Spot Instance?

Spot Instances run on AWS's spare EC2 capacity at a discounted price.

Aspect       | On-Demand        | Spot Instance
Price        | Full price       | 60-90% discount
Availability | Always available | Varies with spare capacity
Interruption | None             | 2-minute notice before interruption
Commitment   | None             | None

How is Spot pricing determined?

Spot Price = Determined by AWS supply/demand
- Low demand → Price drops (up to 90% off On-Demand)
- High demand → Price rises (discount decreases)

Interruption conditions:
1. Spot price > your max price
2. AWS capacity shortage

Exam Tip

Spot prices change gradually. Spot uses a supply-and-demand pricing model, not the old bidding (auction) system.

What Workloads Are Suitable?

Workloads Suitable for Spot

  1. Batch processing: Big data analytics, ETL jobs
  2. CI/CD builds: Jenkins, GitHub Actions build agents
  3. Containers: ECS, EKS worker nodes
  4. Big data: EMR, Spark clusters
  5. ML training: SageMaker training jobs
  6. Web servers: Stateless web servers
  7. HPC: Scientific simulations, rendering

Workloads Not Suitable for Spot

  1. Databases: RDS, self-managed DB servers
  2. Single instances: Servers that become single points of failure
  3. Stateful applications: Apps that must preserve local state
  4. SLA-critical: Services with significant business impact if interrupted

Spot Allocation Strategies

You can choose how Spot Instances are allocated across capacity pools in an Auto Scaling group.

Capacity-Optimized

  • Allocates instances from the pools with the most available capacity
  • Minimizes interruption probability

Strategy           | Interruptions (Skyscanner test)
Lowest Price       | 200-300
Capacity-Optimized | 10-15

Exam Tip

Exam Point: Capacity-optimized strategy is the recommended strategy to minimize Spot interruptions.

Lowest Price

  • Allocates from cheapest pools
  • Higher interruption probability
  • Only suitable for highly fault-tolerant workloads

Price-Capacity Optimized

  • Considers both capacity and price
  • Balance between cost and stability
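
To make the difference between the three strategies concrete, here is a toy model. It is only a sketch: the pool names, prices, and spare-capacity numbers are made up, AWS does not publish spare capacity, and the rule used for price-capacity-optimized is a simplification chosen for illustration, not AWS's actual algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str            # instance type / Availability Zone (hypothetical)
    price: float         # Spot price in USD per hour (made-up)
    spare_capacity: int  # made-up figure; not visible to customers in reality


def pick_pool(pools, strategy):
    """Pick one pool under a simplified version of each allocation strategy."""
    if strategy == "lowest-price":
        return min(pools, key=lambda p: p.price)
    if strategy == "capacity-optimized":
        return max(pools, key=lambda p: p.spare_capacity)
    if strategy == "price-capacity-optimized":
        # Toy rule: keep pools with at least half the deepest capacity,
        # then take the cheapest of those.
        deepest = max(p.spare_capacity for p in pools)
        candidates = [p for p in pools if p.spare_capacity >= deepest / 2]
        return min(candidates, key=lambda p: p.price)
    raise ValueError(f"unknown strategy: {strategy}")


POOLS = [
    Pool("m4.large/us-east-1a", 0.025, 5),
    Pool("m5.large/us-east-1a", 0.029, 60),
    Pool("m5a.large/us-east-1b", 0.031, 100),
]

for strategy in ("lowest-price", "capacity-optimized", "price-capacity-optimized"):
    print(strategy, "->", pick_pool(POOLS, strategy).name)
```

In this model, lowest-price lands on the cheapest but nearly empty pool, which is the kind of pool most likely to be reclaimed.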

Instance Type Diversification

The most important strategy for Spot stability.

Bad example: Using only m5.large
→ Immediate interruption when that pool is depleted

Good example: Multiple type + size combinations
→ m5.large, m5.xlarge, m4.large, m5a.large, m5d.large
→ If one pool is depleted, can get from another
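
A Spot capacity pool is a combination of instance type and Availability Zone, so diversification multiplies the number of pools a group can draw from. The sketch below counts them; the three Availability Zone names are an assumption for illustration.

```python
from itertools import product

instance_types = ["m5.large", "m5.xlarge", "m4.large", "m5a.large", "m5d.large"]
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]  # assumed AZs


def spot_pools(types, zones):
    """Return every (instance type, Availability Zone) pair, i.e. every pool."""
    return list(product(types, zones))


single_type = spot_pools(["m5.large"], availability_zones)
diversified = spot_pools(instance_types, availability_zones)

print(len(single_type), "pools with one instance type")
print(len(diversified), "pools with five instance types")
```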

Mixed Instances Policy

Mix On-Demand and Spot instances in Auto Scaling groups.

Mixed Instances Policy Example:
┌────────────────────────────────────────┐
│        Auto Scaling Group              │
│                                        │
│  On-Demand: 20% (guaranteed baseline)  │
│  ┌──────┐ ┌──────┐                    │
│  │ m5.lg│ │ m5.lg│                    │
│  └──────┘ └──────┘                    │
│                                        │
│  Spot: 80% (cost savings)              │
│  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │
│  │m5.lg │ │m5a.lg│ │m4.lg │ │m5d.lg│ │
│  └──────┘ └──────┘ └──────┘ └──────┘ │
│                                        │
│  Instance types: 4+ (diversified)      │
└────────────────────────────────────────┘

Exam Tip

Best Practice: On-Demand baseline (20-30%) + Spot (70-80%) combination achieves both availability and cost savings
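
As a rough sketch, the 20/80 split above could be expressed as the MixedInstancesPolicy structure that the Auto Scaling API accepts. The helper name and the launch template name "my-template" are hypothetical; the dictionary keys follow the API's field names.

```python
def build_mixed_instances_policy(launch_template_name, instance_types,
                                 on_demand_percentage=20,
                                 spot_allocation_strategy="capacity-optimized"):
    """Build a MixedInstancesPolicy dict (hypothetical helper for illustration)."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type gives the group more pools to use.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 0,
            # Share of capacity above the base that runs On-Demand; the rest is Spot.
            "OnDemandPercentageAboveBaseCapacity": on_demand_percentage,
            "SpotAllocationStrategy": spot_allocation_strategy,
        },
    }


policy = build_mixed_instances_policy(
    "my-template",  # hypothetical launch template name
    ["m5.large", "m5a.large", "m4.large", "m5d.large"],
)
print(policy["InstancesDistribution"])
```

With boto3, a dictionary like this would be passed as the MixedInstancesPolicy argument of the autoscaling client's create_auto_scaling_group call, optionally together with CapacityRebalance=True.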

Interruption Handling

2-Minute Notice

AWS issues a notice two minutes before it interrupts a Spot Instance.

Ways to receive interruption notice:
1. Instance Metadata Service (IMDS)
   → http://169.254.169.254/latest/meta-data/spot/instance-action

2. EventBridge event (EventBridge was formerly called CloudWatch Events)
   → EC2 Spot Instance Interruption Warning
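
For the event-based route, a rule has to match the interruption warning and a target has to read the instance ID out of it. The sketch below shows an event pattern and a Lambda-style handler. The `matches` function is a simplified stand-in for EventBridge's own matching, and the instance ID in the sample event is invented.

```python
# Event pattern for an EventBridge rule that catches Spot interruption warnings.
SPOT_INTERRUPTION_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Spot Instance Interruption Warning"],
}


def matches(pattern, event):
    """Simplified stand-in for EventBridge matching on top-level fields only."""
    return all(event.get(key) in allowed for key, allowed in pattern.items())


def handler(event, context=None):
    """Lambda-style handler: pull out which instance is affected and how."""
    detail = event.get("detail", {})
    return {
        "instance_id": detail.get("instance-id"),
        "action": detail.get("instance-action"),
    }


sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Spot Instance Interruption Warning",
    "detail": {"instance-id": "i-0123456789abcdef0", "instance-action": "terminate"},
}

if matches(SPOT_INTERRUPTION_PATTERN, sample_event):
    print(handler(sample_event))
```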

Graceful Shutdown Implementation

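# Poll IMDS for a Spot interruption notice (AWS suggests checking every 5 seconds)
import time

import requests

IMDS = "http://169.254.169.254/latest"

def get_imds_token():
    # IMDSv2 needs a session token; instances that enforce IMDSv2 reject tokenless requests
    response = requests.put(
        f"{IMDS}/api/token",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
        timeout=1,
    )
    response.raise_for_status()
    return response.text

def check_spot_interruption():
    """Return True if an interruption notice has been issued for this instance."""
    try:
        response = requests.get(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": get_imds_token()},
            timeout=1,
        )
    except requests.RequestException:
        return False  # IMDS unreachable; try again on the next poll
    # 404 means no interruption is scheduled; 200 means one is pending
    return response.status_code == 200

def graceful_shutdown():
    # 1. Stop accepting new requests
    # 2. Complete in-progress work or save checkpoint
    # 3. Deregister from ELB target group
    # 4. Flush logs
    pass

if __name__ == "__main__":
    while not check_spot_interruption():
        time.sleep(5)
    graceful_shutdown()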
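
When a notice is pending, the instance-action endpoint returns a small JSON document with an action and a UTC time. The sketch below parses such a body to work out how long the cleanup steps have left. The function name and the sample timestamps are illustrative.

```python
import json
from datetime import datetime, timezone


def seconds_until_action(body, now=None):
    """Return (action, seconds remaining) from an instance-action response body."""
    notice = json.loads(body)
    action_time = datetime.strptime(
        notice["time"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return notice["action"], (action_time - now).total_seconds()


# Sample body in the documented shape; the timestamp is made up.
sample_body = '{"action": "terminate", "time": "2026-01-31T08:22:00Z"}'
fixed_now = datetime(2026, 1, 31, 8, 20, 0, tzinfo=timezone.utc)

action, remaining = seconds_until_action(sample_body, now=fixed_now)
print(action, remaining)
```

Knowing the remaining time lets a worker decide between finishing the current job and writing a checkpoint.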

Capacity Rebalancing

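With Capacity Rebalancing enabled, the Auto Scaling group reacts to the EC2 instance rebalance recommendation, a signal that a Spot Instance is at elevated risk of interruption, and launches a replacement ahead of time. The recommendation can arrive earlier than the 2-minute notice, but that is not guaranteed.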

Timeline:
[Rebalancing Signal] ──── [Start Replacement] ──── [2-min Notice] ── [Interrupt]
        │                                                │
        └── Proactive response                           └── Traditional approach
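
If you handle these signals yourself rather than relying only on the Auto Scaling group, the two EventBridge event types can be routed to different responses. This is a minimal sketch; the returned action names are hypothetical labels, not AWS terms.

```python
REBALANCE = "EC2 Instance Rebalance Recommendation"
INTERRUPTION = "EC2 Spot Instance Interruption Warning"


def plan_response(event):
    """Map an EventBridge event to a response (action names are hypothetical)."""
    detail_type = event.get("detail-type")
    if detail_type == REBALANCE:
        # Early signal: there may still be time to bring up a replacement first.
        return "launch-replacement"
    if detail_type == INTERRUPTION:
        # Final 2-minute warning: drain connections and save state now.
        return "drain-and-checkpoint"
    return "ignore"


print(plan_response({"detail-type": REBALANCE}))
print(plan_response({"detail-type": INTERRUPTION}))
```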

AWS Services with Spot Integration

Service          | Spot Integration
EC2 Auto Scaling | Mixed Instances Policy
ECS              | Capacity Provider
EKS              | Managed Node Group
EMR              | Spot for Task/Core nodes
SageMaker        | Managed Spot Training (up to 90% savings)
AWS Batch        | Spot compute environments

Cost Calculation Example

Running 10 m5.large instances monthly (us-east-1):

On-Demand:
$0.096/hour × 730 hours × 10 instances = $700.80/month

Spot (70% discount):
$0.029/hour × 730 hours × 10 instances = $211.70/month

Savings: $489.10/month (70% reduction)

Mixed (20% OD + 80% Spot):
OD: $0.096 × 730 × 2 = $140.16
Spot: $0.029 × 730 × 8 = $169.36
Total: $309.52/month (56% reduction)
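
The arithmetic above can be reproduced with a few lines of Python, which makes it easy to try other prices or splits. The hourly prices are the example figures from this section, not live quotes, and the function name is illustrative.

```python
HOURS_PER_MONTH = 730


def monthly_cost(hourly_price, instances, hours=HOURS_PER_MONTH):
    """Monthly cost in USD for a number of instances at a flat hourly price."""
    return round(hourly_price * hours * instances, 2)


def reduction_percent(cost, baseline):
    """Percentage saved relative to the baseline, rounded to a whole percent."""
    return round((1 - cost / baseline) * 100)


on_demand = monthly_cost(0.096, 10)
spot = monthly_cost(0.029, 10)
mixed = round(monthly_cost(0.096, 2) + monthly_cost(0.029, 8), 2)

print(f"On-Demand: ${on_demand:.2f}")
print(f"Spot:      ${spot:.2f} ({reduction_percent(spot, on_demand)}% reduction)")
print(f"Mixed:     ${mixed:.2f} ({reduction_percent(mixed, on_demand)}% reduction)")
```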

SAA-C03 Exam Focus Points

  1. Cost savings: "Spot is up to 90% cheaper than On-Demand"
  2. Interruption notice: "2-minute notice via instance metadata or EventBridge"
  3. Suitable workloads: "Fault-tolerant, flexible batch/CI/CD/big data"
  4. Allocation strategy: "Capacity-optimized recommended"
  5. Mixed Instances: "On-Demand + Spot mix for availability"

Exam Tip

Sample Exam Question: "What's the most cost-effective way to run batch processing workloads? The jobs can be restarted after interruption." → Answer: Spot Instance (fault-tolerant + maximum cost savings)

Frequently Asked Questions

Q: Is data lost when a Spot Instance is interrupted?

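When a Spot Instance is stopped or hibernated, its EBS volumes are preserved. When it is terminated, EBS volumes with DeleteOnTermination enabled (the default for the root volume) are deleted. Instance store data is always lost. Store important data in S3 or EFS.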

Q: What happens if a Spot request isn't fulfilled?

The request stays open until capacity becomes available, the request expires, or you cancel it. Once it can be fulfilled, instances launch automatically. Diversifying instance types increases the probability of fulfillment.

Q: Can I run databases on Spot Instances?

Technically possible, but not recommended due to data loss and service interruption risks. Use On-Demand or Reserved for databases.

Q: Is Spot Block (defined duration) still available?

No. AWS stopped offering Spot Blocks to new customers on July 1, 2021, and ended support for existing users on December 31, 2022.

Q: Can I use Spot together with Reserved Instances?

Yes. A common pattern is Reserved Instances for baseline capacity and Spot for additional scaling capacity.
