S3 Select & Glacier Select: Query Only the Data You Need Without Full Download
Learn how to use S3 Select and Glacier Select to extract only needed data using SQL without downloading entire objects, reducing costs and time.
Related Exam Domains
- Domain 3: Design High-Performing Architectures
Key Takeaway
S3 Select extracts only the needed data from an S3 object using SQL, without downloading the entire object, which can cut data transfer and costs by up to 80%. Glacier Select applies the same approach to archived data, writing the query result to S3 without a full restore.
Exam Tip
Exam Essential: "Only needed data without full object download = S3 Select", "S3 data analysis = Athena (complex queries), S3 Select (simple filtering)"
What is S3 Select?
S3 Select is a server-side filtering feature that uses SQL expressions to extract only the needed rows and columns from a single object stored in S3.
Traditional Approach vs S3 Select
Traditional Approach (Full Download):
[S3: 10GB CSV] ──full download──→ [Application]
│
▼
Filter needed data
(only needed 1GB)
→ 10GB transfer cost + processing time
S3 Select (Server-Side Filtering):
[S3: 10GB CSV] ──SQL query──→ [S3 Select Processing]
│
▼
[Only 1GB result transferred]
→ 1GB transfer cost + fast processing
→ ~90% less data transferred
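The server-side flow above can be sketched with Boto3's `select_object_content` call. This is a minimal sketch, not a definitive implementation: the helper names `iter_select_records` and `run_select` are hypothetical, and the input is assumed to be a CSV with a header row. The response is an event stream, and only `Records` events carry result bytes.

```python
from typing import Iterable, Iterator


def iter_select_records(event_stream: Iterable[dict]) -> Iterator[bytes]:
    """Yield raw result chunks from a SelectObjectContent event stream.

    The stream mixes several event types (Records, Stats, Progress, Cont, End);
    only Records events carry result bytes.
    """
    for event in event_stream:
        if "Records" in event:
            yield event["Records"]["Payload"]


def run_select(bucket: str, key: str, expression: str) -> bytes:
    """Run one S3 Select query and return the filtered bytes (needs AWS access)."""
    import boto3  # imported lazily so the parsing helper works without boto3

    response = boto3.client("s3").select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: the object is a CSV whose first line is a header row.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    return b"".join(iter_select_records(response["Payload"]))
```

A chunk boundary can fall in the middle of a record, so join the chunks before splitting the result into lines.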
Supported Data Formats
| Format | S3 Select | Glacier Select | Compression Support |
|---|---|---|---|
| CSV | Yes | Yes (uncompressed only) | GZIP, BZIP2 (S3 Select only) |
| JSON | Yes | No | GZIP, BZIP2 |
| Parquet | Yes | No | Columnar compression (GZIP, Snappy) |
SQL Query Examples
CSV File Query
-- Extract only Seoul region orders from entire CSV
SELECT s.OrderId, s.ProductName, s.Amount
FROM s3object s
WHERE s.Region = 'Seoul'
AND CAST(s.Amount AS DECIMAL) > 10000
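The query above refers to columns by name (`s.Region`, `s.Amount`), which only works when S3 Select is told to read the first line as a header. A hedged sketch of the request parameters follows; `csv_select_request` is a hypothetical helper, and the returned dictionary is meant to be passed to Boto3 as `select_object_content(**params)`.

```python
ALLOWED_COMPRESSION = {"NONE", "GZIP", "BZIP2"}


def csv_select_request(bucket: str, key: str, expression: str,
                       compression: str = "NONE") -> dict:
    """Build SelectObjectContent parameters for a CSV object with a header row."""
    if compression not in ALLOWED_COMPRESSION:
        raise ValueError(f"unsupported compression: {compression}")
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            # USE makes header names available as s.Region, s.Amount, ...
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": compression,
        },
        "OutputSerialization": {"CSV": {}},
    }


# Hypothetical bucket and key, for illustration only.
params = csv_select_request(
    "example-bucket",
    "orders/2026-01.csv.gz",
    "SELECT s.OrderId, s.ProductName, s.Amount FROM s3object s "
    "WHERE s.Region = 'Seoul' AND CAST(s.Amount AS DECIMAL) > 10000",
    compression="GZIP",
)
```

Without a header row, columns are addressed by position instead (`s._1`, `s._2`, and so on).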
JSON File Query
-- Filter specific conditions from JSON array
SELECT s.name, s.age
FROM s3object[*] s
WHERE s.age > 30
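For JSON input, the request has to say whether the object is a single JSON document or one JSON value per line. The sketch below assumes Boto3's `select_object_content` parameters; `json_select_request` and `parse_json_lines` are hypothetical helper names, and the sample payload in the usage line is made up.

```python
import json
from typing import List


def json_select_request(bucket: str, key: str, expression: str,
                        json_type: str = "DOCUMENT") -> dict:
    """Build SelectObjectContent parameters for a JSON object."""
    if json_type not in {"DOCUMENT", "LINES"}:
        raise ValueError(f"unsupported JSON type: {json_type}")
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {"JSON": {"Type": json_type}},
        # One JSON object per output line, separated by newlines.
        "OutputSerialization": {"JSON": {"RecordDelimiter": "\n"}},
    }


def parse_json_lines(payload: bytes) -> List[dict]:
    """Turn the joined, newline-delimited JSON result into Python dicts."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]


# Made-up result bytes, shaped like the output of the query above.
rows = parse_json_lines(b'{"name":"Kim","age":41}\n{"name":"Lee","age":35}\n')
```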
Aggregate Functions
-- Sum, average, count
SELECT COUNT(*) AS total,
SUM(CAST(s.Amount AS DECIMAL)) AS total_amount,
AVG(CAST(s.Amount AS DECIMAL)) AS avg_amount
FROM s3object s
WHERE s.Category = 'Electronics'
Glacier Select
Query archived data directly without restoring.
Traditional Approach:
[Glacier] ──full restore (hours)──→ [S3] ──download──→ [Analysis]
Glacier Select:
[Glacier] ──SQL query──→ [Only needed data output to S3]
→ Full restore not required
Glacier Select Retrieval Tiers
| Tier | Time | Cost |
|---|---|---|
| Expedited | 1-5 minutes | High |
| Standard | 3-5 hours | Medium |
| Bulk | 5-12 hours | Low |
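In Boto3, a Glacier Select query is submitted through `restore_object` with a `RestoreRequest` of type `SELECT`, and the retrieval tier is part of that request. The sketch below only builds the request body. `glacier_select_request` is a hypothetical helper, and the archive is assumed to be a CSV with a header row.

```python
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def glacier_select_request(expression: str, tier: str,
                           output_bucket: str, output_prefix: str) -> dict:
    """Build the RestoreRequest body for a SELECT-type restore."""
    if tier not in VALID_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {
        "Type": "SELECT",
        "Tier": tier,
        "SelectParameters": {
            # Assumption: the archived object is a CSV with a header row.
            "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
            "ExpressionType": "SQL",
            "Expression": expression,
            "OutputSerialization": {"CSV": {}},
        },
        # Results are written to S3 rather than streamed back to the caller.
        "OutputLocation": {
            "S3": {"BucketName": output_bucket, "Prefix": output_prefix}
        },
    }
```

Usage, with hypothetical bucket and key names: `boto3.client("s3").restore_object(Bucket="archive-bucket", Key="logs/2020.csv", RestoreRequest=glacier_select_request("SELECT * FROM s3object s", "Bulk", "results-bucket", "glacier-select/"))`.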
S3 Select vs Athena
| Aspect | S3 Select | Amazon Athena |
|---|---|---|
| Query Scope | Single object | Multiple objects/buckets |
| SQL Features | Basic SELECT, WHERE, aggregates | Full SQL (JOIN, subqueries, etc.) |
| Data Formats | CSV, JSON, Parquet | CSV, JSON, Parquet, ORC, Avro, etc. |
| Schema | Not required | Table definition needed (Glue Catalog) |
| Billing | Scanned + returned data | Scanned data ($5/TB) |
| Use Cases | Single file filtering | Data lake analysis |
| Infrastructure | None to manage (built into S3) | None to manage (serverless) |
S3 Data Query Selection:
│
▼
Simple filtering from single object?
│
Yes → [S3 Select]
│
No
│
▼
Complex SQL across multiple files?
│
Yes → [Athena]
│
No
│
▼
PB-scale continuous analytics/BI?
│
Yes → [Redshift / Redshift Spectrum]
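The selection flow above can be written as a small function, which is a handy way to memorize the order of the checks. The function and parameter names are hypothetical. The flow does not say what to pick when every answer is "No", so the sketch returns `None` in that case.

```python
from typing import Optional


def choose_query_service(single_object_filter: bool,
                         complex_multi_file_sql: bool,
                         pb_scale_analytics: bool) -> Optional[str]:
    """Walk the decision flow from top to bottom and return the first match."""
    if single_object_filter:
        return "S3 Select"
    if complex_multi_file_sql:
        return "Athena"
    if pb_scale_analytics:
        return "Redshift / Redshift Spectrum"
    return None  # the flow gives no recommendation for this case
```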
Exam Tip
S3 Select vs Athena: Use S3 Select for simple filtering of a single object and Athena for complex SQL analysis across multiple objects. On the exam, "extract partial data from a single large file" = S3 Select.
Cost Structure
S3 Select Costs:
- Data scanned: $0.002/GB
- Data returned: $0.0007/GB
Example: Return 10GB from 100GB CSV
- Scan: 100GB × $0.002 = $0.20
- Return: 10GB × $0.0007 = $0.007
- Total: $0.207
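Comparison: Full download
- GET request + 100GB transfer
- Transfer cost alone $9.00 (internet egress at $0.09/GB)
→ ~97% savings with S3 Select when the result stays inside AWS (about 88% if the 10GB result also leaves over the internet, which adds $0.90 of egress)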
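The arithmetic above can be reproduced with a few lines of Python. The rates are the ones quoted in this section, with $0.09/GB egress implied by $9.00 for 100GB. Treat them as assumptions, since actual prices vary by region and change over time. All function names are hypothetical.

```python
SCAN_PRICE_PER_GB = 0.002     # S3 Select data scanned (rate quoted above)
RETURN_PRICE_PER_GB = 0.0007  # S3 Select data returned (rate quoted above)
EGRESS_PRICE_PER_GB = 0.09    # internet egress implied by $9.00 per 100GB


def s3_select_cost(scanned_gb: float, returned_gb: float) -> float:
    """S3 Select charge: scanned bytes plus returned bytes."""
    return scanned_gb * SCAN_PRICE_PER_GB + returned_gb * RETURN_PRICE_PER_GB


def internet_egress_cost(gb: float) -> float:
    """Data transfer charge for sending `gb` out to the internet."""
    return gb * EGRESS_PRICE_PER_GB


def savings_ratio(select_cost: float, download_cost: float) -> float:
    """Fraction saved relative to the full-download cost."""
    return 1 - select_cost / download_cost


select_total = s3_select_cost(scanned_gb=100, returned_gb=10)
download_total = internet_egress_cost(100)
print(f"S3 Select: ${select_total:.3f}, full download: ${download_total:.2f}")
print(f"Savings: {savings_ratio(select_total, download_total):.1%}")
```

If the filtered result is itself downloaded over the internet, add `internet_egress_cost(returned_gb)` to the S3 Select side before comparing.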
Use Cases
1. Log Analysis
[S3: access-log-2026-01.csv.gz (5GB)]
│
▼
SELECT * FROM s3object s
WHERE s.status_code = '500'
AND s."timestamp" > '2026-01-15'
│
▼
[500 errors only extracted: 50MB]
→ No need to download full 5GB
2. IoT Sensor Data Filtering
[S3: sensor-data.json (2GB)]
│
▼
SELECT s.sensor_id, s.temperature
FROM s3object[*] s
WHERE s.temperature > 40.0
│
▼
[Abnormal temperature only extracted: 10MB]
3. S3 Select in Lambda
[S3 Event] → [Lambda]
│
├── Query only needed data with S3 Select
├── Save memory/processing time
└── Reduce Lambda costs
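A Lambda function triggered by an S3 event could apply this pattern roughly as follows. This is a sketch under assumptions: the handler name, the filter expression, and the CSV-with-header input are all hypothetical, and the function's role needs permission to read the object. Object keys arrive URL-encoded in S3 event notifications, so they are decoded first.

```python
from typing import Tuple
from urllib.parse import unquote_plus

# Hypothetical filter; replace with whatever the function actually needs.
EXPRESSION = "SELECT * FROM s3object s WHERE s.status_code = '500'"


def object_from_event(event: dict) -> Tuple[str, str]:
    """Extract (bucket, key) from the first record of an S3 event."""
    s3_info = event["Records"][0]["s3"]
    # Keys are URL-encoded in event notifications (spaces arrive as '+').
    return s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def handler(event: dict, context: object) -> dict:
    """Query only the needed rows instead of loading the whole object."""
    import boto3  # available by default in the Lambda Python runtime

    bucket, key = object_from_event(event)
    response = boto3.client("s3").select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=EXPRESSION,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    body = b"".join(
        e["Records"]["Payload"] for e in response["Payload"] if "Records" in e
    )
    return {"bucket": bucket, "key": key, "bytes_returned": len(body)}
```

Only the filtered bytes are held in memory, which is where the memory and duration savings come from.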
Limitations
| Limitation | Details |
|---|---|
| Query Scope | Single object only (multiple objects not supported) |
| SQL Features | JOIN, subqueries not supported |
| Size Limits | SQL expression up to 256 KB; a single input or result record up to 1 MB |
| Encryption | SSE-S3 and SSE-KMS are handled transparently; SSE-C works only if the key is supplied with the request over HTTPS |
| Object Size | No max size limit for uncompressed CSV/JSON |
SAA-C03 Exam Focus Points
- ✅ Server-Side Filtering: "Only needed data without full download = S3 Select"
- ✅ vs Athena: "Single object filtering = S3 Select, multi-object analysis = Athena"
- ✅ Cost Reduction: "Reduced data transfer → cost savings"
- ✅ Supported Formats: "CSV, JSON, Parquet"
- ✅ Glacier Select: "Query archive data without restoring"
Exam Tip
Sample Exam Question: "A 10GB CSV log file is stored in S3. How can you extract only the rows with specific error codes while minimizing cost?" → Answer: Run an S3 Select SQL query (saves cost and time compared with a full download).
Frequently Asked Questions (FAQ)
Q: Which SDKs support S3 Select?
The AWS CLI and the major AWS SDKs, including Python (Boto3), Java, and JavaScript, expose it through the SelectObjectContent API.
Q: Can S3 Select be used on compressed files?
Yes. CSV and JSON files support GZIP and BZIP2 compression. Parquet supports columnar compression (GZIP or Snappy) but not whole-object compression.
Q: What's the difference between S3 Select and Byte Range Fetch?
Byte Range Fetch retrieves a portion of an object by byte range, while S3 Select filters content using SQL. S3 Select is better for structured data (CSV, JSON), Byte Range Fetch for partial download of binary files.
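For contrast, a Byte Range Fetch is an ordinary GET with an HTTP `Range` header, with no SQL involved. The sketch below uses Boto3's `get_object` with its `Range` parameter; `byte_range_header` and `fetch_range` are hypothetical helper names.

```python
def byte_range_header(start: int, length: int) -> str:
    """Format an inclusive HTTP Range header value for `length` bytes from `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def fetch_range(bucket: str, key: str, start: int, length: int) -> bytes:
    """Download only the requested byte range of an object (needs AWS access)."""
    import boto3  # imported lazily so the header helper works without boto3

    response = boto3.client("s3").get_object(
        Bucket=bucket, Key=key, Range=byte_range_header(start, length)
    )
    return response["Body"].read()
```

For example, `byte_range_header(0, 1024)` produces `bytes=0-1023`, the first KiB of the object.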
Q: Is Glacier Select available for all Glacier classes?
Available for Glacier Flexible Retrieval (formerly Glacier). Not available for Glacier Deep Archive.
Q: Can S3 Select results be saved to another S3 object?
S3 Select returns a streaming response. To save results, receive them in your application and re-upload to S3, or use Athena's CTAS (Create Table As Select) feature.
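The "receive and re-upload" option might look like the sketch below. It assumes a Boto3 S3 client is passed in, a CSV input with a header row, and a result small enough to buffer in memory. `copy_select_result` is a hypothetical helper name.

```python
def copy_select_result(client, src_bucket: str, src_key: str, expression: str,
                       dst_bucket: str, dst_key: str) -> int:
    """Run an S3 Select query and store the result as a new S3 object.

    `client` is expected to behave like boto3.client("s3").
    Returns the number of bytes written.
    """
    response = client.select_object_content(
        Bucket=src_bucket,
        Key=src_key,
        ExpressionType="SQL",
        Expression=expression,
        # Assumption: CSV input with a header row.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    # Buffer the streamed chunks; only Records events carry result bytes.
    body = b"".join(
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    )
    client.put_object(Bucket=dst_bucket, Key=dst_key, Body=body)
    return len(body)
```

For results too large to hold in memory, a multipart upload fed chunk by chunk would be a better fit than a single `put_object`.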