S3 Select & Glacier Select: Query Only the Data You Need Without Full Download

Key Takeaway

S3 Select uses SQL to extract only the rows and columns you need from an S3 object, so you avoid downloading the whole object and can cut data transfer and cost by up to 80%. Glacier Select applies the same idea to archived data.

Exam Tip

Exam Essential: "Only needed data without full object download = S3 Select", "S3 data analysis = Athena (complex queries), S3 Select (simple filtering)"

What is S3 Select?

A server-side filtering feature that extracts only needed rows and columns using SQL expressions from objects stored in S3.

Traditional Approach vs S3 Select

Traditional Approach (Full Download):
[S3: 10GB CSV] ──full download──→ [Application]
                                      │
                                      ▼
                                 Filter needed data
                                 (only needed 1GB)
→ 10GB transfer cost + processing time

S3 Select (Server-Side Filtering):
[S3: 10GB CSV] ──SQL query──→ [S3 Select Processing]
                                    │
                                    ▼
                              [Only 1GB result transferred]
→ 1GB transfer cost + fast processing
→ 90% less data transferred in this example (cost savings of up to ~80%)

Supported Data Formats

Format	S3 Select	Glacier Select	Compression Support
CSV	Yes	Yes (uncompressed only)	GZIP, BZIP2 (S3 Select only)
JSON	Yes	No	GZIP, BZIP2
Parquet	Yes	No	Column compression (GZIP, Snappy)
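
As a rough illustration of how the format and compression columns surface in a request, the sketch below builds the `InputSerialization` argument that boto3's `select_object_content` accepts. The helper name `input_serialization` is hypothetical, and the choices of `FileHeaderInfo="USE"` and JSON `Type="LINES"` are assumptions about the file being queried.

```python
def input_serialization(fmt: str, compression: str = "NONE") -> dict:
    """Build an InputSerialization dict for an S3 Select request (hypothetical helper)."""
    fmt = fmt.upper()
    compression = compression.upper()
    if compression not in {"NONE", "GZIP", "BZIP2"}:
        raise ValueError(f"unsupported compression: {compression}")
    if fmt == "PARQUET":
        # Parquet is compressed per column inside the file, so whole-object
        # compression is rejected here.
        if compression != "NONE":
            raise ValueError("Parquet does not take whole-object compression")
        return {"Parquet": {}}
    if fmt == "CSV":
        # Assumption: the file has a header row, so columns can be named in SQL.
        body = {"CSV": {"FileHeaderInfo": "USE"}}
    elif fmt == "JSON":
        # Assumption: one JSON object per line; use "DOCUMENT" for a single document.
        body = {"JSON": {"Type": "LINES"}}
    else:
        raise ValueError(f"unsupported format: {fmt}")
    body["CompressionType"] = compression
    return body
```

For example, `input_serialization("csv", "gzip")` would describe a gzip-compressed CSV with a header row.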

SQL Query Examples

CSV File Query

-- Extract only Seoul region orders from entire CSV
SELECT s.OrderId, s.ProductName, s.Amount
FROM s3object s
WHERE s.Region = 'Seoul'
AND CAST(s.Amount AS DECIMAL) > 10000
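
A query like this is sent through the `SelectObjectContent` API. The following is a minimal sketch using boto3, assuming the CSV has a header row, is not compressed, and that the caller has credentials and S3 Select access. The function names `build_select_request` and `run_select` are illustrative, not part of any library.

```python
def build_select_request(bucket: str, key: str) -> dict:
    """Keyword arguments for boto3's S3 client select_object_content call."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": (
            "SELECT s.OrderId, s.ProductName, s.Amount "
            "FROM s3object s "
            "WHERE s.Region = 'Seoul' "
            "AND CAST(s.Amount AS DECIMAL) > 10000"
        ),
        # Assumption: header row present, no compression.
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "NONE",
        },
        "OutputSerialization": {"CSV": {}},
    }


def run_select(bucket: str, key: str) -> str:
    """Run the query and return the matching rows as CSV text."""
    import boto3  # imported lazily so the request builder works without boto3

    s3 = boto3.client("s3")
    response = s3.select_object_content(**build_select_request(bucket, key))
    chunks = []
    for event in response["Payload"]:  # the response body is an event stream
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    return b"".join(chunks).decode("utf-8")
```

Keeping the request builder separate from the network call makes the query parameters easy to inspect before anything is sent.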

JSON File Query

-- Filter specific conditions from JSON array
SELECT s.name, s.age
FROM s3object[*] s
WHERE s.age > 30

Aggregate Functions

-- Sum, average, count
SELECT COUNT(*) AS total,
       SUM(CAST(s.Amount AS DECIMAL)) AS total_amount,
       AVG(CAST(s.Amount AS DECIMAL)) AS avg_amount
FROM s3object s
WHERE s.Category = 'Electronics'

Glacier Select

Query archived data directly without restoring.

Traditional Approach:
[Glacier] ──full restore (hours)──→ [S3] ──download──→ [Analysis]

Glacier Select:
[Glacier] ──SQL query──→ [Only needed data output to S3]
→ Full restore not required

Glacier Select Retrieval Tiers

Tier	Time	Cost
Expedited	1-5 minutes	High
Standard	3-5 hours	Medium
Bulk	5-12 hours	Low
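
Glacier Select jobs are submitted as a restore request of type `SELECT`, with the tier from the table above and an S3 output location. The sketch below only builds the request for boto3's `restore_object`. The helper name, bucket names, and prefix are hypothetical, and it assumes an uncompressed CSV archive with a header row.

```python
def build_glacier_select_request(
    bucket: str,
    key: str,
    expression: str,
    out_bucket: str,
    out_prefix: str,
    tier: str = "Standard",
) -> dict:
    """Keyword arguments for boto3's S3 client restore_object call (SELECT type)."""
    if tier not in {"Expedited", "Standard", "Bulk"}:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {
        "Bucket": bucket,
        "Key": key,
        "RestoreRequest": {
            "Type": "SELECT",
            "Tier": tier,
            "SelectParameters": {
                "ExpressionType": "SQL",
                "Expression": expression,
                # Assumption: archived object is CSV with a header row.
                "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
                "OutputSerialization": {"CSV": {}},
            },
            # Results are written to S3 rather than streamed back.
            "OutputLocation": {
                "S3": {"BucketName": out_bucket, "Prefix": out_prefix}
            },
        },
    }
```

The resulting dict could be passed as `boto3.client("s3").restore_object(**request)`; the job then completes asynchronously within the time window of the chosen tier.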

S3 Select vs Athena

Aspect	S3 Select	Amazon Athena
Query Scope	Single object	Multiple objects/buckets
SQL Features	Basic SELECT, WHERE, aggregates	Full SQL (JOIN, subqueries, etc.)
Data Formats	CSV, JSON, Parquet	CSV, JSON, Parquet, ORC, Avro, etc.
Schema	Not required	Table definition needed (Glue Catalog)
Billing	Scanned + returned data	Scanned data ($5/TB)
Use Cases	Single file filtering	Data lake analysis
Infrastructure	None (built into the S3 API)	None (serverless service)

S3 Data Query Selection:
        │
        ▼
Simple filtering from single object?
        │
       Yes → [S3 Select]
        │
        No
        │
        ▼
Complex SQL across multiple files?
        │
       Yes → [Athena]
        │
        No
        │
        ▼
PB-scale continuous analytics/BI?
        │
       Yes → [Redshift / Redshift Spectrum]
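
The decision tree above can be written as a small function, which may help when drilling exam scenarios. This is only a restatement of the diagram; the function and parameter names are made up for illustration, and the final `None` reflects that the tree names no service when all three answers are "No".

```python
from typing import Optional


def choose_query_service(
    single_object_filter: bool,
    complex_multi_file_sql: bool,
    pb_scale_bi: bool,
) -> Optional[str]:
    """Walk the three questions of the decision tree in order."""
    if single_object_filter:
        return "S3 Select"
    if complex_multi_file_sql:
        return "Athena"
    if pb_scale_bi:
        return "Redshift / Redshift Spectrum"
    return None  # the tree does not cover this case


if __name__ == "__main__":
    # "Extract partial data from a single large file"
    print(choose_query_service(True, False, False))
```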

Exam Tip

S3 Select vs Athena: use S3 Select for simple filtering of a single object, and Athena for complex SQL analysis across multiple objects. On the exam, "extract partial data from a single large file" = S3 Select.

Cost Structure

S3 Select Costs:
- Data scanned: $0.002/GB
- Data returned: $0.0007/GB

Example: Return 10GB from 100GB CSV
- Scan: 100GB × $0.002 = $0.20
- Return: 10GB × $0.0007 = $0.007
- Total: $0.207

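Comparison: Full download over the internet
- GET request + 100GB transfer
- Transfer cost alone: 100GB × $0.09 = $9.00 (internet egress)
- S3 Select results sent over the internet also incur egress: 10GB × $0.09 = $0.90
→ $0.207 + $0.90 = $1.107 vs $9.00: ~88% savings with S3 Select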
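
The arithmetic can be reproduced with a few lines of Python. This sketch uses the per-GB prices quoted in this section; the $0.09/GB egress rate is an assumption inferred from the $9.00 figure for 100GB, and request charges are ignored. It reports the saving both when the results stay inside AWS and when they are also sent over the internet.

```python
SCAN_USD_PER_GB = 0.002      # S3 Select data scanned (from this section)
RETURN_USD_PER_GB = 0.0007   # S3 Select data returned (from this section)
EGRESS_USD_PER_GB = 0.09     # assumption: implied by $9.00 for 100GB


def s3_select_charge(scanned_gb: float, returned_gb: float) -> float:
    """S3 Select scan plus return charge, excluding requests and egress."""
    return scanned_gb * SCAN_USD_PER_GB + returned_gb * RETURN_USD_PER_GB


def internet_egress(gb: float) -> float:
    """Data transfer out to the internet at the assumed flat rate."""
    return gb * EGRESS_USD_PER_GB


def savings_ratio(baseline: float, alternative: float) -> float:
    """Fraction of the baseline cost that the alternative avoids."""
    return 1 - alternative / baseline


full_download = internet_egress(100)
select_only = s3_select_charge(100, 10)
select_with_egress = select_only + internet_egress(10)

if __name__ == "__main__":
    print(f"Full download egress:      ${full_download:.3f}")
    print(f"S3 Select charge only:     ${select_only:.3f}")
    print(f"S3 Select + result egress: ${select_with_egress:.3f}")
    print(f"Savings, results kept in AWS:   {savings_ratio(full_download, select_only):.1%}")
    print(f"Savings, results sent over net: {savings_ratio(full_download, select_with_egress):.1%}")
```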

Use Cases

1. Log Analysis

[S3: access-log-2026-01.csv.gz (5GB)]
    │
    ▼
SELECT * FROM s3object s
WHERE s.status_code = '500'
AND s.timestamp > '2026-01-15'
    │
    ▼
[500 errors only extracted: 50MB]
→ No need to download full 5GB
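
The filtered rows come back as an event stream rather than a single body, so the client has to stitch them together. Below is a minimal sketch of that step, assuming the event shapes returned by boto3's `select_object_content` (`Records`, `Stats`, `End`). The function name is hypothetical. Chunks are joined before decoding because a single row may be split across two `Records` events.

```python
def collect_select_output(event_stream):
    """Join Records payloads and capture scan statistics from a Select event stream."""
    chunks = []
    stats = {}
    for event in event_stream:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
        elif "Stats" in event:
            # Details holds BytesScanned, BytesProcessed, and BytesReturned.
            stats = event["Stats"]["Details"]
    # Decode once at the end so rows split across chunks stay intact.
    return b"".join(chunks).decode("utf-8"), stats
```

With a real response this would be called as `collect_select_output(response["Payload"])`; comparing `BytesScanned` with `BytesReturned` shows how much data the filter kept off the wire.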

2. IoT Sensor Data Filtering

[S3: sensor-data.json (2GB)]
    │
    ▼
SELECT s.sensor_id, s.temperature
FROM s3object[*] s
WHERE s.temperature > 40.0
    │
    ▼
[Abnormal temperature only extracted: 10MB]

3. S3 Select in Lambda

[S3 Event] → [Lambda]
                 │
                 ├── Query only needed data with S3 Select
                 ├── Save memory/processing time
                 └── Reduce Lambda costs
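
A Lambda function wired to an S3 event might look roughly like the sketch below. It assumes the standard S3 event notification shape, a gzip-compressed CSV with a header row, and a `status_code` column as in the log analysis example; the query and the returned summary are illustrative only.

```python
from urllib.parse import unquote_plus

# Assumption: the uploaded object is a CSV log with a status_code column.
QUERY = "SELECT * FROM s3object s WHERE s.status_code = '500'"


def object_from_event(event: dict) -> tuple:
    """Extract (bucket, key) from the first record of an S3 event notification."""
    s3_info = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 event notifications.
    return s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def handler(event, context):
    import boto3  # available in the Lambda Python runtime

    bucket, key = object_from_event(event)
    s3 = boto3.client("s3")
    response = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=QUERY,
        InputSerialization={
            "CSV": {"FileHeaderInfo": "USE"},
            "CompressionType": "GZIP",
        },
        OutputSerialization={"CSV": {}},
    )
    # Only the matching rows are held in memory, not the whole object.
    matched = b"".join(
        e["Records"]["Payload"] for e in response["Payload"] if "Records" in e
    )
    return {"bucket": bucket, "key": key, "matched_bytes": len(matched)}
```

Because only the filtered rows reach the function, it can typically run with less memory and a shorter duration than one that downloads and parses the full object.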

Limitations

Limitation	Details
Query Scope	Single object only (multiple objects not supported)
SQL Features	JOIN, subqueries not supported
Size Limits	Single input or result record up to 1MB; SQL expression up to 256KB
Encryption	SSE-S3, SSE-KMS, and SSE-C supported (SSE-C requires HTTPS and the encryption key headers in the request)
Object Size	No max size limit for uncompressed CSV/JSON

SAA-C03 Exam Focus Points

✅ Server-Side Filtering: "Only needed data without full download = S3 Select"
✅ vs Athena: "Single object filtering = S3 Select, multi-object analysis = Athena"
✅ Cost Reduction: "Reduced data transfer → cost savings"
✅ Supported Formats: "CSV, JSON, Parquet"
✅ Glacier Select: "Query archive data without restoring"

Exam Tip

Sample Exam Question: "You need to extract only the rows with specific error codes from a 10GB CSV log file stored in S3 while minimizing cost. Which approach should you use?" → Answer: S3 Select with a SQL query (saves cost and time compared with a full download)

