S3 Select & Glacier Select: Query Only the Data You Need Without Full Download

Learn how to use S3 Select and Glacier Select to extract only needed data using SQL without downloading entire objects, reducing costs and time.

PHILOLAMB | Updated: January 31, 2026
Tags: S3 Select, Glacier Select, Server-Side Filtering, Cost Reduction, SQL Query

Related Exam Domains

  • Domain 3: Design High-Performing Architectures

Key Takeaway

S3 Select extracts only needed data from S3 objects using SQL, reducing data transfer and costs by up to 80% without downloading entire objects. Glacier Select provides the same functionality for archived data.

Exam Tip

Exam Essential: "Only needed data without full object download = S3 Select", "S3 data analysis = Athena (complex queries), S3 Select (simple filtering)"

What is S3 Select?

S3 Select is a server-side filtering feature that uses SQL expressions to extract only the needed rows and columns from an object stored in S3.

Traditional Approach vs S3 Select

Traditional Approach (Full Download):
[S3: 10GB CSV] ──full download──→ [Application]
                                      │
                                      ▼
                                 Filter needed data
                                 (only needed 1GB)
→ 10GB transfer cost + processing time

S3 Select (Server-Side Filtering):
[S3: 10GB CSV] ──SQL query──→ [S3 Select Processing]
                                    │
                                    ▼
                              [Only 1GB result transferred]
→ 1GB transfer cost + fast processing
→ 90% less data transferred, up to ~80% cost reduction

Supported Data Formats

Format   S3 Select  Glacier Select          Compression Support (S3 Select)
CSV      Yes        Yes (uncompressed only)  GZIP, BZIP2
JSON     Yes        No                       GZIP, BZIP2
Parquet  Yes        No                       Column compression (Snappy, etc.)
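
In API terms, the format and compression of the object are declared through the `InputSerialization` parameter of SelectObjectContent. The helper below is a minimal sketch of that mapping; the function name `input_serialization` and its defaults (header row in use, line-delimited JSON) are assumptions made for illustration.

```python
def input_serialization(fmt: str, compression: str = "NONE") -> dict:
    """Build an InputSerialization dict for SelectObjectContent (sketch)."""
    fmt = fmt.upper()
    compression = compression.upper()

    if fmt == "PARQUET":
        # Parquet compresses columns inside the file, so no whole-object
        # compression type is declared here.
        if compression != "NONE":
            raise ValueError("Parquet does not take GZIP/BZIP2 object compression")
        return {"Parquet": {}}

    if compression not in {"NONE", "GZIP", "BZIP2"}:
        raise ValueError(f"unsupported compression: {compression}")

    if fmt == "CSV":
        # FileHeaderInfo=USE lets a query refer to columns by header name.
        body = {"CSV": {"FileHeaderInfo": "USE"}}
    elif fmt == "JSON":
        # LINES = one JSON object per line; DOCUMENT = a single JSON value.
        body = {"JSON": {"Type": "LINES"}}
    else:
        raise ValueError(f"unsupported format: {fmt}")

    body["CompressionType"] = compression
    return body
```

For example, `input_serialization("csv", "gzip")` describes a gzipped CSV file with a header row.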

SQL Query Examples

CSV File Query

-- Extract only Seoul region orders from entire CSV
SELECT s.OrderId, s.ProductName, s.Amount
FROM s3object s
WHERE s.Region = 'Seoul'
AND CAST(s.Amount AS DECIMAL) > 10000
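
A query like this is submitted through the SelectObjectContent API. The following is a minimal Boto3-style sketch, assuming the CSV has a header row; the function name `select_csv_rows` and the bucket and key in the usage comment are placeholders, not values from this article.

```python
def select_csv_rows(client, bucket: str, key: str, expression: str) -> str:
    """Run an S3 Select query on a CSV object and return the matching rows."""
    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=expression,
        # USE makes header names (OrderId, Region, ...) available to the query.
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    chunks = []
    # The response is an event stream: Records events carry result bytes,
    # while Stats and End events carry metadata only.
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Join before decoding so a character split across events stays intact.
    return b"".join(chunks).decode("utf-8")


QUERY = (
    "SELECT s.OrderId, s.ProductName, s.Amount FROM s3object s "
    "WHERE s.Region = 'Seoul' AND CAST(s.Amount AS DECIMAL) > 10000"
)

# Usage (needs boto3 and AWS credentials; bucket and key are placeholders):
#   import boto3
#   print(select_csv_rows(boto3.client("s3"), "example-bucket", "orders.csv", QUERY))
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real bucket.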

JSON File Query

-- Filter specific conditions from JSON array
SELECT s.name, s.age
FROM s3object[*] s
WHERE s.age > 30

Aggregate Functions

-- Sum, average, count
SELECT COUNT(*) AS total,
       SUM(CAST(s.Amount AS DECIMAL)) AS total_amount,
       AVG(CAST(s.Amount AS DECIMAL)) AS avg_amount
FROM s3object s
WHERE s.Category = 'Electronics'
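
To sanity-check an aggregate result on a small sample, the same computation can be reproduced locally. This sketch mirrors the query above in plain Python; the function name `aggregate_amounts` and the sample data in the usage note are hypothetical.

```python
import csv
import io
from decimal import Decimal


def aggregate_amounts(csv_text: str, category: str) -> dict:
    """Local equivalent of COUNT(*), SUM(Amount), AVG(Amount) for one category."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Decimal mirrors CAST(s.Amount AS DECIMAL) and avoids float rounding.
    amounts = [Decimal(r["Amount"]) for r in rows if r["Category"] == category]
    total_amount = sum(amounts, Decimal(0))
    return {
        "total": len(amounts),
        "total_amount": total_amount,
        # SQL AVG over zero rows is NULL, represented here as None.
        "avg_amount": total_amount / len(amounts) if amounts else None,
    }
```

Calling it on a three-row sample such as `"Category,Amount\nElectronics,100\nBooks,5\nElectronics,300\n"` with category `"Electronics"` counts two rows.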

Glacier Select

Glacier Select queries archived data directly, without restoring the full object first.

Traditional Approach:
[Glacier] ──full restore (hours)──→ [S3] ──download──→ [Analysis]

Glacier Select:
[Glacier] ──SQL query──→ [Only needed data output to S3]
→ Full restore not required

Glacier Select Retrieval Tiers

Tier       Time         Cost
Expedited  1-5 minutes  High
Standard   3-5 hours    Medium
Bulk       5-12 hours   Low
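
For objects in S3, a select-type restore is expressed as a `RestoreRequest` with `Type` set to `SELECT`, a retrieval tier, the SQL parameters, and an S3 output location. The builder below is a sketch of that request shape under the assumption of a CSV archive with a header row; the function name and the output bucket and prefix are placeholders.

```python
VALID_TIERS = {"Expedited", "Standard", "Bulk"}


def glacier_select_request(expression: str, out_bucket: str,
                           out_prefix: str, tier: str = "Standard") -> dict:
    """Build a RestoreRequest dict for a select-type restore (sketch)."""
    if tier not in VALID_TIERS:
        raise ValueError(f"tier must be one of {sorted(VALID_TIERS)}")
    return {
        "Type": "SELECT",
        "Tier": tier,  # trades retrieval time against cost
        "SelectParameters": {
            "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
            "ExpressionType": "SQL",
            "Expression": expression,
            "OutputSerialization": {"CSV": {}},
        },
        # Results are written to S3 rather than streamed back to the caller.
        "OutputLocation": {"S3": {"BucketName": out_bucket, "Prefix": out_prefix}},
    }

# Usage (placeholders; needs boto3 and AWS credentials):
#   import boto3
#   boto3.client("s3").restore_object(
#       Bucket="archive-bucket", Key="orders-2020.csv",
#       RestoreRequest=glacier_select_request(
#           "SELECT * FROM s3object s WHERE s.Region = 'Seoul'",
#           "results-bucket", "glacier-select/", tier="Bulk"))
```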

S3 Select vs Athena

Aspect          S3 Select                        Amazon Athena
Query Scope     Single object                    Multiple objects/buckets
SQL Features    Basic SELECT, WHERE, aggregates  Full SQL (JOIN, subqueries, etc.)
Data Formats    CSV, JSON, Parquet               CSV, JSON, Parquet, ORC, Avro, etc.
Schema          Not required                     Table definition needed (Glue Catalog)
Billing         Scanned + returned data          Scanned data ($5/TB)
Use Cases       Single file filtering            Data lake analysis
Infrastructure  None                             Serverless

S3 Data Query Selection:
        │
        ▼
Simple filtering from single object?
        │
       Yes → [S3 Select]
        │
        No
        │
        ▼
Complex SQL across multiple files?
        │
       Yes → [Athena]
        │
        No
        │
        ▼
PB-scale continuous analytics/BI?
        │
       Yes → [Redshift / Redshift Spectrum]
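
The decision tree can also be read as a short chain of checks. This is only an illustrative encoding of the three questions; the function and parameter names are made up for the example.

```python
def choose_query_service(single_object_filter: bool,
                         multi_file_sql: bool,
                         pb_scale_bi: bool) -> str:
    """Walk the three questions in order and return the first match."""
    if single_object_filter:
        return "S3 Select"
    if multi_file_sql:
        return "Athena"
    if pb_scale_bi:
        return "Redshift / Redshift Spectrum"
    # The tree does not name a service when all three answers are "No".
    return "undecided"
```

The order matters: a workload that is both a single-object filter and part of a larger analytics pipeline still resolves to S3 Select for that one filtering step.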

Exam Tip

S3 Select vs Athena: S3 Select for simple filtering of single objects, Athena for complex SQL analysis across multiple objects. On the exam, "extract partial data from single large file" = S3 Select.

Cost Structure

S3 Select Costs:
- Data scanned: $0.002/GB
- Data returned: $0.0007/GB

Example: Return 10GB from 100GB CSV
- Scan: 100GB × $0.002 = $0.20
- Return: 10GB × $0.0007 = $0.007
- Total: $0.207

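Comparison: Full download
- GET request + 100GB transfer
- Transfer cost alone: 100GB × $0.09 = $9.00 (internet egress)
→ ~97% savings with S3 Select when the result stays in the same Region
→ If the 10GB result also leaves AWS: $0.207 + 10GB × $0.09 = $1.107, still ~88% savings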
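
The arithmetic can be reproduced with a small calculator. This sketch uses the per-GB prices quoted in this section; the $0.09/GB egress rate is an assumption inferred from the $9.00 figure for 100GB, and the optional egress on the returned data is an extension of the example rather than part of it.

```python
from decimal import Decimal

# Prices quoted in this section; the egress rate is inferred from $9.00 / 100GB.
SCAN_PER_GB = Decimal("0.002")
RETURN_PER_GB = Decimal("0.0007")
EGRESS_PER_GB = Decimal("0.09")


def s3_select_cost(scanned_gb: int, returned_gb: int, egress: bool = False) -> Decimal:
    """S3 Select charge, optionally adding internet egress on the result."""
    cost = scanned_gb * SCAN_PER_GB + returned_gb * RETURN_PER_GB
    if egress:
        cost += returned_gb * EGRESS_PER_GB
    return cost


def savings_percent(select_cost: Decimal, download_cost: Decimal) -> Decimal:
    """Percentage saved relative to downloading the whole object."""
    return (1 - select_cost / download_cost) * 100


full_download = 100 * EGRESS_PER_GB
in_region = s3_select_cost(100, 10)
with_egress = s3_select_cost(100, 10, egress=True)
print(in_region, savings_percent(in_region, full_download))
print(with_egress, savings_percent(with_egress, full_download))
```

Decimal is used instead of float so that sub-cent prices add up exactly.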

Use Cases

1. Log Analysis

[S3: access-log-2026-01.csv.gz (5GB)]
    │
    ▼
SELECT * FROM s3object s
WHERE s.status_code = '500'
AND s."timestamp" > '2026-01-15'
    │
    ▼
[500 errors only extracted: 50MB]
→ No need to download full 5GB

2. IoT Sensor Data Filtering

[S3: sensor-data.json (2GB)]
    │
    ▼
SELECT s.sensor_id, s.temperature
FROM s3object[*] s
WHERE s.temperature > 40.0
    │
    ▼
[Abnormal temperature only extracted: 10MB]

3. S3 Select in Lambda

[S3 Event] → [Lambda]
                 │
                 ├── Query only needed data with S3 Select
                 ├── Save memory/processing time
                 └── Reduce Lambda costs
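
A handler for this flow might look like the sketch below. It assumes a standard S3 event notification and a CSV object with a `status_code` header column; the query, the optional `client` parameter (added so the handler can be exercised with a stub), and the returned fields are all illustrative.

```python
import urllib.parse

QUERY = "SELECT * FROM s3object s WHERE s.status_code = '500'"


def handler(event, context, client=None):
    """Count matching rows in the uploaded object without downloading it."""
    if client is None:
        import boto3  # imported lazily so a stub client can be used in tests
        client = boto3.client("s3")

    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["object"]["key"])

    response = client.select_object_content(
        Bucket=bucket,
        Key=key,
        ExpressionType="SQL",
        Expression=QUERY,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )

    matched = 0
    for ev in response["Payload"]:
        if "Records" in ev:
            # Assumes one row per newline, i.e. no newlines inside quoted fields.
            matched += ev["Records"]["Payload"].count(b"\n")
    return {"bucket": bucket, "key": key, "matched_rows": matched}
```

Because only the filtered rows stream into the function, memory use tracks the size of the result rather than the size of the object.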

Limitations

Limitation    Details
Query Scope   Single object only (multiple objects not supported)
SQL Features  JOIN, subqueries not supported
Size Limits   SQL expression up to 256 KB; a single input or result record up to 1 MB
Encryption    SSE-S3 and SSE-KMS handled transparently; SSE-C requires HTTPS and the key in the request
Object Size   No max size limit for uncompressed CSV/JSON

SAA-C03 Exam Focus Points

  1. Server-Side Filtering: "Only needed data without full download = S3 Select"
  2. vs Athena: "Single object filtering = S3 Select, multi-object analysis = Athena"
  3. Cost Reduction: "Reduced data transfer → cost savings"
  4. Supported Formats: "CSV, JSON, Parquet"
  5. Glacier Select: "Query archive data without restoring"

Exam Tip

Sample Exam Question: "How can you extract only the rows with specific error codes from a 10GB CSV log file stored in S3 while minimizing cost?" → Answer: S3 Select with a SQL query (saves cost and time compared with a full download).

Frequently Asked Questions (FAQ)

Q: Which SDKs support S3 Select?

The AWS CLI and the major AWS SDKs, including Python (Boto3), Java, and JavaScript, support it via the SelectObjectContent API.

Q: Can S3 Select be used on compressed files?

Yes. CSV and JSON files support GZIP and BZIP2 compression. Parquet supports column-level compression (Snappy, GZIP, etc.).

Q: What's the difference between S3 Select and Byte Range Fetch?

Byte Range Fetch retrieves a portion of an object by byte range, while S3 Select filters content using SQL. S3 Select is better for structured data (CSV, JSON), Byte Range Fetch for partial download of binary files.
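
For contrast, a byte range fetch is an ordinary GetObject call with an HTTP `Range` header. The helper below is a small sketch that formats that header; the function name and the bucket and key in the usage comment are placeholders.

```python
def range_header(start: int, length: int) -> str:
    """Format an HTTP Range header value covering `length` bytes from `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={start}-{start + length - 1}"

# Usage (placeholders; needs boto3 and AWS credentials):
#   import boto3
#   part = boto3.client("s3").get_object(
#       Bucket="example-bucket", Key="video.bin",
#       Range=range_header(0, 1024))["Body"].read()
```

The range is chosen by position alone, so nothing in the object is parsed or filtered on the server.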

Q: Is Glacier Select available for all Glacier classes?

Available for Glacier Flexible Retrieval (formerly Glacier). Not available for Glacier Deep Archive.

Q: Can S3 Select results be saved to another S3 object?

S3 Select returns a streaming response. To save results, receive them in your application and re-upload to S3, or use Athena's CTAS (Create Table As Select) feature.
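
The receive-and-re-upload route can be sketched as follows. It buffers the whole result in memory, which is an assumption suited to small results only; the function name and all bucket and key arguments are placeholders.

```python
def select_and_save(client, src_bucket: str, src_key: str,
                    dst_bucket: str, dst_key: str, expression: str) -> int:
    """Run an S3 Select query and store the result as a new S3 object."""
    response = client.select_object_content(
        Bucket=src_bucket,
        Key=src_key,
        ExpressionType="SQL",
        Expression=expression,
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
        OutputSerialization={"CSV": {}},
    )
    # Collect only the Records events; Stats and End events carry no rows.
    data = b"".join(
        event["Records"]["Payload"]
        for event in response["Payload"]
        if "Records" in event
    )
    client.put_object(Bucket=dst_bucket, Key=dst_key, Body=data)
    return len(data)  # number of bytes written
```

For large results, a multipart upload fed from the event stream would avoid holding everything in memory at once.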
