Amazon Redshift Basics: When to Use a Data Warehouse

Key Takeaway

Amazon Redshift is a data warehouse that stores petabyte-scale data in columnar format and processes complex analytical queries (OLAP) rapidly. Choose RDS for OLTP (transactions), Athena for ad-hoc S3 queries, and Redshift for large-scale analytics/BI.

Exam Tip

Exam Essential: "Data warehouse, OLAP, BI, PB-scale analytics" → Redshift, "OLTP, transactions" → RDS, "S3 serverless queries" → Athena

When Should You Use Redshift?

Good Fit

Business Intelligence (BI): Complex aggregation, join queries
Data Warehousing: Unified analysis of data from multiple sources
Petabyte-Scale Analytics: Analytical queries on massive datasets
Regular Reporting: Daily/weekly/monthly dashboards
Post-ETL Analysis: Load data from S3 → Redshift for analysis

Not a Good Fit

OLTP Workloads: Real-time transactions → RDS/Aurora
Ad-hoc S3 Queries: Simple S3 analysis → Athena
NoSQL: Key-value/document data → DynamoDB
Real-time Streaming Analytics: Millisecond latency → Kinesis Data Analytics

Redshift Core Features

Columnar Storage

Row-based (RDS):
ID | Name  | Age | City
1  | Kim   | 30  | Seoul
2  | Lee   | 25  | Busan
3  | Park  | 35  | Seoul
→ All columns of a row stored together
→ Good for specific row lookups (OLTP)

Column-based (Redshift):
ID:   1, 2, 3
Name: Kim, Lee, Park
Age:  30, 25, 35
City: Seoul, Busan, Seoul
→ Values of same column stored together
→ Good for aggregation queries (SUM, AVG, COUNT)
→ High compression ratio

Exam Tip

Columnar = Optimal for Analytics: For queries like SELECT AVG(age) FROM users, only the age column needs to be read, making it much faster than row-based.

MPP (Massively Parallel Processing)

┌──────────────────────────────────────┐
│          Leader Node                  │
│  (Query parsing, planning, results)   │
└──────────┬───────────────────────────┘
           │
     ┌─────┼─────┬─────────┐
     ▼     ▼     ▼         ▼
  ┌─────┐┌─────┐┌─────┐┌─────┐
  │Comp ││Comp ││Comp ││Comp │
  │Node1││Node2││Node3││Node4│
  │     ││     ││     ││     │
  │Slice││Slice││Slice││Slice│
  └─────┘└─────┘└─────┘└─────┘
  → Queries split and executed in parallel

Node Types

Type	Characteristics	Use Case
RA3	Storage/compute separation, managed storage	Most workloads (recommended)
DC2	SSD-based, fast local storage	Under 500GB
DS2	HDD-based, large capacity	Legacy (migrate to RA3)

Redshift Serverless

Automatically provisions capacity without cluster management.

Aspect	Redshift Provisioned	Redshift Serverless
Infrastructure	Manually set node count/type	Automatic
Billing	Node-hour based	RPU (Redshift Processing Unit) usage
Scaling	Manual resize / Elastic Resize	Automatic
Best For	Predictable workloads	Intermittent/variable workloads

Redshift Spectrum

Query S3 data directly from Redshift (no data loading required).

[Redshift Cluster]
      │
      ├── Internal Tables (Redshift storage)
      │
      └── External Tables (S3 data) ← Spectrum
           │
           ▼
      [S3: Logs, historical data]
      → Query S3 data without loading into Redshift

Exam Tip

Redshift Spectrum vs Athena: Both query S3 data, but Spectrum requires a Redshift cluster and can join with internal tables. Athena is fully serverless.

Redshift vs RDS vs Athena

Aspect	RDS/Aurora	Redshift	Athena
Type	OLTP (Transactions)	OLAP (Analytics)	Serverless queries
Storage	Row-based	Column-based	S3 (no internal storage)
Data Scale	GB ~ TB	TB ~ PB	Depends on S3
Query Types	INSERT/UPDATE/SELECT	Complex aggregations/joins	Ad-hoc queries
Infrastructure	Managed instances	Cluster / Serverless	Fully serverless
Billing	Instance hours	Node hours / RPU	Data scanned ($5/TB)
Use Cases	App backend, CRUD	BI, dashboards, reports	S3 log analysis

Redshift Feature Summary

Feature	Description
Enhanced VPC Routing	Force all COPY/UNLOAD traffic through VPC
Snapshots	Auto/manual backups, cross-region copy
Federated Query	Directly query RDS/Aurora data
Data Sharing	Share data between clusters (no copying)
Concurrency Scaling	Auto-scale on query spikes
Encryption	Encrypt data at rest with KMS/HSM

SAA-C03 Exam Focus Points

✅ OLAP vs OLTP: "Analytics queries = Redshift, Transactions = RDS"
✅ Columnar: "Columnar storage optimized for aggregation queries"
✅ Spectrum: "Query S3 data directly from Redshift"
✅ Athena Difference: "Athena = serverless ad-hoc, Redshift = ongoing analytics"
✅ Enhanced VPC Routing: "Force data transfer within VPC"

Exam Tip

Sample Exam Question: "Need to consolidate data from multiple sources and generate PB-scale BI reports. Which service is appropriate?" → Answer: Amazon Redshift (OLAP, PB-scale, BI analytics)

Amazon Redshift Basics: When to Use a Data Warehouse

Key Takeaway

When Should You Use Redshift?

Good Fit

Not a Good Fit

Redshift Core Features

Columnar Storage

MPP (Massively Parallel Processing)

Node Types

Redshift Serverless

Redshift Spectrum

Redshift vs RDS vs Athena

Redshift Feature Summary

SAA-C03 Exam Focus Points

Frequently Asked Questions (FAQ)

Q: When should I choose Redshift vs Athena?

Q: Should I choose Redshift Serverless or Provisioned?

Q: Can Redshift process real-time data?

Q: How do I load data into Redshift?

Q: How can I reduce Redshift costs?

References