SAABlog
DatabaseIntermediate

Amazon Redshift Basics: When to Use a Data Warehouse

Learn Amazon Redshift's core concepts, compare with RDS and Athena, and understand when a data warehouse is the right choice.

PHILOLAMB-Updated: January 31, 2026
RedshiftData WarehouseOLAPColumnar StorageAnalytics

Related Exam Domains

  • Domain 3: Design High-Performing Architectures

Key Takeaway

Amazon Redshift is a data warehouse that stores petabyte-scale data in columnar format and processes complex analytical queries (OLAP) rapidly. Choose RDS for OLTP (transactions), Athena for ad-hoc S3 queries, and Redshift for large-scale analytics/BI.

Exam Tip

Exam Essential: "Data warehouse, OLAP, BI, PB-scale analytics" → Redshift, "OLTP, transactions" → RDS, "S3 serverless queries" → Athena

When Should You Use Redshift?

Good Fit

  1. Business Intelligence (BI): Complex aggregation, join queries
  2. Data Warehousing: Unified analysis of data from multiple sources
  3. Petabyte-Scale Analytics: Analytical queries on massive datasets
  4. Regular Reporting: Daily/weekly/monthly dashboards
  5. Post-ETL Analysis: Load data from S3 → Redshift for analysis

Not a Good Fit

  1. OLTP Workloads: Real-time transactions → RDS/Aurora
  2. Ad-hoc S3 Queries: Simple S3 analysis → Athena
  3. NoSQL: Key-value/document data → DynamoDB
  4. Real-time Streaming Analytics: Millisecond latency → Kinesis Data Analytics

Redshift Core Features

Columnar Storage

Row-based (RDS):
ID | Name  | Age | City
1  | Kim   | 30  | Seoul
2  | Lee   | 25  | Busan
3  | Park  | 35  | Seoul
→ All columns of a row stored together
→ Good for specific row lookups (OLTP)

Column-based (Redshift):
ID:   1, 2, 3
Name: Kim, Lee, Park
Age:  30, 25, 35
City: Seoul, Busan, Seoul
→ Values of same column stored together
→ Good for aggregation queries (SUM, AVG, COUNT)
→ High compression ratio

Exam Tip

Columnar = Optimal for Analytics: For queries like SELECT AVG(age) FROM users, only the age column needs to be read, making it much faster than row-based.

MPP (Massively Parallel Processing)

┌──────────────────────────────────────┐
│          Leader Node                  │
│  (Query parsing, planning, results)   │
└──────────┬───────────────────────────┘
           │
     ┌─────┼─────┬─────────┐
     ▼     ▼     ▼         ▼
  ┌─────┐┌─────┐┌─────┐┌─────┐
  │Comp ││Comp ││Comp ││Comp │
  │Node1││Node2││Node3││Node4│
  │     ││     ││     ││     │
  │Slice││Slice││Slice││Slice│
  └─────┘└─────┘└─────┘└─────┘
  → Queries split and executed in parallel

Node Types

TypeCharacteristicsUse Case
RA3Storage/compute separation, managed storageMost workloads (recommended)
DC2SSD-based, fast local storageUnder 500GB
DS2HDD-based, large capacityLegacy (migrate to RA3)

Redshift Serverless

Automatically provisions capacity without cluster management.

AspectRedshift ProvisionedRedshift Serverless
InfrastructureManually set node count/typeAutomatic
BillingNode-hour basedRPU (Redshift Processing Unit) usage
ScalingManual resize / Elastic ResizeAutomatic
Best ForPredictable workloadsIntermittent/variable workloads

Redshift Spectrum

Query S3 data directly from Redshift (no data loading required).

[Redshift Cluster]
      │
      ├── Internal Tables (Redshift storage)
      │
      └── External Tables (S3 data) ← Spectrum
           │
           ▼
      [S3: Logs, historical data]
      → Query S3 data without loading into Redshift

Exam Tip

Redshift Spectrum vs Athena: Both query S3 data, but Spectrum requires a Redshift cluster and can join with internal tables. Athena is fully serverless.

Redshift vs RDS vs Athena

AspectRDS/AuroraRedshiftAthena
TypeOLTP (Transactions)OLAP (Analytics)Serverless queries
StorageRow-basedColumn-basedS3 (no internal storage)
Data ScaleGB ~ TBTB ~ PBDepends on S3
Query TypesINSERT/UPDATE/SELECTComplex aggregations/joinsAd-hoc queries
InfrastructureManaged instancesCluster / ServerlessFully serverless
BillingInstance hoursNode hours / RPUData scanned ($5/TB)
Use CasesApp backend, CRUDBI, dashboards, reportsS3 log analysis

Redshift Feature Summary

FeatureDescription
Enhanced VPC RoutingForce all COPY/UNLOAD traffic through VPC
SnapshotsAuto/manual backups, cross-region copy
Federated QueryDirectly query RDS/Aurora data
Data SharingShare data between clusters (no copying)
Concurrency ScalingAuto-scale on query spikes
EncryptionEncrypt data at rest with KMS/HSM

SAA-C03 Exam Focus Points

  1. OLAP vs OLTP: "Analytics queries = Redshift, Transactions = RDS"
  2. Columnar: "Columnar storage optimized for aggregation queries"
  3. Spectrum: "Query S3 data directly from Redshift"
  4. Athena Difference: "Athena = serverless ad-hoc, Redshift = ongoing analytics"
  5. Enhanced VPC Routing: "Force data transfer within VPC"

Exam Tip

Sample Exam Question: "Need to consolidate data from multiple sources and generate PB-scale BI reports. Which service is appropriate?" → Answer: Amazon Redshift (OLAP, PB-scale, BI analytics)

Frequently Asked Questions (FAQ)

Q: When should I choose Redshift vs Athena?

Athena is best for intermittent ad-hoc queries on S3 data. Redshift is best for consolidating data from multiple sources for ongoing complex analysis.

Q: Should I choose Redshift Serverless or Provisioned?

Provisioned is more cost-effective for predictable, continuous workloads. Serverless is better for intermittent, variable workloads.

Q: Can Redshift process real-time data?

Redshift Streaming Ingestion can ingest Kinesis Data Streams data in near real-time. However, for millisecond-level real-time analytics, Kinesis Data Analytics is more suitable.

Q: How do I load data into Redshift?

Use COPY command to load from S3, DynamoDB, or EMR. COPY from S3 is the most common and fastest method. Use INSERT only for small data volumes.

Q: How can I reduce Redshift costs?

Consider Reserved Instances (1-year/3-year commitment) for up to 75% savings, use Concurrency Scaling free credits, delete unnecessary snapshots, and use RA3 nodes to separate storage costs.

References