AWS X-Ray: Complete Guide to Distributed Tracing for Microservices
How to trace Lambda, API Gateway, and ECS requests with X-Ray. Traces, segments, service maps, and SAA-C03 exam essentials explained.
Related Exam Domains
- Domain 2: Design Resilient Architectures
- Domain 3: Design High-Performing Architectures
Key Takeaway
AWS X-Ray is a service that traces and visualizes request flows across distributed applications. Use traces to understand the full request path and service maps to identify performance bottlenecks.
Exam Tip
Exam Essential: "Identify microservices bottlenecks" → X-Ray, "Log/metric monitoring" → CloudWatch, "API Gateway + Lambda tracing" → X-Ray Active Tracing
When Should You Use AWS X-Ray?
Best For
X-Ray Recommended Scenarios:
├── Microservices architecture
│ └── Trace request flows across multiple services
├── Serverless applications
│ └── Trace API Gateway → Lambda → DynamoDB calls
├── Performance bottleneck analysis
│ └── Identify which service causes latency
├── Root cause analysis
│ └── Track where errors originate
└── Service dependency mapping
    └── Auto-generated service maps
Not Ideal For
Cases Where X-Ray Isn't the Best Fit:
├── Simple log collection/analysis
│ → Use CloudWatch Logs
├── Infrastructure metric monitoring
│ → Use CloudWatch Metrics
├── Real-time alerts
│ → Use CloudWatch Alarms
└── Cost optimization only
    → Use AWS Cost Explorer, Trusted Advisor
X-Ray Core Concepts
Architecture
┌─────────────────────────────────────────────────────────────┐
│ How AWS X-Ray Works │
├─────────────────────────────────────────────────────────────┤
│ │
│ [Client] │
│ │ │
│ ▼ Request │
│ ┌──────────────┐ │
│ │ API Gateway │ ──→ Creates Segment │
│ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Lambda │ ──→ Creates Segment │
│ │ Function │ + Subsegments (DynamoDB calls) │
│ └──────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ DynamoDB │ ──→ Subsegment │
│ └──────────────┘ │
│ │
│ All Segments → X-Ray Daemon → X-Ray Service │
│ ↓ │
│ [Service Map] + [Trace Analysis] │
│ │
└─────────────────────────────────────────────────────────────┘
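The segments in the diagram end up in a single trace because each service forwards a tracing header to the next one. The following is a minimal sketch of reading that header, assuming the `X-Amzn-Trace-Id` format of `Root=...;Parent=...;Sampled=...`. The function name and the sample header value are illustrative and do not come from a real request.

```python
def parse_trace_header(header: str) -> dict:
    """Split an X-Amzn-Trace-Id header value into its key/value fields."""
    fields = {}
    for part in header.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # ignore empty or malformed fragments
        key, value = part.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


# Illustrative header value only.
example = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
parsed = parse_trace_header(example)
print(parsed["Root"])     # trace ID shared by every segment in the trace
print(parsed["Parent"])   # ID of the segment that made the call
print(parsed["Sampled"])  # "1" means the request is being traced
```

In practice the X-Ray SDK and the integrated AWS services read and propagate this header for you; parsing it by hand is mainly useful when debugging why two segments did not join the same trace.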
Key Terms
| Term | Description |
|---|---|
| Trace | Complete path of a single request (composed of segments) |
| Segment | Work done by a single service |
| Subsegment | Detailed work within a segment (DB calls, HTTP requests) |
| Annotation | Searchable key-value metadata (for filtering) |
| Metadata | Non-searchable additional information |
| Service Map | Visual representation of service dependencies |
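To make the terms above concrete, here is a rough sketch of how they fit together in one segment document, written as a Python dictionary. The service name, annotation key, and metadata contents are hypothetical; the ID helpers assume the usual X-Ray formats (a trace ID of `1-<8 hex digits of epoch time>-<24 hex digits>` and 16-hex-digit segment IDs).

```python
import json
import os
import time


def new_trace_id(now: float) -> str:
    """Build a trace ID: version 1, epoch seconds in hex, 96 random bits."""
    return f"1-{int(now):08x}-{os.urandom(12).hex()}"


def new_segment_id() -> str:
    """Segment and subsegment IDs are 64-bit values written as 16 hex digits."""
    return os.urandom(8).hex()


start = time.time()
segment = {
    "name": "orders-service",                  # hypothetical service name
    "id": new_segment_id(),
    "trace_id": new_trace_id(start),           # shared by all segments in the trace
    "start_time": start,
    "end_time": start + 0.120,
    "annotations": {"customer_tier": "gold"},  # indexed, usable for filtering
    "metadata": {"debug": {"cart_items": 3}},  # stored, but not searchable
    "subsegments": [
        {
            "name": "DynamoDB",                # a downstream call inside the segment
            "id": new_segment_id(),
            "namespace": "aws",
            "start_time": start + 0.010,
            "end_time": start + 0.045,
        }
    ],
}
print(json.dumps(segment, indent=2))
```

The X-Ray SDK normally builds and sends these documents for you, so this is only meant to show where each term from the table lives.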
Sampling Rules
X-Ray Sampling (Cost Optimization):
├── Default: First request each second + 5% of additional requests
├── Custom rules available
│ ├── Trace specific URL patterns only
│ ├── Trace specific services, hosts, or HTTP methods only
│ └── Set a separate reservoir and fixed rate per rule
└── Purpose: Prevent cost explosion from tracing all requests
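As a back-of-the-envelope illustration of the default rule, the sketch below estimates how many traces per second are recorded, assuming a reservoir of one request per second and a 5% fixed rate on everything beyond it. The function name is hypothetical, and the model ignores that each instrumented host may apply its own reservoir.

```python
def expected_sampled_per_second(requests_per_second: float,
                                reservoir: float = 1.0,
                                fixed_rate: float = 0.05) -> float:
    """Expected traces per second under reservoir + fixed-rate sampling."""
    # Requests up to the reservoir size are always traced.
    from_reservoir = min(requests_per_second, reservoir)
    # Only a fraction of the remaining requests is traced.
    beyond_reservoir = max(requests_per_second - reservoir, 0.0)
    return from_reservoir + fixed_rate * beyond_reservoir


for rps in (0.5, 1, 100, 1000):
    print(rps, round(expected_sampled_per_second(rps), 2))
```

The takeaway is that low-traffic services are traced almost completely, while high-traffic services converge toward the fixed rate.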
Exam Tip
Sampling Exam Point: X-Ray doesn't trace all requests by default. Sampling rules are applied to optimize cost and performance.
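A custom sampling rule might look roughly like the payload below. This is a sketch under assumptions: the helper name, rule name, and URL path are hypothetical, and the dictionary follows the field names of the X-Ray `SamplingRule` structure as used by `create_sampling_rule`.

```python
def build_sampling_rule(name: str, url_path: str, fixed_rate: float,
                        reservoir_size: int, priority: int = 100) -> dict:
    """Return a SamplingRule payload that matches on URL path only."""
    if not 0.0 <= fixed_rate <= 1.0:
        raise ValueError("fixed_rate must be between 0 and 1")
    if reservoir_size < 0:
        raise ValueError("reservoir_size must be non-negative")
    return {
        "RuleName": name,
        "Priority": priority,            # lower numbers are evaluated first
        "FixedRate": fixed_rate,         # fraction traced beyond the reservoir
        "ReservoirSize": reservoir_size, # requests per second always traced
        "ServiceName": "*",
        "ServiceType": "*",
        "Host": "*",
        "HTTPMethod": "*",
        "URLPath": url_path,
        "ResourceARN": "*",
        "Version": 1,
    }


# Hypothetical rule: trace 10% of checkout requests, plus 2 per second.
rule = build_sampling_rule("checkout-only", "/checkout/*", 0.10, 2)
print(rule)
# With boto3 installed and credentials configured, the rule could be created with:
#   boto3.client("xray").create_sampling_rule(SamplingRule=rule)
```

Keeping the payload construction separate from the API call makes the rule easy to review and test before anything is changed in an account.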
X-Ray Integration with AWS Services
Supported Services
| Service | Integration | Setup |
|---|---|---|
| Lambda | Native | Enable Active Tracing in function config |
| API Gateway | Native | Enable X-Ray in stage settings |
| ECS/EKS | X-Ray daemon sidecar | Add X-Ray container to task |
| EC2 | Install X-Ray daemon | Run daemon process on instance |
| Elastic Beanstalk | Native | Enable in environment settings |
Lambda + API Gateway Tracing
API Gateway Active Tracing Setup:
├── Console: Stages → Logs/Tracing → Enable X-Ray Tracing
├── CLI: aws apigateway update-stage --rest-api-id <api-id> --stage-name <stage> --patch-operations op=replace,path=/tracingEnabled,value=true
└── Result: Traces API Gateway → Lambda → downstream
Lambda Active Tracing Setup:
├── Console: Configuration → Monitoring → Enable Active tracing
├── CLI: aws lambda update-function-configuration --function-name <function-name> --tracing-config Mode=Active
└── Result: Traces execution time, cold starts (initialization), and downstream calls instrumented with the X-Ray SDK
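The same two settings can also be applied programmatically. The sketch below assumes boto3-style clients for API Gateway and Lambda (their `update_stage` and `update_function_configuration` operations); the function name, API ID, stage, and Lambda function name are hypothetical. A small recording stand-in replaces the real clients so the example runs without AWS access.

```python
def enable_active_tracing(apigw_client, lambda_client,
                          rest_api_id: str, stage_name: str,
                          function_name: str) -> None:
    """Turn on X-Ray tracing for one REST API stage and one Lambda function."""
    apigw_client.update_stage(
        restApiId=rest_api_id,
        stageName=stage_name,
        patchOperations=[
            {"op": "replace", "path": "/tracingEnabled", "value": "true"}
        ],
    )
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        TracingConfig={"Mode": "Active"},
    )


class RecordingClient:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, operation):
        # Any unknown attribute is treated as an API operation name.
        def record(**kwargs):
            self.calls.append((operation, kwargs))
            return {}
        return record


apigw, lam = RecordingClient(), RecordingClient()
enable_active_tracing(apigw, lam, "a1b2c3d4e5", "prod", "orders-handler")
print(apigw.calls)
print(lam.calls)
```

With real credentials, `boto3.client("apigateway")` and `boto3.client("lambda")` would take the place of the two `RecordingClient` instances.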
X-Ray vs CloudWatch: Which One Should You Choose?
Comparison Table
| Aspect | X-Ray | CloudWatch |
|---|---|---|
| Primary Purpose | Distributed tracing, request flow analysis | Logs, metrics, alarms |
| Data Type | Traces (request paths) | Logs, metrics, events |
| Visualization | Service maps, trace timelines | Dashboards, graphs |
| Search | Annotation-based filtering | Log Insights queries |
| Alarms | Not directly supported | Native alarm feature |
| Pricing | Per trace recorded, retrieved, or scanned | Log volume, metric count |
Using Together
Recommended Combination:
├── X-Ray: Request flow tracing, bottleneck analysis
├── CloudWatch Logs: Detailed application logs
├── CloudWatch Metrics: CPU, memory, custom metrics
├── CloudWatch Alarms: Threshold breach notifications
└── CloudWatch ServiceLens: Unified X-Ray + CloudWatch view
Exam Tip
Exam Point: "Where is latency occurring?" → X-Ray, "What happened?" → CloudWatch Logs
Pricing Structure
Pricing (US East)
| Item | Free Tier | Price After Free Tier |
|---|---|---|
| Traces Recorded | 100,000/month | $5.00/million traces |
| Traces Retrieved or Scanned | 1,000,000/month | $0.50/million retrieved or scanned |
| Trace Retention | 30 days | No additional cost |
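As a worked example of the table above, the sketch below estimates a monthly bill from the number of traces recorded and scanned. It uses only the figures listed in the table; the function name and the sample volumes are hypothetical, and actual prices may differ by Region or change over time.

```python
FREE_RECORDED = 100_000            # traces recorded free per month
FREE_SCANNED = 1_000_000           # traces scanned free per month
PRICE_PER_MILLION_RECORDED = 5.00  # USD
PRICE_PER_MILLION_SCANNED = 0.50   # USD


def monthly_xray_cost(recorded: int, scanned: int) -> float:
    """Estimate the monthly X-Ray charge in USD after the free tier."""
    billable_recorded = max(recorded - FREE_RECORDED, 0)
    billable_scanned = max(scanned - FREE_SCANNED, 0)
    return (billable_recorded / 1_000_000 * PRICE_PER_MILLION_RECORDED
            + billable_scanned / 1_000_000 * PRICE_PER_MILLION_SCANNED)


# 2.1M traces recorded and 3M scanned in a month:
# (2.0M billable x $5) + (2.0M billable x $0.50)
print(monthly_xray_cost(2_100_000, 3_000_000))  # 11.0
```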
Cost Optimization Tips
Cost Reduction Strategies:
├── Optimize sampling rules
│ └── Production: low rate, Development: high rate
├── Trace only necessary services
│ └── Critical paths only, not all microservices
├── Keep annotations purposeful
│ └── Annotations are indexed for search; put bulky, non-searchable detail in metadata
└── Use trace groups
└── Filter to view only relevant traces
SAA-C03 Exam Focus Points
Commonly Tested Scenarios
- ✅ Distributed Tracing Tool: "Identify microservices bottlenecks" → X-Ray
- ✅ Serverless Tracing: "Trace Lambda + API Gateway requests" → X-Ray Active Tracing
- ✅ Service Dependency: "Visualize service relationships" → X-Ray Service Map
- ✅ X-Ray vs CloudWatch: Distinguish distributed tracing vs logs/metrics
- ✅ ECS/EKS Tracing: "Trace containerized apps" → X-Ray daemon sidecar
Sample Exam Questions
Exam Tip
Sample Exam Question 1: "In a microservices architecture, you need to identify why certain API requests are slow. How can you determine which service is causing the latency?"
→ Answer: AWS X-Ray (Service maps and trace timelines identify bottlenecks)
Exam Tip
Sample Exam Question 2: "How can you trace request flows across microservices running on an EKS cluster?"
→ Answer: Deploy X-Ray daemon as a sidecar container + integrate X-Ray SDK in applications
Exam Tip
Sample Exam Question 3: "How can you trace the complete request path of a serverless application using API Gateway and Lambda?"
→ Answer: Enable Active Tracing on both API Gateway and Lambda
Frequently Asked Questions
Q: What's the difference between X-Ray and CloudWatch Application Insights?
X-Ray specializes in distributed tracing of request flows. Application Insights automatically detects problems in specific workloads such as .NET and SQL Server. More recently, CloudWatch Application Signals has integrated X-Ray to provide APM capabilities.
Q: Does X-Ray support HTTP API (API Gateway v2)?
No. Native X-Ray tracing is currently available only for REST APIs (API Gateway v1). For HTTP APIs, trace from Lambda onward by using the X-Ray SDK in the function.
Q: What happens if I set sampling rate to 100%?
Every request is traced, so cost grows in direct proportion to traffic. This is not recommended for production; use high sampling rates only temporarily while debugging.
Q: Where should the X-Ray daemon run?
- EC2: Install directly on instances
- ECS: Add as sidecar container in task definition
- Lambda: Automatically included (just enable setting)
- Elastic Beanstalk: Included in platform
Q: How long is trace data retained?
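X-Ray retains trace data for 30 days at no additional cost. X-Ray has no built-in long-term archive, so for longer retention, retrieve traces through the X-Ray API and store them yourself, for example in S3.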
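One way to keep traces beyond the retention window is to copy them out yourself. The sketch below assumes boto3-style clients (`batch_get_traces` on X-Ray and `put_object` on S3); the function name, bucket name, key prefix, and batch size of 5 IDs per call are assumptions, and pagination of large traces via `NextToken` is left out for brevity. Small fake clients stand in for AWS so the example runs locally.

```python
import json


def chunks(items, size):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def archive_traces(xray_client, s3_client, trace_ids, bucket,
                   prefix="xray-archive/", batch_size=5):
    """Copy the given traces to S3 as one JSON object per trace."""
    archived = []
    for batch in chunks(list(trace_ids), batch_size):
        response = xray_client.batch_get_traces(TraceIds=batch)
        for trace in response.get("Traces", []):
            key = f"{prefix}{trace['Id']}.json"
            # Each segment's Document is itself a JSON string, so decode it.
            body = json.dumps({
                "Id": trace["Id"],
                "Segments": [json.loads(s["Document"]) for s in trace["Segments"]],
            })
            s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
            archived.append(key)
    return archived


class FakeXRay:
    """Returns one dummy segment per requested trace ID."""

    def batch_get_traces(self, TraceIds):
        return {"Traces": [
            {"Id": tid,
             "Segments": [{"Id": "0" * 16, "Document": json.dumps({"name": "demo"})}]}
            for tid in TraceIds
        ]}


class FakeS3:
    """Stores uploaded objects in a dictionary."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body
        return {}


s3 = FakeS3()
ids = [f"1-00000000-{i:024x}" for i in range(7)]  # made-up trace IDs
keys = archive_traces(FakeXRay(), s3, ids, "my-trace-archive")
print(len(keys), "traces archived")
```

With real clients, the trace IDs would typically come from a prior query for trace summaries over the time range you want to keep.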