AWS X-Ray: Complete Guide to Distributed Tracing for Microservices

How to trace Lambda, API Gateway, and ECS requests with X-Ray. Traces, segments, service maps, and SAA-C03 exam essentials explained.

PHILOLAMB-
X-Ray, Distributed Tracing, Microservices, APM, Performance

Related Exam Domains

  • Domain 2: Design Resilient Architectures
  • Domain 3: Design High-Performing Architectures

Key Takeaway

AWS X-Ray is a service that traces and visualizes request flows across distributed applications. Use traces to understand the full request path and service maps to identify performance bottlenecks.

Exam Tip

Exam Essential: "Identify microservices bottlenecks" → X-Ray, "Log/metric monitoring" → CloudWatch, "API Gateway + Lambda tracing" → X-Ray Active Tracing


When Should You Use AWS X-Ray?

Best For

X-Ray Recommended Scenarios:
├── Microservices architecture
│   └── Trace request flows across multiple services
├── Serverless applications
│   └── Trace API Gateway → Lambda → DynamoDB calls
├── Performance bottleneck analysis
│   └── Identify which service causes latency
├── Root cause analysis
│   └── Track where errors originate
└── Service dependency mapping
    └── Auto-generated service maps

Not Ideal For

Cases Where X-Ray Isn't the Best Fit:
├── Simple log collection/analysis
│   → Use CloudWatch Logs
├── Infrastructure metric monitoring
│   → Use CloudWatch Metrics
├── Real-time alerts
│   → Use CloudWatch Alarms
└── Cost optimization only
    → AWS Cost Explorer, Trusted Advisor

X-Ray Core Concepts

Architecture

┌─────────────────────────────────────────────────────────────┐
│                   How AWS X-Ray Works                        │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│   [Client]                                                   │
│       │                                                      │
│       ▼ Request                                              │
│   ┌──────────────┐                                          │
│   │ API Gateway  │ ──→ Creates Segment                      │
│   └──────────────┘                                          │
│       │                                                      │
│       ▼                                                      │
│   ┌──────────────┐                                          │
│   │   Lambda     │ ──→ Creates Segment                      │
│   │   Function   │     + Subsegments (DynamoDB calls)       │
│   └──────────────┘                                          │
│       │                                                      │
│       ▼                                                      │
│   ┌──────────────┐                                          │
│   │  DynamoDB    │ ──→ Subsegment                           │
│   └──────────────┘                                          │
│                                                              │
│   All Segments → X-Ray Daemon → X-Ray Service               │
│                         ↓                                    │
│                   [Service Map] + [Trace Analysis]          │
│                                                              │
└─────────────────────────────────────────────────────────────┘
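The segments in the diagram are stitched into one trace by a tracing header that each service passes downstream. X-Ray carries this in the `X-Amzn-Trace-Id` HTTP header, whose value is a semicolon-separated list of fields such as `Root`, `Parent`, and `Sampled`. The following is a minimal sketch of reading that header; the helper names `parse_trace_header` and `is_sampled` are hypothetical, and the header value is an illustrative sample rather than output from a real request.

```python
def parse_trace_header(header: str) -> dict:
    """Split an X-Amzn-Trace-Id header value into its key=value fields."""
    fields = {}
    for part in header.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty or malformed pieces
        key, value = part.split("=", 1)
        fields[key.strip()] = value.strip()
    return fields


def is_sampled(header: str) -> bool:
    """True when an upstream service decided to record this request."""
    return parse_trace_header(header).get("Sampled") == "1"


# Illustrative header value (not taken from a real request)
example = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
fields = parse_trace_header(example)
print(fields["Root"])       # trace ID shared by every segment in the trace
print(fields["Parent"])     # ID of the segment that made the downstream call
print(is_sampled(example))
```

In practice the X-Ray SDKs and the native integrations read and forward this header for you; the sketch only shows what is being propagated.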

Key Terms

| Term | Description |
|------|-------------|
| Trace | Complete path of a single request (composed of segments) |
| Segment | Work done by a single service |
| Subsegment | Detailed work within a segment (DB calls, HTTP requests) |
| Annotation | Searchable key-value metadata (for filtering) |
| Metadata | Non-searchable additional information |
| Service Map | Visual representation of service dependencies |
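To make these terms concrete, here is a rough sketch of how a segment with one subsegment, an annotation, and some metadata might look as a JSON segment document. The service name `orders-service`, the annotation `customer_tier`, and the helper functions are hypothetical; the field names follow the X-Ray segment document format as I understand it, and real applications would normally let the X-Ray SDK generate this.

```python
import json
import secrets
import time


def new_trace_id(now=None):
    """Trace ID: version 1, epoch seconds as 8 hex digits, then 24 random hex digits."""
    epoch = int(now if now is not None else time.time())
    return f"1-{epoch:08x}-{secrets.token_hex(12)}"


def new_segment_id():
    """Segment and subsegment IDs are 64-bit values written as 16 hex digits."""
    return secrets.token_hex(8)


start = time.time()

# Subsegment: one downstream call made while the segment was open
dynamodb_call = {
    "id": new_segment_id(),
    "name": "DynamoDB",
    "namespace": "aws",          # marks a call to an AWS service
    "start_time": start + 0.010,
    "end_time": start + 0.045,
}

# Segment: the work done by one service for this request
segment = {
    "name": "orders-service",    # hypothetical service name
    "id": new_segment_id(),
    "trace_id": new_trace_id(start),
    "start_time": start,
    "end_time": start + 0.060,
    "annotations": {"customer_tier": "premium"},       # indexed, usable in filters
    "metadata": {"default": {"cart_items": [1, 2]}},   # stored, not searchable
    "subsegments": [dynamodb_call],
}

print(json.dumps(segment, indent=2))
```

A trace is then simply every segment that shares the same `trace_id`.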

Sampling Rules

X-Ray Sampling (Cost Optimization):
├── Default: first request each second + 5% of additional requests
├── Custom rules available
│   ├── Match specific URL paths or HTTP methods
│   ├── Match specific services or hosts
│   └── Set a per-rule reservoir and fixed rate
└── Purpose: Prevent cost explosion from tracing all requests
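The default behaviour can be approximated with a small simulation: a reservoir that always admits the first request in each second, plus a fixed rate applied to everything after it. This is only a sketch of the idea, not the SDK's implementation; the class name `DefaultRuleSampler` and its parameters are hypothetical.

```python
import random


class DefaultRuleSampler:
    """Simulates a reservoir of N requests per second plus a fixed rate for the rest."""

    def __init__(self, reservoir=1, rate=0.05, rng=random.random):
        self.reservoir = reservoir   # requests always sampled per second
        self.rate = rate             # fraction of the remaining requests sampled
        self.rng = rng               # injectable random source, for testing
        self._second = None
        self._used = 0

    def should_sample(self, now: float) -> bool:
        second = int(now)
        if second != self._second:
            # A new second has started: refill the reservoir
            self._second = second
            self._used = 0
        if self._used < self.reservoir:
            self._used += 1
            return True
        return self.rng() < self.rate


sampler = DefaultRuleSampler(rng=random.Random(42).random)
# 1,000 requests spread evenly over 10 seconds (100 per second)
decisions = [sampler.should_sample(i / 100) for i in range(1000)]
print(f"sampled {sum(decisions)} of {len(decisions)} requests")
```

With 100 requests per second, roughly 1 reservoir trace plus about 5 rate-based traces are recorded each second, which is why sampling keeps cost far below tracing everything.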

Exam Tip

Sampling Exam Point: X-Ray doesn't trace all requests by default. Sampling rules are applied to optimize cost and performance.


X-Ray Integration with AWS Services

Supported Services

| Service | Integration | Setup |
|---------|-------------|-------|
| Lambda | Native | Enable Active Tracing in function config |
| API Gateway | Native | Enable X-Ray in stage settings |
| ECS/EKS | X-Ray daemon sidecar | Add X-Ray container to task |
| EC2 | Install X-Ray daemon | Run daemon process on instance |
| Elastic Beanstalk | Native | Enable in environment settings |

Lambda + API Gateway Tracing

API Gateway Active Tracing Setup:
├── Console: Stages → Logs/Tracing → Enable X-Ray Tracing
├── CLI: aws apigateway update-stage --rest-api-id <api-id> --stage-name <stage> --patch-operations op=replace,path=/tracingEnabled,value=true
└── Result: Traces API Gateway → Lambda → downstream

Lambda Active Tracing Setup:
├── Console: Configuration → Monitoring → Enable Active tracing
├── CLI: aws lambda update-function-configuration --function-name <function-name> --tracing-config Mode=Active
└── Result: Traces execution time, cold starts, downstream calls
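The same two settings can also be applied programmatically. The sketch below builds the request parameters for the API Gateway `update_stage` and Lambda `update_function_configuration` calls in boto3; the helper names and the IDs `a1b2c3d4e5`, `prod`, and `orders-handler` are hypothetical placeholders. boto3 is imported lazily so that the parameter builders can be inspected without the AWS SDK or credentials.

```python
def stage_tracing_request(rest_api_id: str, stage_name: str) -> dict:
    """Keyword arguments for the API Gateway (REST API) UpdateStage call."""
    return {
        "restApiId": rest_api_id,
        "stageName": stage_name,
        "patchOperations": [
            {"op": "replace", "path": "/tracingEnabled", "value": "true"}
        ],
    }


def lambda_tracing_request(function_name: str) -> dict:
    """Keyword arguments for the Lambda UpdateFunctionConfiguration call."""
    return {"FunctionName": function_name, "TracingConfig": {"Mode": "Active"}}


def enable_tracing(rest_api_id: str, stage_name: str, function_name: str) -> None:
    """Apply both settings; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # lazy import: the builders above work without the AWS SDK

    boto3.client("apigateway").update_stage(
        **stage_tracing_request(rest_api_id, stage_name)
    )
    boto3.client("lambda").update_function_configuration(
        **lambda_tracing_request(function_name)
    )


# Placeholder identifiers, for illustration only
print(stage_tracing_request("a1b2c3d4e5", "prod"))
print(lambda_tracing_request("orders-handler"))
```

Note that the function's execution role also needs permission to send trace data (for example via the `AWSXRayDaemonWriteAccess` managed policy) for traces to appear.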

X-Ray vs CloudWatch: Which One Should You Choose?

Comparison Table

| Aspect | X-Ray | CloudWatch |
|--------|-------|------------|
| Primary Purpose | Distributed tracing, request flow analysis | Logs, metrics, alarms |
| Data Type | Traces (request paths) | Logs, metrics, events |
| Visualization | Service maps, trace timelines | Dashboards, graphs |
| Search | Annotation-based filtering | CloudWatch Logs Insights queries |
| Alarms | Not directly supported | Native alarm feature |
| Pricing | Per trace recorded and scanned | Log volume, metric count |

Using Together

Recommended Combination:
├── X-Ray: Request flow tracing, bottleneck analysis
├── CloudWatch Logs: Detailed application logs
├── CloudWatch Metrics: CPU, memory, custom metrics
├── CloudWatch Alarms: Threshold breach notifications
└── CloudWatch ServiceLens: Unified X-Ray + CloudWatch view

Exam Tip

Exam Point: "Where is latency occurring?" → X-Ray, "What happened?" → CloudWatch Logs


Pricing Structure

Pricing (US East)

| Item | Free Tier | After |
|------|-----------|-------|
| Traces Recorded | 100,000/month | $5.00/million traces |
| Traces Scanned | 1,000,000/month | $0.50/million scanned |
| Trace Retention | 30 days | No additional cost |
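As a rough worked example of these prices, the sketch below estimates the monthly recording charge for a given request volume and sampling rate. It is a simplification: it ignores the per-second reservoir and the separate scan charge, and the function name and the 100 million requests/month figure are assumptions for illustration.

```python
from decimal import Decimal

FREE_TRACES_RECORDED = 100_000                    # free tier, per month
PRICE_PER_MILLION_RECORDED = Decimal("5.00")      # USD, from the table above


def monthly_recording_cost(requests_per_month: int, sampling_rate: float) -> Decimal:
    """Estimated USD charge for traces recorded in one month."""
    traces = round(requests_per_month * sampling_rate)
    billable = max(0, traces - FREE_TRACES_RECORDED)
    cost = Decimal(billable) / Decimal(1_000_000) * PRICE_PER_MILLION_RECORDED
    return cost.quantize(Decimal("0.01"))


# Hypothetical workload: 100 million requests per month
for rate in (0.05, 1.0):
    print(f"sampling {rate:.0%}: ${monthly_recording_cost(100_000_000, rate)}")
```

For this workload, 5% sampling records 5 million traces (4.9 million billable, about $24.50), while 100% sampling records 100 million (99.9 million billable, about $499.50).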

Cost Optimization Tips

Cost Reduction Strategies:
├── Optimize sampling rules
│   └── Production: low rate, Development: high rate
├── Trace only necessary services
│   └── Critical paths only, not all microservices
├── Minimize annotations
│   └── Searchable data affects cost
└── Use trace groups
    └── Filter to view only relevant traces

SAA-C03 Exam Focus Points

Commonly Tested Scenarios

  1. Distributed Tracing Tool: "Identify microservices bottlenecks" → X-Ray
  2. Serverless Tracing: "Trace Lambda + API Gateway requests" → X-Ray Active Tracing
  3. Service Dependency: "Visualize service relationships" → X-Ray Service Map
  4. X-Ray vs CloudWatch: Distinguish distributed tracing vs logs/metrics
  5. ECS/EKS Tracing: "Trace containerized apps" → X-Ray daemon sidecar

Sample Exam Questions

Exam Tip

Sample Exam Question 1: "In a microservices architecture, you need to identify why certain API requests are slow. How can you determine which service is causing the latency?"

→ Answer: AWS X-Ray (Service maps and trace timelines identify bottlenecks)

Exam Tip

Sample Exam Question 2: "How can you trace request flows across microservices running on an EKS cluster?"

→ Answer: Deploy X-Ray daemon as a sidecar container + integrate X-Ray SDK in applications

Exam Tip

Sample Exam Question 3: "How can you trace the complete request path of a serverless application using API Gateway and Lambda?"

→ Answer: Enable Active Tracing on both API Gateway and Lambda


Frequently Asked Questions

Q: What's the difference between X-Ray and CloudWatch Application Insights?

X-Ray specializes in distributed tracing to track request flows. Application Insights auto-detects issues in specific workloads such as .NET and SQL Server. More recently, CloudWatch Application Signals has integrated X-Ray traces to provide APM capabilities.

Q: Does X-Ray support HTTP API (API Gateway v2)?

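No. X-Ray currently supports only REST APIs (API Gateway v1) natively. For HTTP APIs, enable Active Tracing on the Lambda function and use the X-Ray SDK in the function code; the trace then starts at Lambda rather than at API Gateway.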

Q: What happens if I set sampling rate to 100%?

All requests are traced, causing cost to spike significantly. Not recommended for production. Only use high sampling rates temporarily during debugging.

Q: Where should the X-Ray daemon run?

  • EC2: Install directly on instances
  • ECS: Add as sidecar container in task definition
  • Lambda: Automatically included (just enable setting)
  • Elastic Beanstalk: Included in platform

Q: How long is trace data retained?

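X-Ray retains trace data for 30 days at no additional cost. For longer retention, retrieve traces through the X-Ray API (for example, GetTraceSummaries and BatchGetTraces) and store them yourself, for instance in S3.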

