SAABlog
Monitoring · Intermediate

CloudWatch Complete Guide: Metrics, Alarms, and Log Monitoring

Learn how to monitor AWS infrastructure using Amazon CloudWatch metrics, alarms, logs, and dashboards. Master CloudWatch for the SAA-C03 exam.

PHILOLAMB · Updated: January 31, 2026
Tags: CloudWatch, Monitoring, Metrics, Alarms, Logs

Related Exam Domains

  • Domain 2: Design Resilient Architectures

Key Takeaway

Amazon CloudWatch is a monitoring service that collects metrics from AWS resources, sets alarms, and centrally manages logs. EC2 basic monitoring sends metrics at 5-minute intervals and detailed monitoring at 1-minute intervals. Memory and disk space metrics require the CloudWatch agent.

Exam Tip

Exam Essential: "Performance monitoring = CloudWatch", "API auditing = CloudTrail", "Configuration tracking = AWS Config". EC2 memory and disk space metrics are NOT included in the default metrics!

CloudWatch Core Components

[Amazon CloudWatch]
    │
    ├── Metrics
    │     ├── AWS service default metrics
    │     └── Custom metrics
    │
    ├── Alarms
    │     ├── Metric alarms
    │     └── Composite alarms
    │
    ├── Logs
    │     ├── Log groups / Log streams
    │     └── Logs Insights (query)
    │
    ├── Dashboards
    │
    └── Events / EventBridge

Metrics

EC2 Default Metrics

| Metric | Description | Included by Default |
|---|---|---|
| CPUUtilization | CPU usage percentage | ✅ |
| NetworkIn/Out | Network traffic | ✅ |
| DiskReadOps/WriteOps | Disk I/O operations (instance store volumes) | ✅ |
| StatusCheckFailed | Status check failures | ✅ |
| MemoryUtilization | Memory usage | ❌ Agent required |
| DiskSpaceUtilization | Disk space usage | ❌ Agent required |

Exam Tip

Exam Favorite: EC2 memory utilization and disk space are NOT included in default metrics. You must install the CloudWatch agent and collect them as custom metrics.

Basic vs Detailed Monitoring

| Item | Basic Monitoring | Detailed Monitoring |
|---|---|---|
| Collection Interval | 5 minutes | 1 minute |
| Cost | Free | Paid |
| Activation | Default | Manual |
| Use Case | General monitoring | Fast Auto Scaling response |
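The interval matters most for alarms and Auto Scaling, because an alarm cannot react faster than datapoints arrive. The arithmetic below is a rough sketch. It assumes the alarm period equals the collection interval and that every evaluation period must breach, and it ignores publishing delay.

```python
# Rough sketch: how the EC2 monitoring interval bounds alarm reaction time.
# Assumption: alarm period == collection interval, and all N periods must breach.

BASIC_INTERVAL_MIN = 5     # basic monitoring: one datapoint every 5 minutes
DETAILED_INTERVAL_MIN = 1  # detailed monitoring: one datapoint every minute


def datapoints_per_hour(interval_min: int) -> int:
    """Number of datapoints CloudWatch receives per hour at this interval."""
    return 60 // interval_min


def min_minutes_to_alarm(interval_min: int, evaluation_periods: int) -> int:
    """Minimum minutes of breaching data before the alarm can enter ALARM."""
    return interval_min * evaluation_periods


if __name__ == "__main__":
    for label, interval in (("basic", BASIC_INTERVAL_MIN), ("detailed", DETAILED_INTERVAL_MIN)):
        print(
            f"{label}: {datapoints_per_hour(interval)} datapoints/hour, "
            f"3 breaching periods take at least {min_minutes_to_alarm(interval, 3)} min"
        )
```

With three evaluation periods, the alarm needs at least 15 minutes of breaching data on basic monitoring versus 3 minutes on detailed monitoring. That gap is why detailed monitoring suits responsive Auto Scaling.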

Custom Metrics

Send custom metrics via the CloudWatch agent or the API (for example, the AWS CLI):

aws cloudwatch put-metric-data \
  --namespace "MyApp" \
  --metric-name "ActiveUsers" \
  --value 150 \
  --unit Count

→ Resolution: Standard (60 seconds) or High-resolution (1 second)
→ High-resolution metrics incur additional costs
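The same datapoint can be built programmatically. The sketch below assembles one entry of the `MetricData` list accepted by the PutMetricData API, using the fields `MetricName`, `Dimensions`, `Timestamp`, `Value`, `Unit`, and `StorageResolution`. A `StorageResolution` of 1 requests high resolution and 60 requests standard resolution. The `Environment` dimension is a hypothetical example. The actual boto3 send appears only as a comment because it needs credentials.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def build_metric_datum(
    name: str,
    value: float,
    unit: str = "Count",
    high_resolution: bool = False,
    dimensions: Optional[Dict[str, str]] = None,
) -> dict:
    """Build one PutMetricData datum; StorageResolution 1 = high-res, 60 = standard."""
    datum = {
        "MetricName": name,
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        "StorageResolution": 1 if high_resolution else 60,
    }
    if dimensions:
        # CloudWatch expects dimensions as a list of {"Name": ..., "Value": ...}
        datum["Dimensions"] = [{"Name": k, "Value": v} for k, v in dimensions.items()]
    return datum


if __name__ == "__main__":
    # "Environment" is a hypothetical dimension used for illustration.
    datum = build_metric_datum("ActiveUsers", 150, dimensions={"Environment": "prod"})
    print(datum["StorageResolution"], datum["Dimensions"])
    # With boto3 installed and credentials configured, this could be sent as:
    # boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=[datum])
```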

Key Metrics by Service

| Service | Key Metrics |
|---|---|
| EC2 | CPUUtilization, StatusCheckFailed, NetworkIn/Out |
| RDS | CPUUtilization, FreeableMemory, ReadIOPS |
| ELB | RequestCount, HealthyHostCount, Latency (CLB) / TargetResponseTime (ALB) |
| Lambda | Invocations, Duration, Errors, Throttles |
| S3 | BucketSizeBytes, NumberOfObjects (daily storage metrics) |
| SQS | ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage |

Alarms

Alarm States

3 States (an alarm can move between any of them):
┌─────────┐   ┌─────────┐   ┌───────────────────┐
│   OK    │   │  ALARM  │   │ INSUFFICIENT_DATA │
│ (Normal)│   │ (Alert) │   │ (Not enough data) │
└─────────┘   └─────────┘   └───────────────────┘
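You can approximate how an alarm lands in one of these states with an "M out of N" check: look at the last N periods and go to ALARM if at least M of them breach the threshold. The sketch below is a deliberately simplified model, not CloudWatch's actual evaluation logic. In particular, it returns INSUFFICIENT_DATA only when every datapoint in the window is missing, whereas the real service applies the alarm's missing-data treatment setting.

```python
from typing import Optional, Sequence


def evaluate_alarm(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> str:
    """Simplified 'M out of N' evaluation; None marks a missing datapoint."""
    window = list(datapoints)[-evaluation_periods:]  # the last N periods
    present = [v for v in window if v is not None]
    if not present:
        return "INSUFFICIENT_DATA"  # nothing to judge the metric by
    breaching = sum(1 for v in present if v > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


if __name__ == "__main__":
    cpu = [50.0, 85.0, 70.0, 90.0]      # hypothetical CPU averages per period
    print(evaluate_alarm(cpu, 80, 3, 3))  # only 2 of the last 3 breach
    print(evaluate_alarm(cpu, 80, 3, 2))  # 2 of 3 is enough here
    print(evaluate_alarm([None, None], 80, 3, 2))
```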

Alarm Actions

| Action | Description |
|---|---|
| SNS Notification | Email, SMS, Lambda trigger |
| Auto Scaling | Trigger scale out/in |
| EC2 Actions | Stop, terminate, reboot, recover instance |

Alarm Configuration Example:

Metric: CPUUtilization
Condition: Average > 80% for 5 minutes
Actions:
  ALARM → SNS notification + Auto Scaling scale out
  OK → SNS notification (recovery alert)
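The configuration above maps onto the parameters of the PutMetricAlarm API. The sketch builds them as a plain dictionary. The SNS topic ARN, scaling policy ARN, and Auto Scaling group name are hypothetical placeholders, and "for 5 minutes" is modeled as one 300-second period.

```python
# Hypothetical ARNs, for illustration only.
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:ops-alerts"
SCALE_OUT_POLICY_ARN = "arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:EXAMPLE"


def cpu_alarm_params(asg_name: str) -> dict:
    """PutMetricAlarm parameters for 'Average CPU > 80% for 5 minutes'."""
    return {
        "AlarmName": f"{asg_name}-cpu-high",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
        "Statistic": "Average",
        "Period": 300,           # one 5-minute period...
        "EvaluationPeriods": 1,  # ...evaluated once = "for 5 minutes"
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [SNS_TOPIC_ARN, SCALE_OUT_POLICY_ARN],  # on entering ALARM
        "OKActions": [SNS_TOPIC_ARN],                            # recovery alert
    }


if __name__ == "__main__":
    params = cpu_alarm_params("web-asg")  # hypothetical Auto Scaling group name
    print(params["AlarmName"], params["Threshold"])
    # With boto3 and credentials: boto3.client("cloudwatch").put_metric_alarm(**params)
```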

Composite Alarms

Combine multiple alarms with AND, OR, and NOT:

Composite Alarm: "Service Failure"
  = CPU Alarm (ALARM)
    AND Error Rate Alarm (ALARM)
    AND Latency Alarm (ALARM)

→ Notifies only when all 3 are in ALARM state
→ Reduces alarm noise
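A composite alarm is defined by a rule expression over other alarms, written with functions such as `ALARM("name")` joined by `AND`, `OR`, and `NOT`. The sketch below builds the all-AND rule for the "Service Failure" example and includes a tiny evaluator for that one rule shape. The child alarm names are hypothetical, and the evaluator is a simplification, not CloudWatch's rule engine.

```python
from typing import Dict, List

# Hypothetical child alarm names for the "Service Failure" example.
CHILD_ALARMS = ["cpu-high", "error-rate-high", "latency-high"]


def all_in_alarm_rule(alarm_names: List[str]) -> str:
    """Build an AlarmRule that is true only when every child alarm is in ALARM."""
    return " AND ".join(f'ALARM("{name}")' for name in alarm_names)


def evaluate_all_and(child_states: Dict[str, str], alarm_names: List[str]) -> str:
    """Simplified evaluation of the all-AND rule above (ALARM or OK only)."""
    return "ALARM" if all(child_states.get(n) == "ALARM" for n in alarm_names) else "OK"


if __name__ == "__main__":
    rule = all_in_alarm_rule(CHILD_ALARMS)
    print(rule)
    # With boto3: boto3.client("cloudwatch").put_composite_alarm(
    #     AlarmName="service-failure", AlarmRule=rule, AlarmActions=[...])
    states = {"cpu-high": "ALARM", "error-rate-high": "ALARM", "latency-high": "OK"}
    print(evaluate_all_and(states, CHILD_ALARMS))  # one child is not in ALARM
```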

Logs (CloudWatch Logs)

Structure

CloudWatch Logs Hierarchy:
┌──────────────────────────────────┐
│ Log Group                        │ ← /aws/lambda/my-function
│  ┌────────────────────────────┐  │
│  │ Log Stream 1               │  │ ← Per instance/container
│  │ Log Stream 2               │  │
│  │ Log Stream 3               │  │
│  └────────────────────────────┘  │
│  Retention: 1 day to 10 years,   │
│  or never expire (default)       │
└──────────────────────────────────┘
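Retention is set per log group, and the API accepts only specific day counts rather than arbitrary numbers. The sketch below rounds a requested period up to the next allowed value. The list is assumed to match the `retentionInDays` values accepted by PutRetentionPolicy and should be checked against the current API reference. Leaving retention unset keeps logs forever.

```python
import bisect

# Assumed to match the retentionInDays values accepted by PutRetentionPolicy;
# verify against the current CloudWatch Logs API reference.
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400,
    545, 731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def retention_days_for(requested_days: int) -> int:
    """Round a requested retention period up to the nearest allowed value."""
    i = bisect.bisect_left(ALLOWED_RETENTION_DAYS, requested_days)
    if i == len(ALLOWED_RETENTION_DAYS):
        raise ValueError("more than 10 years: leave retention unset (never expire)")
    return ALLOWED_RETENTION_DAYS[i]


if __name__ == "__main__":
    print(retention_days_for(45))
    # With boto3: boto3.client("logs").put_retention_policy(
    #     logGroupName="/aws/lambda/my-function", retentionInDays=retention_days_for(45))
```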

Log Sources

| Source | Configuration Method |
|---|---|
| EC2 | Install CloudWatch agent |
| Lambda | Automatic (execution role needs CloudWatch Logs permissions) |
| ECS/Fargate | awslogs log driver |
| API Gateway | Enable in stage settings |
| CloudTrail | CloudWatch Logs integration |
| Route 53 | DNS query logging |
| VPC Flow Logs | Enable in VPC settings |
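For Lambda, the only setup is permissions: the function's execution role must be allowed to create its log group and log streams and to put log events. The policy below is a sketch similar to what the managed AWSLambdaBasicExecutionRole policy grants, but scoped to one function. The region, account ID, and function name are hypothetical.

```python
import json


def lambda_logging_policy(region: str, account_id: str, function_name: str) -> dict:
    """IAM policy letting one Lambda function write to /aws/lambda/<function_name>."""
    log_group_arn = (
        f"arn:aws:logs:{region}:{account_id}:log-group:/aws/lambda/{function_name}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Lambda creates the log group on first invocation if it is missing
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
            {   # Create log streams in that group and write log events to them
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{log_group_arn}:*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical region, account ID, and function name.
    policy = lambda_logging_policy("us-east-1", "123456789012", "my-function")
    print(json.dumps(policy, indent=2))
```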

Logs Insights

CloudWatch Logs Insights Query Examples:

# Query error logs (the time range, e.g. last hour, is chosen in the console or API)
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20

# Lambda function p99 latency
filter @type = "REPORT"
| stats percentile(@duration, 99) as p99
  by bin(1h)
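The sketch below reproduces the second query locally to show what it computes. It reads Lambda `REPORT` lines, extracts the `Duration` field, and takes the 99th percentile per hour. It uses the nearest-rank percentile method, which may differ slightly from the percentile Logs Insights reports. The sample log lines are made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Matches "Duration: 12.34 ms" only at the start of a tab-separated field,
# so "Billed Duration" and "Init Duration" are skipped.
DURATION_RE = re.compile(r"(?:^|\t)Duration: ([\d.]+) ms")


def nearest_rank(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the ceil(pct/100 * n)-th smallest value."""
    ordered = sorted(values)
    rank = -(-pct * len(ordered) // 100)  # integer ceiling division
    return ordered[max(rank, 1) - 1]


def p99_by_hour(events: Iterable[Tuple[datetime, str]]) -> Dict[datetime, float]:
    """Local approximation of the REPORT / percentile(@duration, 99) / bin(1h) query."""
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for ts, message in events:
        if not message.startswith("REPORT"):
            continue
        match = DURATION_RE.search(message)
        if match:
            hour = ts.replace(minute=0, second=0, microsecond=0)
            buckets[hour].append(float(match.group(1)))
    return {hour: nearest_rank(vals, 99) for hour, vals in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [  # made-up log events: (timestamp, message)
        (datetime(2026, 1, 31, 10, 5), "REPORT RequestId: a1\tDuration: 12.50 ms\tBilled Duration: 13 ms"),
        (datetime(2026, 1, 31, 10, 40), "REPORT RequestId: a2\tDuration: 250.00 ms\tBilled Duration: 250 ms"),
        (datetime(2026, 1, 31, 11, 2), "START RequestId: a3"),
        (datetime(2026, 1, 31, 11, 3), "REPORT RequestId: a3\tDuration: 30.00 ms\tBilled Duration: 30 ms"),
    ]
    for hour, p99 in p99_by_hour(sample).items():
        print(hour.isoformat(), p99)
```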

Log Export

CloudWatch Logs → S3: Batch export task (NOT real-time ❌)
CloudWatch Logs → Kinesis Data Firehose (now Amazon Data Firehose): Real-time streaming via subscription filter ✅
CloudWatch Logs → Lambda: Real-time processing via subscription filter
CloudWatch Logs → OpenSearch: Real-time analytics via subscription filter

Exam Tip

Real-time Log Export: S3 export is a batch task, and exported data can take up to 12 hours to become available. For real-time delivery, use a subscription filter that streams to Kinesis Data Firehose.
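Both export paths are configured through the CloudWatch Logs API: a subscription filter (PutSubscriptionFilter) handles streaming, and an export task (CreateExportTask) handles batch copies to S3. The sketch below only builds the request parameters. The log group, bucket, filter name, prefix, and ARNs are hypothetical, and the `ERROR` filter pattern is one example of forwarding only matching events.

```python
import time


def firehose_subscription_params(log_group: str, firehose_arn: str, role_arn: str) -> dict:
    """Parameters for PutSubscriptionFilter: stream matching events to Firehose."""
    return {
        "logGroupName": log_group,
        "filterName": "errors-to-firehose",  # hypothetical name
        "filterPattern": "ERROR",             # forward only events containing ERROR
        "destinationArn": firehose_arn,
        "roleArn": role_arn,                  # lets CloudWatch Logs write to Firehose
    }


def s3_export_params(log_group: str, bucket: str, hours_back: int = 24) -> dict:
    """Parameters for CreateExportTask: a one-off batch copy of recent logs to S3."""
    now_ms = int(time.time() * 1000)  # the API takes epoch milliseconds
    return {
        "logGroupName": log_group,
        "fromTime": now_ms - hours_back * 3600 * 1000,
        "to": now_ms,
        "destination": bucket,
        "destinationPrefix": "exported-logs",  # hypothetical prefix
    }


if __name__ == "__main__":
    print(firehose_subscription_params(
        "/aws/lambda/my-function",
        "arn:aws:firehose:us-east-1:123456789012:deliverystream/log-stream",  # hypothetical
        "arn:aws:iam::123456789012:role/CWLtoFirehoseRole",                  # hypothetical
    ))
    print(s3_export_params("/aws/lambda/my-function", "my-log-archive-bucket"))
    # With boto3: logs = boto3.client("logs")
    #   logs.put_subscription_filter(**firehose_subscription_params(...))
    #   logs.create_export_task(**s3_export_params(...))
```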

CloudWatch Agent

When CloudWatch Agent is Needed:
- Collect EC2 memory, disk metrics
- Send EC2 internal log files to CloudWatch Logs
- Monitor on-premises servers

Installation Flow:
[EC2/On-premises] → Install CloudWatch Agent
                → IAM role (EC2) or credentials (on-premises)
                → Metrics + Logs → CloudWatch
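The agent is driven by a JSON configuration file. The sketch below generates a minimal one that collects memory and root-disk usage into the default `CWAgent` namespace and ships one application log file. The log file path and log group name are hypothetical. Check the schema keys against the agent's configuration reference before relying on them.

```python
import json


def agent_config(log_file_path: str, log_group_name: str) -> dict:
    """Minimal CloudWatch agent config: memory + disk metrics and one log file."""
    return {
        "agent": {"metrics_collection_interval": 60},  # seconds
        "metrics": {
            "namespace": "CWAgent",
            # Tag each metric with the instance ID so per-instance alarms are possible
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {"measurement": ["mem_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            },
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_file_path,
                            "log_group_name": log_group_name,
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical application log path and log group name.
    print(json.dumps(agent_config("/var/log/myapp/app.log", "myapp-logs"), indent=2))
```

On an EC2 instance, the saved file would typically be loaded with `amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:<path>`. On on-premises servers, use `-m onPremise`.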

CloudWatch vs CloudTrail vs AWS Config

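| Item | CloudWatch | CloudTrail | AWS Config |
|---|---|---|---|
| Purpose | Performance monitoring | API audit logging | Configuration tracking |
| Question | "How's performance?" | "Who did it?" | "What changed?" |
| Data | Metrics, logs | API call records | Resource configuration history |
| Example | CPU > 80% alert | Track who terminated an EC2 instance | Detect security group rule changes |
| Retention | Metrics up to 15 months | Event history 90 days (trails in S3 kept as long as you choose) | Configuration items 7 years by default (configurable) |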

SAA-C03 Exam Focus Points

  1. Memory/Disk Metrics: "EC2 memory monitoring = CloudWatch agent required"
  2. Detailed Monitoring: "1-minute interval = enable detailed monitoring"
  3. Alarm Actions: "Auto scale when CPU high = CloudWatch alarm + Auto Scaling"
  4. Real-time Log Export: "S3 is batch, real-time = Kinesis Data Firehose"
  5. vs CloudTrail: "Performance = CloudWatch, Audit = CloudTrail"

Exam Tip

Sample Exam Question: "You need automatic notification when EC2 instance memory utilization exceeds 90%. What is the solution?" → Answer: Install CloudWatch agent → Memory custom metric → CloudWatch alarm → SNS notification

Frequently Asked Questions

Q: Is CloudWatch free?

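Basic monitoring metrics (5-minute intervals) are free. The free tier also includes 10 metrics (custom metrics or detailed monitoring metrics), 10 alarm metrics, 3 dashboards, and 5 GB of log data. Usage beyond these limits, such as additional detailed monitoring metrics, alarms, dashboards, or data scanned by Logs Insights queries, is charged.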

Q: How long is metric data retained?

Depends on resolution. High-resolution (1 second) is retained for 3 hours, 60-second data for 15 days, 5-minute data for 63 days, and 1-hour data for 15 months.
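These tiers can be expressed as a lookup from a datapoint's age to the finest period you can still query. The sketch below encodes the rule stated above. It assumes that 15 months equals 455 days, the figure AWS documents, and that 1-second data exists only for high-resolution custom metrics.

```python
from datetime import timedelta


def finest_period_seconds(age: timedelta, high_resolution: bool = False) -> int:
    """Finest period (in seconds) still available for a datapoint of this age."""
    if age < timedelta(hours=3):
        return 1 if high_resolution else 60
    if age < timedelta(days=15):
        return 60
    if age < timedelta(days=63):
        return 300
    if age < timedelta(days=455):  # about 15 months
        return 3600
    raise ValueError("older than 15 months: data has expired")


if __name__ == "__main__":
    for age in (timedelta(hours=1), timedelta(days=30), timedelta(days=200)):
        print(age, finest_period_seconds(age, high_resolution=True))
```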

Q: Can I monitor on-premises servers?

Yes. Install the CloudWatch agent on on-premises servers and configure IAM credentials to send metrics and logs to CloudWatch.

Q: Why is my CloudWatch alarm in INSUFFICIENT_DATA state?

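This usually happens when the alarm was just created, the metric has no recent data, or the alarm points at the wrong metric name, namespace, or dimensions. Also check the alarm's missing data treatment setting (missing, notBreaching, breaching, or ignore).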

Q: Should I store logs in CloudWatch Logs or S3?

Use CloudWatch Logs for real-time monitoring and Logs Insights queries. Use S3 for long-term retention and cost savings. Typically both are used together.



References