EC2 User Data and Metadata: Essential Instance Automation
Master EC2 User Data for bootstrap automation and Instance Metadata for retrieving instance information. Essential concepts for SAA-C03 exam preparation.
Related Exam Domains
- Domain 2: Design Resilient Architectures
- Domain 3: Design High-Performing Architectures
Key Takeaway
User Data is a script that runs automatically when an instance first boots, and Metadata is how an instance retrieves information about itself. Both are reachable only from within the instance, via the link-local address 169.254.169.254.
Exam Tip
Exam Essential: User Data runs only on first boot by default. Metadata provides IAM role temporary credentials, eliminating the need to hardcode credentials in EC2. IMDSv2 (token-based) is recommended for security.
| Aspect | User Data | Instance Metadata |
|---|---|---|
| Purpose | Run scripts at boot | Query instance information |
| Execution | First boot only (default) | Query anytime |
| Max Size | 16KB | - |
| Access | HTTP from within instance | HTTP from within instance |
| Endpoint | 169.254.169.254/latest/user-data | 169.254.169.254/latest/meta-data/ |
What is User Data?
Concept
User Data is a script that runs automatically when an EC2 instance boots for the first time. This is called bootstrapping.
User Data Use Cases:
├── Software installation (web server, agents, etc.)
├── Package updates
├── File downloads (config files from S3)
├── Service start/enable
├── Environment variable configuration
└── CloudWatch Agent setup
Basic User Data Format
#!/bin/bash
# Shell-script User Data must start with #! and the interpreter path
# Update packages
yum update -y
# Install and start web server
yum install -y httpd
systemctl start httpd
systemctl enable httpd
# Create web page
echo "<h1>Hello from EC2</h1>" > /var/www/html/index.html
Key User Data Characteristics
| Characteristic | Description |
|---|---|
| Execution Privileges | Runs as root |
| Execution Time | First boot only (default) |
| Max Size | 16KB raw, before Base64 encoding (gzip compression available) |
| Encoding | Base64 encoded (console handles automatically) |
| Log Location | /var/log/cloud-init-output.log |
| AMI Inclusion | User Data is NOT included in AMI |
Exam Tip
Exam Point: User Data is NOT included in AMI. Creating an AMI from an instance does not save that instance's User Data to the new AMI.
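The size and encoding rows in the table above can be checked before launch. The following is a minimal sketch, assuming the 16KB limit means 16,384 bytes measured on the raw script before Base64 encoding. The helper name `encode_user_data` and the sample script are hypothetical, not part of any AWS tool.

```python
import base64

# Assumption: "16KB" is 16 * 1024 bytes of raw (pre-Base64) user data.
USER_DATA_LIMIT_BYTES = 16 * 1024


def encode_user_data(script: str) -> str:
    """Return the Base64 form of a user data script, enforcing the raw size limit."""
    raw = script.encode("utf-8")
    if len(raw) > USER_DATA_LIMIT_BYTES:
        raise ValueError(
            f"User data is {len(raw)} bytes; the limit is {USER_DATA_LIMIT_BYTES}"
        )
    return base64.b64encode(raw).decode("ascii")


# Hypothetical two-line script used only for illustration.
sample_script = "#!/bin/bash\nyum update -y\n"
encoded = encode_user_data(sample_script)
print(encoded)
```

The console and most tools perform the Base64 step for you, so a check like this is mainly useful when calling the API directly or validating templates in a pipeline.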
Writing User Data
Shell Script Method
#!/bin/bash
# Most common approach
# Log output (for debugging)
exec > >(tee /var/log/user-data.log|logger -t user-data -s 2>/dev/console) 2>&1
echo "User Data execution started: $(date)"
# Install packages
yum update -y
yum install -y docker
systemctl start docker
systemctl enable docker
# Run Docker container
docker run -d -p 80:80 nginx
echo "User Data execution completed: $(date)"
Cloud-Init Directive Method
#cloud-config
# YAML format cloud-init configuration
packages:
  - httpd
  - php
runcmd:
  - systemctl start httpd
  - systemctl enable httpd
  - echo "Hello World" > /var/www/html/index.html
write_files:
  - path: /etc/myapp/config.json
    content: |
      {
        "environment": "production",
        "debug": false
      }
Download Script from S3
#!/bin/bash
# Store large scripts in S3 and download them at boot
# AWS CLI is pre-installed on Amazon Linux
# The instance needs an IAM role that allows reading the object from the bucket
aws s3 cp s3://my-bucket/setup-script.sh /tmp/setup.sh
chmod +x /tmp/setup.sh
/tmp/setup.sh
User Data Re-execution
Default Behavior
By default, User Data runs only on first boot. It does not run again on reboot.
To Run on Every Boot
#cloud-config
# cloud-init always run configuration
cloud_final_modules:
  - [scripts-user, always]
Or directly in User Data:
#!/bin/bash
# Delete the cloud-init semaphore file that marks the user script as already run
rm -f /var/lib/cloud/instance/sem/config_scripts_user
# Script contents...
Exam Tip
Exam Trap: "Modifying User Data causes the new script to run after instance restart" → False. By default, it runs only on first boot. Additional configuration is needed for every-boot execution.
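One way to get every-boot execution is to send a MIME multi-part User Data payload that carries both a cloud-config part (marking the scripts-user module as "always") and the shell script itself. The sketch below builds such a payload with Python's standard `email` package. The function name, the attachment filenames, and the log path `/var/log/every-boot.log` are hypothetical choices for illustration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# cloud-config part: run the scripts-user module on every boot.
CLOUD_CONFIG = """#cloud-config
cloud_final_modules:
  - [scripts-user, always]
"""

# Shell part: hypothetical script that appends a line on each boot.
SHELL_SCRIPT = """#!/bin/bash
echo "Boot at $(date)" >> /var/log/every-boot.log
"""


def build_every_boot_user_data(cloud_config: str, script: str) -> str:
    """Combine a cloud-config part and a shell script into one MIME document."""
    message = MIMEMultipart("mixed")
    parts = (
        (cloud_config, "cloud-config", "cloud-config.txt"),
        (script, "x-shellscript", "userdata.txt"),
    )
    for body, subtype, filename in parts:
        part = MIMEText(body, subtype)  # yields Content-Type: text/<subtype>
        part.add_header("Content-Disposition", "attachment", filename=filename)
        message.attach(part)
    return message.as_string()


user_data = build_every_boot_user_data(CLOUD_CONFIG, SHELL_SCRIPT)
print(user_data)
```

The printed text can be pasted into the User Data field. Changing User Data on an existing instance still requires stopping it first, and the new content takes effect only because the cloud-config part asks cloud-init to rerun user scripts.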
What is Instance Metadata?
Concept
Instance Metadata is a service that allows an EC2 instance to query information about itself.
Queryable Metadata:
├── instance-id # Instance ID
├── instance-type # Instance type (t3.micro, etc.)
├── ami-id # AMI ID
├── hostname # Hostname
├── local-ipv4 # Private IP
├── public-ipv4 # Public IP
├── placement/
│ └── availability-zone # Availability Zone
├── security-groups # Security group names
├── iam/
│ └── security-credentials/<role-name> # IAM role credentials
└── network/interfaces/ # Network interface info
Querying Metadata
# Query instance ID (plain GET requests like these work only while IMDSv1 is allowed)
curl http://169.254.169.254/latest/meta-data/instance-id
# Query availability zone
curl http://169.254.169.254/latest/meta-data/placement/availability-zone
# Query Public IP
curl http://169.254.169.254/latest/meta-data/public-ipv4
# Query Private IP
curl http://169.254.169.254/latest/meta-data/local-ipv4
# Query instance type
curl http://169.254.169.254/latest/meta-data/instance-type
# List all metadata categories
curl http://169.254.169.254/latest/meta-data/
IMDSv1 vs IMDSv2
Comparison Table
| Item | IMDSv1 | IMDSv2 |
|---|---|---|
| Authentication | None (any process on the instance can query without a token) | Token-based |
| Security | Vulnerable to SSRF attacks | SSRF protection |
| Request Method | Simple GET request | PUT for token → GET for query |
| AWS Recommendation | Not recommended | Recommended |
Using IMDSv2
# 1. Get token (PUT request, set TTL)
TOKEN=$(curl -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
# 2. Query metadata with token
curl -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id
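The same two-step flow (PUT for a token, then GET with the token header) can be sketched in Python using only the standard library. The function names here are hypothetical. `get_metadata` only works when run on an EC2 instance, so the example separates request construction from the network call.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: PUT request that asks IMDSv2 for a session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: GET request for a metadata path, carrying the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/meta-data/{path.lstrip('/')}",
        headers={"X-aws-ec2-metadata-token": token},
    )


def get_metadata(path: str, timeout: float = 2.0) -> str:
    """Fetch one metadata value via IMDSv2. Only reachable from inside an instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode("utf-8")
    request = build_metadata_request(path, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

On an instance, `get_metadata("instance-id")` would return the instance ID. The short timeout keeps the call from hanging when the code runs somewhere the endpoint does not exist.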
Enforcing IMDSv2
# Require IMDSv2 via AWS CLI
aws ec2 modify-instance-metadata-options \
  --instance-id i-1234567890abcdef0 \
  --http-tokens required \
  --http-endpoint enabled
Exam Tip
Exam Essential: IMDSv2 is recommended for security. Setting --http-tokens required disables IMDSv1 and allows only IMDSv2. This defends against SSRF (Server-Side Request Forgery) attacks.
IAM Roles and Metadata
Accessing AWS Services from EC2
EC2 instances need credentials to access AWS services like S3 and DynamoDB.
Credential Methods (in order of preference):
1. ✅ IAM Role (Instance Profile) - Recommended
2. ❌ Access Key in environment variables - Not recommended
3. ❌ Hardcoded Access Key in code - Never do this
Querying IAM Role Credentials
# Query IAM role name
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Query temporary credentials
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/MyEC2Role
# Response example:
{
"Code": "Success",
"AccessKeyId": "ASIA...",
"SecretAccessKey": "xxx...",
"Token": "xxx...",
"Expiration": "2026-01-26T12:00:00Z"
}
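Because these credentials are temporary, the `Expiration` field matters to anything that caches them. As a rough illustration, the sketch below parses a response shaped like the example above and computes how long the credentials remain valid. The helper name and all key values are made-up placeholders, and the timestamp format is assumed to match the example.

```python
import json
from datetime import datetime, timezone


def seconds_until_expiry(credentials_json: str, now: datetime) -> float:
    """Return the seconds remaining before the temporary credentials expire."""
    doc = json.loads(credentials_json)
    if doc.get("Code") != "Success":
        raise ValueError(f"Unexpected credential response code: {doc.get('Code')}")
    # Assumed format, taken from the example response: 2026-01-26T12:00:00Z
    expires = datetime.strptime(doc["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    expires = expires.replace(tzinfo=timezone.utc)
    return (expires - now).total_seconds()


# Placeholder document; the key values are fake.
sample_response = json.dumps({
    "Code": "Success",
    "AccessKeyId": "ASIAEXAMPLE",
    "SecretAccessKey": "example-secret",
    "Token": "example-token",
    "Expiration": "2026-01-26T12:00:00Z",
})

now = datetime(2026, 1, 26, 11, 0, 0, tzinfo=timezone.utc)
print(seconds_until_expiry(sample_response, now))  # 3600.0
```

In practice the AWS SDKs do this bookkeeping themselves and refresh credentials before they expire, which is one more reason to rely on the SDK rather than reading the endpoint by hand.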
AWS SDK Automatic Usage
# Using AWS SDK (boto3) in Python
# If IAM role is attached, credentials are obtained automatically
import boto3
# No need to specify credentials - auto-retrieved from metadata
s3 = boto3.client('s3')
s3.list_buckets()
Exam Tip
Exam Point: AWS SDK automatically retrieves temporary credentials from metadata when using IAM roles. Don't hardcode Access Keys in your code!
Practical User Data Examples
1. Web Server + CloudWatch Agent
#!/bin/bash
# Install web server and configure CloudWatch monitoring
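# Basic configuration (IMDSv2: fetch a session token, then query with it)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
REGION=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/region)
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)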
# Install web server
yum update -y
yum install -y httpd amazon-cloudwatch-agent
# Create web page
cat > /var/www/html/index.html << EOF
<h1>Hello from $INSTANCE_ID</h1>
<p>Region: $REGION</p>
EOF
# Start web server
systemctl start httpd
systemctl enable httpd
# Download and start CloudWatch Agent config
aws s3 cp s3://my-config-bucket/cloudwatch-config.json /opt/aws/amazon-cloudwatch-agent/etc/
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/cloudwatch-config.json \
  -s
2. ECS Container Instance Registration
#!/bin/bash
# Register container instance to ECS cluster
echo ECS_CLUSTER=my-ecs-cluster >> /etc/ecs/ecs.config
echo ECS_ENABLE_CONTAINER_METADATA=true >> /etc/ecs/ecs.config
3. Dynamic Configuration in Auto Scaling
#!/bin/bash
# Apply dynamic configuration in Auto Scaling group
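# Query instance information (IMDSv2: fetch a session token, then query with it)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/placement/availability-zone)
INSTANCE_TYPE=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-type)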
# Apply AZ-specific configuration
case "$AZ" in
*a) DB_HOST="db-primary.example.com" ;;
*b) DB_HOST="db-replica-1.example.com" ;;
*c) DB_HOST="db-replica-2.example.com" ;;
*) DB_HOST="db-primary.example.com" ;; # Assumed fallback for any other AZ suffix
esac
# Set as environment variable
echo "export DB_HOST=$DB_HOST" >> /etc/environment
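The AZ-to-host selection can also be written as a small lookup, which makes the mapping easy to test outside an instance. This is a sketch using the hostnames from the script above. The function name and the `default` parameter are hypothetical additions.

```python
# Mapping from the last letter of the Availability Zone name to a database host.
DB_HOST_BY_AZ_SUFFIX = {
    "a": "db-primary.example.com",
    "b": "db-replica-1.example.com",
    "c": "db-replica-2.example.com",
}


def db_host_for_az(availability_zone, default=None):
    """Pick a database host from the AZ suffix, or return `default` if unmapped."""
    suffix = availability_zone[-1:]  # empty string when the input is empty
    return DB_HOST_BY_AZ_SUFFIX.get(suffix, default)


print(db_host_for_az("us-east-1a"))
print(db_host_for_az("us-east-1d", default="db-primary.example.com"))
```

Passing an explicit default avoids writing an empty `DB_HOST` when an instance launches in a zone the mapping does not cover.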
Troubleshooting
Verifying User Data Execution
# Check cloud-init log
cat /var/log/cloud-init-output.log
# Check cloud-init status
cloud-init status
# View User Data contents
curl http://169.254.169.254/latest/user-data
Common Issues
| Problem | Cause | Solution |
|---|---|---|
| Script not running | Missing shebang (#!/bin/bash) | Add shebang on first line |
| Permission error | File/directory permissions | Check chmod, chown |
| Package install fails | No network connection | Verify NAT Gateway/IGW |
| S3 access fails | IAM role not attached | Check Instance Profile |
| Metadata query fails | IMDS disabled | Set --http-endpoint enabled |
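The first row of the table (a missing shebang) is easy to catch before launch. The sketch below is a simple pre-flight check, not cloud-init's actual detection logic: it looks only at the first line and recognizes three common formats. The function name and return labels are hypothetical.

```python
def classify_user_data(user_data: str) -> str:
    """Guess the user data format from its first line."""
    lines = user_data.splitlines()
    first_line = lines[0].strip() if lines else ""
    if first_line.startswith("#!"):
        return "shell-script"
    if first_line == "#cloud-config":
        return "cloud-config"
    if first_line.lower().startswith("content-type: multipart/"):
        return "mime-multipart"
    return "unrecognized"


print(classify_user_data("#!/bin/bash\nyum update -y\n"))  # shell-script
print(classify_user_data("yum update -y\n"))               # unrecognized
```

A leading blank line also yields "unrecognized", which matches the rule that the shebang has to be on the very first line.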
SAA-C03 Exam Focus Points
Common Question Types
| Type | Key Point |
|---|---|
| Execution Timing | User Data runs only on first boot (default) |
| IAM Credentials | Temporary credentials auto-retrieved from metadata |
| Security | IMDSv2 recommended, defends against SSRF |
| Size Limit | User Data max 16KB |
| AMI Inclusion | User Data is NOT included in AMI |
| Log Location | /var/log/cloud-init-output.log |
Common Wrong Answer Traps
❌ Modifying User Data runs new script on reboot
→ By default, runs only on first boot
❌ IMDSv1 and IMDSv2 have same security level
→ IMDSv2 is more secure with token-based auth
❌ Metadata can be queried from outside the instance
→ 169.254.169.254 is only accessible from within instance
❌ IAM role credentials are permanent
→ Temporary credentials, automatically rotated
❌ User Data is included in AMI
→ NOT included
FAQ
Q1: What's the difference between User Data and Launch Template?
User Data is a script that runs when an instance starts. Launch Template is a template of instance settings (AMI, instance type, security groups, User Data, etc.). Launch Templates can include User Data.
Q2: Does the instance fail to start if User Data script fails?
No, the instance starts normally even if User Data script fails. Check script failure status in /var/log/cloud-init-output.log.
Q3: Can I completely disable metadata?
Yes, use --http-endpoint disabled to completely disable IMDS. However, the instance can then no longer retrieve IAM role credentials from the metadata service, so the SDKs and CLI lose access to the role.
Q4: Why is 169.254.169.254 used?
It's a link-local address. Link-local addresses are not forwarded by routers, so the endpoint cannot be reached from outside the instance. AWS reserves this address for the metadata service.
Q5: What if User Data exceeds 16KB?
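Compress the script with gzip (cloud-init detects and decompresses gzip content automatically), or store the script in S3 and download it from a short User Data script. The #include format, which lists URLs for cloud-init to fetch, is another way to keep User Data small. Note that #include and #cloud-config-archive do not compress anything themselves.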
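To see how much gzip helps, the following sketch compresses an oversized script and compares both sizes against the limit. It assumes the 16KB limit is 16,384 bytes and applies to the payload as submitted. The helper names and the repetitive sample script are hypothetical, and real scripts usually compress less well than this one.

```python
import gzip

# Assumption: "16KB" is 16 * 1024 bytes, measured on the payload as submitted.
USER_DATA_LIMIT_BYTES = 16 * 1024


def gzip_user_data(script: str) -> bytes:
    """Compress a user data script with gzip."""
    return gzip.compress(script.encode("utf-8"))


def fits_limit(payload: bytes) -> bool:
    """Report whether a payload is within the user data size limit."""
    return len(payload) <= USER_DATA_LIMIT_BYTES


# Hypothetical oversized script: 1,000 repeated lines after the shebang.
long_script = "#!/bin/bash\n" + "echo 'configuring service'\n" * 1000
compressed = gzip_user_data(long_script)

print("raw fits:", fits_limit(long_script.encode("utf-8")))
print("gzip fits:", fits_limit(compressed))
```

If the compressed payload is still too large, moving the script to S3 and downloading it at boot is the more robust option.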
Summary
User Data and Metadata are essential for EC2 automation:
- User Data: Auto-run scripts at boot, 16KB limit, first boot only
- Metadata: Query instance info, obtain IAM credentials
- IMDSv2: Token-based authentication, recommended for security
- IAM Roles: Use temporary credentials instead of Access Keys