AWS DataSync: How to Migrate On-Premises Data to AWS Quickly
Migrate NFS and SMB storage to S3, EFS, and FSx using DataSync, with a comparison against Storage Gateway and Transfer Family and the SAA-C03 exam essentials.
Related Exam Domains
- Domain 2: Design Resilient Architectures
- Domain 3: Design High-Performing Architectures
Key Takeaway
AWS DataSync is a data transfer service that automates and accelerates moving data between on-premises storage and AWS storage services. It transfers data up to 10x faster than open-source tools, with built-in encryption and data integrity verification.
Exam Tip
Exam Essential: "On-premises → AWS large data migration" → DataSync, "Hybrid storage continuous access" → Storage Gateway, "SFTP/FTP file transfer" → Transfer Family
When Should You Use DataSync?
Best For
DataSync Recommended Scenarios:
├── Large-scale data migration
│   └── On-premises NFS/SMB → S3, EFS, FSx
├── Scheduled data synchronization
│   └── Hourly/daily/weekly incremental replication
├── Multi-cloud data transfer
│   └── Google Cloud, Azure → AWS migration
├── Cold data archiving
│   └── On-premises → S3 Glacier
└── AWS-to-AWS transfers
    └── S3 → EFS, EFS → FSx replication
Not Ideal For
Cases Where DataSync Isn't the Best Fit:
├── Real-time hybrid storage access
│   → Storage Gateway (File/Volume Gateway)
├── External partner SFTP/FTP file exchange
│   → Transfer Family
├── Petabyte-scale offline transfer
│   → Snow Family (Snowball, Snowcone)
└── Simple one-time small data copy
    → AWS CLI (aws s3 cp)
DataSync Architecture
How It Works
┌─────────────────────────────────────────────────────────────┐
│                  AWS DataSync Architecture                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   [On-Premises]                          [AWS Cloud]        │
│                                                             │
│  ┌─────────────┐                       ┌─────────────┐      │
│  │ NFS/SMB     │                       │ Amazon      │      │
│  │ File Server │                       │ S3          │      │
│  └──────┬──────┘                       └─────────────┘      │
│         │                                     ▲             │
│         ▼                                     │             │
│  ┌─────────────┐     TLS Encrypted     ┌──────┴──────┐      │
│  │ DataSync    │═══════════════════════│ DataSync    │      │
│  │ Agent       │      Internet/DX      │ Service     │      │
│  │ (VM)        │                       │             │      │
│  └─────────────┘                       └──────┬──────┘      │
│                                               │             │
│                                        ┌──────┴──────┐      │
│                                        │             │      │
│                                   ┌────┴───┐    ┌────┴───┐  │
│                                   │  EFS   │    │  FSx   │  │
│                                   └────────┘    └────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
Core Components
| Component | Description |
|---|---|
| DataSync Agent | VM deployed on-premises (VMware, Hyper-V, KVM) |
| Source Location | NFS, SMB, HDFS, object storage, Azure, etc. |
| Destination Location | S3, EFS, FSx (Windows, Lustre, OpenZFS, ONTAP) |
| Task | Transfer configuration and schedule |
| Task Execution | Actual data transfer instance |
Supported Storage
Sources (On-Premises/Other Clouds)
Supported Source Storage:
├── File Systems
│   ├── NFS (Network File System)
│   ├── SMB (Server Message Block)
│   └── HDFS (Hadoop Distributed File System)
├── Object Storage
│   ├── Self-managed object storage
│   └── S3-compatible storage (Wasabi, etc.)
├── Other Clouds
│   ├── Google Cloud Storage
│   ├── Azure Blob Storage
│   └── Azure Files
└── AWS Storage
    └── Amazon S3 (cross-region replication)
Destinations (AWS Storage)
| AWS Service | Use Case |
|---|---|
| Amazon S3 | Object storage, data lakes |
| Amazon EFS | Linux file system (NFS) |
| FSx for Windows | Windows file server |
| FSx for Lustre | High-performance computing (HPC) |
| FSx for OpenZFS | Linux high-performance file system |
| FSx for NetApp ONTAP | Enterprise NAS |
| S3 Glacier | Long-term archive |
DataSync vs Storage Gateway vs Transfer Family
Comparison Table
| Aspect | DataSync | Storage Gateway | Transfer Family |
|---|---|---|---|
| Primary Use | Data migration/replication | Hybrid storage access | SFTP/FTP file transfer |
| Data Flow | One-time or scheduled | Continuous bidirectional | File upload/download |
| Local Cache | No | Yes (File Gateway) | No |
| Protocol | DataSync protocol | NFS, SMB, iSCSI | SFTP, FTPS, FTP |
| EFS Support | ✅ Supported | ❌ Not supported | ❌ Not supported |
| Agent | Agent VM required (not for AWS-to-AWS transfers) | Gateway VM required | Not needed (managed) |
| Billing | Per GB transferred | Gateway hours + storage | Protocol + data transfer |
Selection Guide
Data Transfer Service Selection Flow:
Moving data from on-premises to AWS?
├── Yes → Need on-premises access after migration?
│   ├── Yes → [Storage Gateway] (hybrid storage)
│   └── No → Can the network move the data within the deadline?
│       ├── Yes → [DataSync]
│       └── No → [Snow Family] (offline transfer)
└── No → File exchange with external partners?
    ├── Yes → [Transfer Family] (SFTP/FTP)
    └── No → Choose based on specific requirements
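The selection flow, combined with the offline-transfer case from the "Not Ideal For" list, can be sketched as a small Python function. This is only an illustration of the decision logic; the function name and its parameters are hypothetical and not part of any AWS SDK.

```python
def choose_transfer_service(
    from_on_premises: bool,
    needs_ongoing_local_access: bool = False,
    fits_network_window: bool = True,
    partner_file_exchange: bool = False,
) -> str:
    """Return the service the selection flow suggests for one scenario."""
    if from_on_premises:
        if needs_ongoing_local_access:
            # Applications keep reading and writing on-premises after the move.
            return "Storage Gateway"
        # One-way migration: the network decides between online and offline.
        return "DataSync" if fits_network_window else "Snow Family"
    if partner_file_exchange:
        return "Transfer Family"
    return "Depends on specific requirements"


print(choose_transfer_service(from_on_premises=True))
print(choose_transfer_service(from_on_premises=True, needs_ongoing_local_access=True))
print(choose_transfer_service(from_on_premises=False, partner_file_exchange=True))
```

Real exam questions usually add constraints (protocol, deadline, caching) that map onto one of these branches.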
Exam Tip
Exam Point:
- NFS → EFS migration: Storage Gateway doesn't support EFS, so use DataSync
- Hybrid access + local cache: Storage Gateway File Gateway
- External partner SFTP: Transfer Family
Key Features
1. Bandwidth Control
Transfer Rate Limiting:
├── Network bandwidth limits (Mbps)
│   └── Lower during business hours, higher at night
├── Up to 10 Gbps utilization
│   └── Single task can saturate network link
└── Direct Connect support
    └── Dedicated network for stable transfer
2. Incremental Transfer
Incremental Replication:
├── Transfer only changed data
│   └── Saves time and cost
├── Full data transfer option available
│   └── Used for initial migration
└── File metadata preservation
    └── Ownership, permissions, timestamps retained
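To make "transfer only changed data" concrete, here is a minimal sketch that compares source and destination metadata and selects the files that need copying. It illustrates the general idea only: the `FileMeta` type, the `files_to_transfer` helper, and the size-plus-timestamp comparison are assumptions for this example, not a description of how DataSync detects changes internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def files_to_transfer(source: dict[str, FileMeta],
                      destination: dict[str, FileMeta]) -> list[str]:
    """Return paths that are new at the source or differ from the destination."""
    changed = []
    for path, meta in source.items():
        # A missing destination entry (None) also counts as "different".
        if destination.get(path) != meta:
            changed.append(path)
    return sorted(changed)


source = {
    "reports/q1.csv": FileMeta(1_024, 1_700_000_000.0),
    "reports/q2.csv": FileMeta(2_048, 1_700_100_000.0),
    "logs/app.log": FileMeta(512, 1_700_200_000.0),
}
destination = {
    "reports/q1.csv": FileMeta(1_024, 1_700_000_000.0),  # unchanged
    "reports/q2.csv": FileMeta(1_500, 1_690_000_000.0),  # stale copy
}

print(files_to_transfer(source, destination))
# ['logs/app.log', 'reports/q2.csv']
```

On the first run the destination is empty, so every file is selected, which is why an initial migration behaves like a full transfer.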
3. Data Integrity Verification
Integrity Verification Options:
├── During transfer (default)
│   └── Real-time checksum comparison
├── After transfer
│   └── Source-destination comparison on completion
└── Automatic retransmission
    └── Auto-retry on mismatch
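The verify-then-retry idea can be sketched with ordinary checksums. This example uses SHA-256 from Python's standard library and in-memory dictionaries standing in for storage; both are assumptions made for illustration, since the text does not say which checksum algorithm DataSync uses.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest used as the file's checksum in this sketch."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(source: dict[str, bytes],
                    destination: dict[str, bytes]) -> list[str]:
    """Return paths whose destination copy is missing or has a different checksum."""
    mismatched = []
    for path, payload in source.items():
        copy = destination.get(path)
        if copy is None or sha256_of(copy) != sha256_of(payload):
            mismatched.append(path)
    return sorted(mismatched)


source = {"a.bin": b"alpha", "b.bin": b"bravo"}
destination = {"a.bin": b"alpha", "b.bin": b"brav0"}  # simulated corruption

bad = find_mismatches(source, destination)
print(bad)  # ['b.bin']

# "Retransmit" only the files that failed verification, then check again.
for path in bad:
    destination[path] = source[path]
print(find_mismatches(source, destination))  # []
```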
4. Scheduling
Transfer Scheduling:
├── Manual execution
├── Hourly, daily, weekly schedules
└── Cron expression support
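The four features above correspond to settings on a DataSync task. The sketch below assembles the keyword arguments for the `CreateTask` API (`create_task` in boto3) without calling AWS, so it runs offline. The ARNs, the task name, and the helper functions are placeholders invented for this example; the option values shown (`BytesPerSecond`, `TransferMode`, `VerifyMode`, `ScheduleExpression`) should be checked against the current API reference before use.

```python
def mbps_to_bytes_per_second(mbps: float) -> int:
    """Convert a bandwidth cap in megabits per second to bytes per second."""
    return int(mbps * 1_000_000 / 8)


def build_task_request(source_arn: str, destination_arn: str,
                       limit_mbps: float, schedule: str,
                       full_copy: bool = False) -> dict:
    """Assemble keyword arguments for a DataSync CreateTask call."""
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": "nightly-nfs-to-s3",  # hypothetical task name
        "Options": {
            # 1. Bandwidth control: cap the transfer rate.
            "BytesPerSecond": mbps_to_bytes_per_second(limit_mbps),
            # 2. Incremental transfer: copy only changed data unless told otherwise.
            "TransferMode": "ALL" if full_copy else "CHANGED",
            # 3. Integrity verification: verify the files that were transferred.
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
        },
        # 4. Scheduling: AWS-style cron expression.
        "Schedule": {"ScheduleExpression": schedule},
    }


request = build_task_request(
    source_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-SOURCE-PLACEHOLDER",
    destination_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-DEST-PLACEHOLDER",
    limit_mbps=400,
    schedule="cron(0 2 * * ? *)",  # daily at 02:00
)
print(request["Options"]["BytesPerSecond"])  # 50000000

# To submit it (requires boto3, AWS credentials, and real location ARNs):
#   import boto3
#   boto3.client("datasync").create_task(**request)
```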
Pricing Structure
Transfer Pricing (US East)
| Mode | Price per GB |
|---|---|
| Basic Mode | $0.0125/GB |
| Enhanced Mode | $0.015/GB |
Cost Example
50 TB Migration (Basic Mode):
├── DataSync transfer cost: 50,000 GB × $0.0125 = $625
├── S3 PUT requests (assuming 10M objects at $0.005 per 1,000 requests): ~$50
└── Total estimated cost: ~$675
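The arithmetic behind this estimate can be reproduced with a short calculator. The DataSync rate comes from the pricing table above; the S3 PUT price of $0.005 per 1,000 requests is an assumption not stated in the text and should be checked against current S3 pricing. At that rate, a request cost of about $50 corresponds to roughly 10 million objects, and 100 million objects would cost about $500.

```python
def migration_cost(data_gb: float, objects: int,
                   datasync_per_gb: float = 0.0125,
                   put_per_1000: float = 0.005) -> dict:
    """Estimate DataSync transfer cost plus S3 PUT request cost, in USD."""
    transfer = round(data_gb * datasync_per_gb, 2)
    puts = round(objects / 1000 * put_per_1000, 2)
    return {
        "transfer": transfer,
        "put_requests": puts,
        "total": round(transfer + puts, 2),
    }


# 50 TB (taken as 50,000 GB, as in the example) written as 10 million objects.
print(migration_cost(data_gb=50_000, objects=10_000_000))
# {'transfer': 625.0, 'put_requests': 50.0, 'total': 675.0}

# The same data spread across 100 million small objects.
print(migration_cost(data_gb=50_000, objects=100_000_000))
# {'transfer': 625.0, 'put_requests': 500.0, 'total': 1125.0}
```

The comparison shows that object count, not just data volume, drives the request portion of the bill.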
Additional Cost Considerations
- S3 request costs (PUT, GET, LIST)
- CloudWatch logs and metrics
- Direct Connect connection costs
- Cross-region data transfer fees
SAA-C03 Exam Focus Points
Commonly Tested Scenarios
- ✅ NFS → EFS Migration: "On-premises NFS server → Amazon EFS" → DataSync
- ✅ SMB → FSx Migration: "Windows file server → FSx for Windows" → DataSync
- ✅ Bandwidth Control: "Transfer 30 TB over shared 1 Gbps link" → DataSync (rate limiting)
- ✅ Distinguish from Storage Gateway: "Hybrid access needed" → Storage Gateway, "Migration" → DataSync
- ✅ Data Integrity: "Prevent data corruption during transfer" → DataSync (built-in verification)
Sample Exam Questions
Exam Tip
Sample Exam Question 1: "A university research lab needs to migrate 30 TB of data from an on-premises Windows file server to Amazon FSx for Windows File Server. The network bandwidth is shared at 1 Gbps, and the migration must complete within 5 days. What's the most appropriate solution?"
→ Answer: AWS DataSync (bandwidth throttling, SMB → FSx support)
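A quick feasibility check shows why the 5-day deadline is realistic on a shared link. The sketch below is a back-of-the-envelope estimate that assumes decimal units (1 TB = 10^12 bytes) and ignores protocol overhead and compression; the function name and the `share_of_link` parameter are hypothetical.

```python
def transfer_days(data_tb: float, link_gbps: float, share_of_link: float) -> float:
    """Days needed to move data_tb using a fraction of the link's bandwidth."""
    bits = data_tb * 1e12 * 8
    usable_bits_per_second = link_gbps * 1e9 * share_of_link
    return bits / usable_bits_per_second / 86_400  # seconds per day


print(f"{transfer_days(30, 1, 1.0):.2f}")  # 2.78 days using the whole link
print(f"{transfer_days(30, 1, 0.6):.2f}")  # 4.63 days at 60% of the link
print(f"{transfer_days(30, 1, 0.5):.2f}")  # 5.56 days at 50%, which misses the deadline
```

Under these assumptions, a rate limit that leaves a bit more than half of the 1 Gbps link to DataSync still meets the deadline while keeping bandwidth free for other traffic.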
Exam Tip
Sample Exam Question 2: "A company wants to migrate 200 GB of data from an on-premises NFS server to Amazon EFS without interrupting existing services. What's the appropriate solution?"
→ Answer: AWS DataSync (incremental sync, no service interruption, EFS support)
Exam Tip
Sample Exam Question 3: "An on-premises application needs continued access to S3 data after AWS migration. Low latency with local caching is required. What's the appropriate solution?"
→ Answer: Storage Gateway File Gateway (hybrid access, local cache)
Frequently Asked Questions
Q: Where do I install the DataSync Agent?
Deploy it as a VM in your on-premises environment. The agent supports VMware ESXi, Microsoft Hyper-V, and Linux KVM. No agent is needed for AWS-to-AWS transfers.
Q: What's the difference between DataSync and aws s3 sync CLI?
DataSync uses an optimized protocol that's up to 10x faster, and it adds automatic retry, bandwidth control, and scheduling. The CLI is suitable for simple copies of small data volumes.
Q: How is data encrypted during transfer?
All data is encrypted in transit with TLS. DataSync also works with the encryption-at-rest features of S3, EFS, and FSx.
Q: Is multi-cloud migration possible?
Yes. DataSync supports direct transfer from Google Cloud Storage, Azure Blob Storage, and Azure Files to AWS.
Q: Does DataSync preserve file permissions?
Yes. File metadata, including ownership, permissions, and timestamps, is preserved. DataSync supports POSIX permissions and Windows ACLs.