Data Pipelines on AWS
AWS provides services for each stage of a data pipeline: ingestion, storage, cataloging, processing, and analysis. These services integrate with one another, so you can combine them into a complete data workflow.
Data Ingestion
Collect data and move it to storage. Use real-time ingestion when you need data immediately, and batch ingestion when a small delay is acceptable.
| Service | What It Does |
|---|---|
| Amazon Kinesis Data Streams | Collects large amounts of data in real time from apps, devices, and sensors. Serverless; in on-demand capacity mode it scales automatically. |
| Amazon Data Firehose | Collects data in near real time (within seconds). Fully managed - delivers data to storage automatically. |
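As a rough illustration of real-time ingestion, the sketch below shapes application events into the record format that the Kinesis Data Streams `PutRecords` API expects and submits them in batches. The `device_id` field and the helper names are assumptions made for this example. `client` is expected to behave like `boto3.client("kinesis")`, and retry handling for failed records is left out.

```python
import json

# PutRecords accepts at most 500 records per request.
MAX_RECORDS_PER_CALL = 500


def to_kinesis_records(events, key_field="device_id"):
    """Turn dict events into PutRecords entries (hypothetical helper)."""
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            # Records with the same partition key go to the same shard.
            "PartitionKey": str(event[key_field]),
        }
        for event in events
    ]


def batches(records, size=MAX_RECORDS_PER_CALL):
    """Yield slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_events(client, stream_name, events):
    """Submit events in batches and return the number of records submitted.

    Assumes `client` behaves like boto3.client("kinesis"). A production
    version should inspect FailedRecordCount in each response and retry.
    """
    submitted = 0
    for batch in batches(to_kinesis_records(events)):
        client.put_records(StreamName=stream_name, Records=batch)
        submitted += len(batch)
    return submitted
```

Because the client is passed in, the batching logic can be exercised with a stand-in object before it is pointed at a real stream.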
Data Storage
Centralize data from multiple sources for analysis.
| Service | What It Does |
|---|---|
| Amazon S3 | Object storage for data lakes. Stores any amount of data (structured or unstructured). Scales automatically. |
| Amazon Redshift | Data warehouse for structured data. Stores petabytes of data with pay-as-you-go pricing. |
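One common way to keep a data lake organized is to write objects under date-partitioned key prefixes. The sketch below builds such a key. The `raw` prefix, the source name, and the Hive-style `year=/month=/day=` layout are conventions assumed for illustration, not something S3 itself requires.

```python
from datetime import datetime, timezone


def lake_key(source, event_time, filename, prefix="raw"):
    """Build an S3 object key partitioned by source and event date.

    All names here are hypothetical; adjust the layout to your own lake.
    """
    return (
        f"{prefix}/{source}/"
        f"year={event_time:%Y}/month={event_time:%m}/day={event_time:%d}/"
        f"{filename}"
    )


key = lake_key(
    "sensors",
    datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc),
    "part-0001.json",
)
print(key)
```

The resulting string could be passed as the `Key` argument of `put_object` on a `boto3.client("s3")` object.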
Data Cataloging
Organize and find your data across services.
| Service | What It Does |
|---|---|
| AWS Glue Data Catalog | Central place to store metadata about your data, such as table names, schemas, and storage locations. Makes it easy to find and use data across services. |
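To make "information about your data" more concrete, the sketch below builds a dictionary shaped like the `TableInput` argument of the Glue `create_table` API. The table name, columns, and bucket are hypothetical, and the file format and SerDe settings that a query engine would also need are deliberately omitted.

```python
def table_input(name, location, columns, partition_keys=()):
    """Build a dict shaped like Glue's TableInput (simplified sketch)."""

    def as_columns(pairs):
        # Each column is described by a name and a type string.
        return [{"Name": col, "Type": typ} for col, typ in pairs]

    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "PartitionKeys": as_columns(partition_keys),
        "StorageDescriptor": {
            "Columns": as_columns(columns),
            "Location": location,
        },
    }


# Hypothetical table describing sensor readings stored in S3.
sensor_table = table_input(
    name="sensor_readings",
    location="s3://example-data-lake/raw/sensors/",
    columns=[("device_id", "string"), ("temp_c", "double")],
    partition_keys=[("year", "string"), ("month", "string"), ("day", "string")],
)
```

A dictionary like this could be passed to `boto3.client("glue").create_table(DatabaseName=..., TableInput=sensor_table)` once the format details are filled in.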
Data Processing
Clean and transform data for analysis.
| Service | What It Does |
|---|---|
| AWS Glue | ETL service (Extract, Transform, Load). Prepares data for analysis. Works with the Glue Data Catalog. |
| Amazon EMR | Large-scale data processing. Supports Spark, Hadoop, and Hive. AWS manages the infrastructure. |
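The kind of cleaning an ETL job performs can be sketched in plain Python. This is a stand-in for the transform step only and does not use the Glue or Spark APIs; the field names and the cleaning rules are assumptions for the example.

```python
def clean(rows):
    """Drop unusable rows and normalize the rest (illustrative rules)."""
    cleaned = []
    for row in rows:
        device = (row.get("device_id") or "").strip()
        raw_temp = row.get("temp_c")
        # Skip rows with no device id or no temperature value.
        if not device or raw_temp is None or raw_temp == "":
            continue
        try:
            temp = float(raw_temp)
        except (TypeError, ValueError):
            # Skip rows whose temperature is not numeric.
            continue
        cleaned.append({"device_id": device.lower(), "temp_c": round(temp, 1)})
    return cleaned


raw_rows = [
    {"device_id": " A1 ", "temp_c": "21.46"},
    {"device_id": "", "temp_c": "20"},
    {"device_id": "b2", "temp_c": "n/a"},
    {"device_id": "B2", "temp_c": None},
]
print(clean(raw_rows))
```

In a real pipeline the same rules would typically be expressed as a Glue job or a Spark job on EMR so that they run in parallel over the full dataset.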
Analysis and Visualization
Query data and create visual insights.
| Service | What It Does |
|---|---|
| Amazon Athena | Run SQL queries on data in S3. Serverless - pay only for the queries you run, priced by the amount of data scanned. |
| Amazon Redshift | Fast SQL queries on large datasets. Best for frequent, complex analytics. |
| Amazon QuickSight | Create dashboards and reports. Amazon Q lets you ask questions in plain English. |
| Amazon OpenSearch Service | Search your data with keywords or natural language. Also provides real-time dashboards. |
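As an example of querying data in S3 with SQL, the sketch below assembles the arguments for Athena's `start_query_execution` call. The database, table, column names, and results bucket are hypothetical, and the query assumes a table partitioned by `year`, `month`, and `day` string columns.

```python
from datetime import date


def athena_request(day, database="iot_lake", table="sensor_readings",
                   output_location="s3://example-athena-results/"):
    """Build keyword arguments for start_query_execution (sketch).

    `table` is interpolated into the SQL, so it must come from trusted
    code, never from user input.
    """
    query = (
        "SELECT device_id, avg(temp_c) AS avg_temp_c "
        f"FROM {table} "
        f"WHERE year = '{day:%Y}' AND month = '{day:%m}' AND day = '{day:%d}' "
        "GROUP BY device_id"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = athena_request(date(2024, 3, 5))
print(request["QueryString"])
```

The dictionary could be passed as `boto3.client("athena").start_query_execution(**request)`. If the table is partitioned on these columns, filtering on them limits how much data the query has to scan.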