Course Overview
The Data Engineering on Microsoft Azure course combines training with lab assignments to teach students data engineering patterns, batch and real-time analytical solutions, and the core compute, storage, and serving-layer technologies, with a focus on data engineering with Azure Synapse pipelines.
Key Features
Real-world industry training
Hands-on experience
Robust lab assignments
Lifetime access to digital materials
Learning Objectives
After completing this course, students will be able to:
- Understand Azure Synapse serverless SQL pools capabilities
- Query data in the lake using Azure Synapse serverless SQL pools
- Create metadata objects in Azure Synapse serverless SQL pools
- Secure data and manage users in Azure Synapse serverless SQL pools
Benefits
You will learn about Azure Synapse Analytics, Azure Databricks, Azure Data Lake Storage, Delta Lake architecture, Azure Stream Analytics, analytical workloads, Apache Spark notebooks in Azure Synapse Analytics, transforming data with DataFrames, integrating SQL and Apache Spark pools in Azure Synapse Analytics, and more.
Prerequisites
- Cloud computing
- Core data concepts
- Data solutions
- Azure Fundamentals
Course Curriculum
-
Topic Covered:
- Introduction to Azure Synapse Analytics
- Describe Azure Databricks
- Introduction to Azure Data Lake Storage
- Describe Delta Lake architecture
- Work with data streams by using Azure Stream Analytics
Lab:
- Combine streaming and batch processing with a single pipeline
- Organize the data lake into levels of file transformation
- Index data lake storage for query and workload acceleration
-
Topic Covered:
- Design a multidimensional schema to optimize analytical workloads
- Code-free transformation at scale with Azure Data Factory
- Populate slowly changing dimensions in Azure Synapse Analytics pipelines
Lab:
- Design a star schema for analytical workloads
- Populate slowly changing dimensions with Azure Data Factory and mapping data flows
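As a rough illustration of what populating a slowly changing dimension involves, the sketch below applies a Type 2 update in plain Python: when a tracked attribute changes, the current row is closed and a new version is appended. It is not the Azure Data Factory or mapping data flow implementation used in the lab. The `DimRow` class, the `apply_scd2` helper, and the customer/city columns are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class DimRow:
    # Hypothetical dimension row; column names are assumptions for illustration.
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(dim: List[DimRow], customer_id: int, new_city: str, as_of: date) -> List[DimRow]:
    """Type 2 update: keep history by closing the old row and adding a new one."""
    current = next(
        (r for r in dim if r.customer_id == customer_id and r.valid_to is None),
        None,
    )
    if current is not None and current.city == new_city:
        return dim  # tracked attribute unchanged, nothing to do
    if current is not None:
        current.valid_to = as_of  # close the previous version
    dim.append(DimRow(customer_id, new_city, as_of))  # open the new version
    return dim


dim = [DimRow(1, "Oslo", date(2023, 1, 1))]
apply_scd2(dim, 1, "Bergen", date(2024, 6, 1))
```

After the call, the dimension holds both versions of the customer: the closed Oslo row and the open Bergen row, which is what lets analytical queries report against either the historical or the current value.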
-
Topic Covered:
- Design a Modern Data Warehouse using Azure Synapse Analytics
- Secure a data warehouse in Azure Synapse Analytics
Lab:
- Managing files in an Azure data lake
- Securing files stored in an Azure data lake
-
Topic Covered:
- Explore Azure Synapse serverless SQL pools capabilities
- Query data in the lake using Azure Synapse serverless SQL pools
- Create metadata objects in Azure Synapse serverless SQL pools
- Secure data and manage users in Azure Synapse serverless SQL pools
Lab:
- Query Parquet data with serverless SQL pools
- Create external tables for Parquet and CSV files
- Create views with serverless SQL pools
- Secure access to data in a data lake when using serverless SQL pools
- Configure data lake security using Role-Based Access Control (RBAC) and Access Control List (ACL)
-
Topic Covered:
- Understand big data engineering with Apache Spark in Azure Synapse Analytics
- Ingest data with Apache Spark notebooks in Azure Synapse Analytics
- Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
- Integrate SQL and Apache Spark pools in Azure Synapse Analytics
Lab:
- Perform Data Exploration in Synapse Studio
- Ingest data with Spark notebooks in Azure Synapse Analytics
- Transform data with DataFrames in Spark pools in Azure Synapse Analytics
- Integrate SQL and Spark pools in Azure Synapse Analytics
-
Topic Covered:
- Describe Azure Databricks
- Read and write data in Azure Databricks
- Work with DataFrames in Azure Databricks
- Work with advanced DataFrame methods in Azure Databricks
Lab:
- Use DataFrames in Azure Databricks to explore and filter data
- Cache a DataFrame for faster subsequent queries
- Remove duplicate data
- Manipulate date/time values
- Remove and rename DataFrame columns
- Aggregate data stored in a DataFrame
-
Topic Covered:
- Use data loading best practices in Azure Synapse Analytics
- Petabyte-scale ingestion with Azure Data Factory
Lab:
- Perform petabyte-scale ingestion with Azure Synapse Pipelines
- Import data with PolyBase and COPY using T-SQL
- Use data loading best practices in Azure Synapse Analytics
-
Topic Covered:
- Data integration with Azure Data Factory or Azure Synapse Pipelines
- Code-free transformation at scale with Azure Data Factory or Azure Synapse Pipelines
Lab:
- Execute code-free transformations at scale with Azure Synapse Pipelines
- Create a data pipeline to import poorly formatted CSV files
- Create Mapping Data Flows
-
Topic Covered:
- Orchestrate data movement and transformation in Azure Data Factory
Lab:
- Integrate Data from Notebooks with Azure Data Factory or Azure Synapse Pipelines
-
Topic Covered:
- Optimize data warehouse query performance in Azure Synapse Analytics
- Understand data warehouse developer features of Azure Synapse Analytics
Lab:
- Understand developer features of Azure Synapse Analytics
- Optimize data warehouse query performance in Azure Synapse Analytics
- Improve query performance
-
Topic Covered:
- Analyze and optimize data warehouse storage in Azure Synapse Analytics
Lab:
- Check for skewed data and space usage
- Understand columnstore storage details
- Study the impact of materialized views
- Explore rules for minimally logged operations
-
Topic Covered:
- Design hybrid transactional and analytical processing using Azure Synapse Analytics
- Configure Azure Synapse Link with Azure Cosmos DB
- Query Azure Cosmos DB with Apache Spark pools
- Query Azure Cosmos DB with serverless SQL pools
Lab:
- Configure Azure Synapse Link with Azure Cosmos DB
- Query Azure Cosmos DB with Apache Spark for Synapse Analytics
- Query Azure Cosmos DB with serverless SQL pool for Azure Synapse Analytics
-
Topic Covered:
- Secure a data warehouse in Azure Synapse Analytics
- Configure and manage secrets in Azure Key Vault
- Implement compliance controls for sensitive data
Lab:
- Secure Azure Synapse Analytics supporting infrastructure
- Secure the Azure Synapse Analytics workspace and managed services
- Secure Azure Synapse Analytics workspace data
-
Topic Covered:
- Enable reliable messaging for Big Data applications using Azure Event Hubs
- Work with data streams by using Azure Stream Analytics
- Ingest data streams with Azure Stream Analytics
Lab:
- Use Stream Analytics to process real-time data from Event Hubs
- Use Stream Analytics windowing functions to build aggregates and output to Synapse Analytics
- Scale the Azure Stream Analytics job to increase throughput through partitioning
- Repartition the stream input to optimize parallelization
-
Topic Covered:
- Process streaming data with Azure Databricks structured streaming
Lab:
- Explore key features and uses of Structured Streaming
- Stream data from a file and write it out to a distributed file system
- Use sliding windows to aggregate over chunks of data rather than all data
- Apply watermarking to remove stale data
- Connect to Event Hubs to read and write streams
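To give a feel for the sliding-window and watermarking ideas in this lab, here is a minimal sketch in plain Python rather than Structured Streaming. It counts events per overlapping window and discards events that arrive too far behind the latest timestamp seen. The function name `sliding_window_counts`, its parameters, and the sample timestamps are assumptions made for this example, and the late-data rule is a simplification of how a streaming engine applies watermarks.

```python
from collections import defaultdict


def sliding_window_counts(timestamps, window=10, slide=5, lateness=5):
    """Count events per sliding window, dropping events older than the watermark."""
    counts = defaultdict(int)
    dropped = []
    max_seen = float("-inf")
    for ts in timestamps:
        max_seen = max(max_seen, ts)
        watermark = max_seen - lateness  # simplified watermark: latest event time minus allowed lateness
        if ts < watermark:
            dropped.append(ts)  # stale event, ignored by the aggregation
            continue
        # Walk back over every window start (a multiple of `slide`) whose
        # interval [start, start + window) contains this timestamp.
        start = (ts // slide) * slide
        while start > ts - window:
            counts[(start, start + window)] += 1
            start -= slide
    return dict(counts), dropped


# Hypothetical event times in seconds; the last one arrives late.
counts, dropped = sliding_window_counts([1, 7, 12, 3])
```

Because the window length (10) is twice the slide (5), each on-time event is counted in two overlapping windows, while the late event at time 3 is dropped once the watermark has advanced to 7.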
-
Topic Covered:
- Create reports with Power BI using its integration with Azure Synapse Analytics
Lab:
- Integrate an Azure Synapse workspace and Power BI
- Optimize integration with Power BI
- Improve query performance with materialized views and result-set caching
- Visualize data with a serverless SQL pool and create a Power BI report
-
Topic Covered:
- Use the integrated machine learning process in Azure Synapse Analytics
Lab:
- Create an Azure Machine Learning linked service
- Trigger an AutoML experiment using data from a Spark table
- Enrich data using trained models
- Serve prediction results using Power BI