Training: Implement data engineering solutions using Azure Databricks (DP-750)

Ref. DP-750T00

Duration: 4 days

Exam:

Level: Intermediate

Funding: Eligible

Implement data engineering solutions using Azure Databricks Training (DP-750)

Azure Databricks has become the reference platform for large-scale data engineering in Microsoft Azure. The DP-750 training prepares you to design, implement and operate production-grade data pipelines leveraging Apache Spark, Delta Lake and the Lakehouse architecture.

Over four days, you work on Unity Catalog, Delta Live Tables, ETL and ELT pipelines, Workflows orchestration, integration with Azure Data Lake Storage Gen2 and secure data sharing. The training is delivered in Geneva and Lausanne by a Microsoft Certified Trainer.

Objectives

Design an Azure Databricks Lakehouse architecture with Unity Catalog for data governance
Develop ETL and ELT pipelines in PySpark and SQL with Delta Lake
Implement streaming and incremental pipelines with Delta Live Tables
Orchestrate complex workflows with Databricks Workflows and integrate with Azure Data Factory
Optimize Spark performance: partitioning, caching, AQE, Photon, autoscaling
Secure and govern data with Unity Catalog, row-level security and data lineage

Course Content

Module 1: Explore Azure Databricks

Get started with Azure Databricks
Identify Azure Databricks workloads
Understand key concepts
Data governance using Unity Catalog and Microsoft Purview
Module assessment

Module 2: Understand Azure Databricks architecture

Understand Azure Databricks architecture
Understand Unity Catalog managed storage
Understand external storage
Understand default storage
Module assessment

Module 3: Understand Azure Databricks integrations

Understand integration with Microsoft Fabric
Understand integration with Power BI
Understand integration with VS Code
Understand integration with Power Platform
Understand integration with Copilot Studio
Understand integration with Microsoft Purview
Understand integration with Microsoft Foundry
Module assessment

Module 4: Select and configure compute in Azure Databricks

Choose an appropriate compute type
Configure compute performance
Configure compute features
Install libraries for compute
Configure compute access
Module assessment

Module 5: Create and organize objects in Unity Catalog

Apply naming conventions
Create catalog
Create schema
Create tables and views
Create volumes
Implement DDL operations
Implement foreign catalog
Configure AI/BI Genie instructions

Module 6: Secure Unity Catalog objects

Understand query lifecycle
Implement access control strategies
Understand fine-grained access control
Implement row filtering and column masking
Access Azure Key Vault secrets
Authenticate data access with service principals
Authenticate resource access with managed identities
Module assessment

Module 7: Govern Unity Catalog objects

Create and preserve table definitions
Configure ABAC with tags and policies
Apply data retention policies
Set up and manage data lineage
Configure audit logging
Design secure Delta Sharing strategy
Module assessment

Module 8: Design and implement data modeling with Azure Databricks

Design ingestion logic and data source configuration
Choose a data ingestion tool
Choose a data table format
Design and implement a data partitioning scheme
Choose a slowly changing dimension (SCD) type
Implement a slowly changing dimension (SCD) type 2
Design and implement a temporal (history) table to record changes over time
Choose granularity on a column or table based on requirements
Choose managed vs unmanaged tables
Design and implement a clustering strategy
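
To make the SCD type 2 item in the list above concrete, here is a minimal, engine-agnostic sketch in plain Python. It is not Databricks or Delta Lake code: the helper `apply_scd2` and the column names (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`) are hypothetical and chosen only for illustration. On Databricks the same logic would more likely be written as a MERGE statement.

```python
from datetime import date


def apply_scd2(dimension, updates, key, tracked, today):
    """Return a new dimension list with Type 2 history applied.

    dimension: list of dict rows carrying valid_from / valid_to / is_current
    updates:   list of dict rows holding the latest source values
    key:       name of the business key column
    tracked:   names of the columns whose changes create a new version
    today:     effective date of this load
    """
    result = [dict(row) for row in dimension]  # copy so the input is untouched
    current = {row[key]: row for row in result if row["is_current"]}

    for update in updates:
        existing = current.get(update[key])
        if existing is not None and all(existing[c] == update[c] for c in tracked):
            continue  # nothing changed: keep the current version as is
        if existing is not None:
            # Close the old version instead of overwriting it.
            existing["valid_to"] = today
            existing["is_current"] = False
        new_row = {key: update[key], **{c: update[c] for c in tracked},
                   "valid_from": today, "valid_to": None, "is_current": True}
        result.append(new_row)
        current[update[key]] = new_row
    return result


# Hypothetical sample data: one existing customer moves, one customer is new.
dim = [{"customer_id": 1, "city": "Geneva",
        "valid_from": date(2024, 1, 1), "valid_to": None, "is_current": True}]
src = [{"customer_id": 1, "city": "Lausanne"},
       {"customer_id": 2, "city": "Bern"}]
history = apply_scd2(dim, src, "customer_id", ["city"], date(2025, 1, 1))
for row in history:
    print(row)
```

The old Geneva row is kept and closed rather than updated in place, which is what distinguishes Type 2 from Type 1 (overwrite).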

Module 9: Ingest data into Unity Catalog

Ingest data with Lakeflow Connect
Ingest data with notebooks
Ingest data with SQL methods
Ingest data with CDC feed
Ingest data with Spark Structured Streaming
Ingest data with Auto Loader
Ingest data with Lakeflow Spark Declarative Pipelines
Module assessment

Module 10: Cleanse, transform, and load data into Unity Catalog

Profile data
Choose column data types
Resolve duplicates and nulls
Transform data with filters and aggregations
Transform data with joins and set operators
Transform data with denormalization and pivots
Load data with merge, insert, and append
Module assessment

Module 11: Implement and manage data quality constraints with Azure Databricks

Implement validation checks
Implement data type checks
Detect and manage schema drift
Manage data quality with pipeline expectations
Module assessment
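
As an illustration of the schema drift item in this module, the following plain-Python sketch compares an expected schema with the one observed in an incoming batch. It is an assumption-laden toy, not a Databricks API: `detect_schema_drift` and the column names are hypothetical, and schemas are simplified to dictionaries mapping column name to type name.

```python
def detect_schema_drift(expected, observed):
    """Compare two {column: type} mappings and report the differences."""
    added = {c: t for c, t in observed.items() if c not in expected}
    removed = {c: t for c, t in expected.items() if c not in observed}
    changed = {c: (expected[c], observed[c])
               for c in expected
               if c in observed and expected[c] != observed[c]}
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical schemas: the source system added a column and changed a type.
expected_schema = {"order_id": "bigint", "amount": "double", "country": "string"}
observed_schema = {"order_id": "bigint", "amount": "string", "channel": "string"}

drift = detect_schema_drift(expected_schema, observed_schema)
print(drift)
```

A pipeline could use such a report to decide whether to evolve the target schema, quarantine the batch, or stop the load; which of these is appropriate depends on the requirements.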

Module 12: Design and implement data pipelines with Azure Databricks

Design order of operations for a pipeline
Choose notebook vs Lakeflow Pipelines
Design Lakeflow job logic
Design error handling in pipelines and jobs
Create pipeline with notebook
Create pipeline with Lakeflow Spark Declarative Pipelines
Module assessment

Module 13: Implement Lakeflow Jobs with Azure Databricks

Create job setup and configuration
Configure job triggers
Schedule a job
Configure job alerts
Configure automatic restarts
Module assessment

Module 14: Implement development lifecycle processes in Azure Databricks

Apply Git version control best practices
Manage branching and pull requests
Implement testing strategy
Configure and package Declarative Automation Bundles
Deploy bundle with Databricks CLI
Module assessment

Module 15: Monitor, troubleshoot and optimize workloads in Azure Databricks

Monitor and manage cluster consumption
Troubleshoot and repair Lakeflow Jobs
Troubleshoot Spark jobs and notebooks
Investigate caching, skewing, spilling, shuffle
Implement log streaming with Azure Log Analytics
Module assessment

Documentation

Course material included.

Eligible Funding

ITTA is a partner of a continuing education fund dedicated to temporary workers. This fund can subsidize your training, provided that you are subject to the “Service Provision” collective labor agreement (CCT) and meet certain conditions, including having worked at least 88 hours in the past 12 months.

Additional Information

Azure Databricks: the data engineering platform at the heart of the Lakehouse

Azure Databricks unifies data engineering, data science, machine learning and BI on a single platform. The Implement data engineering solutions using Azure Databricks (DP-750) training focuses on the data engineering pillar: ingestion, transformation, quality, governance and data exposure. You work with Apache Spark as optimized by Databricks (Photon engine), Delta Lake for ACID reliability, and the Lakehouse architecture, which combines the benefits of a data lake and a data warehouse.

Unity Catalog: unified governance

Unity Catalog is the governance layer you configure during the course: a metastore shared by the Databricks workspaces attached to it, granular permissions (catalog, schema, table, view, column), automatic data lineage and secure sharing via Delta Sharing. Mastery of Unity Catalog has become essential for enterprise Databricks architectures.
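
The permission levels mentioned above can be sketched as the SQL statements a principal typically needs in order to read one table: access to the catalog, access to the schema, then SELECT on the table itself. The helper functions and the names `sales_prod`, `silver`, `orders` and `data-analysts` are hypothetical; the sketch only builds and prints the statements, and running them in a workspace is left as an assumption about your environment.

```python
def _quote(identifier):
    """Backtick-quote one identifier; reject names that contain a backtick."""
    if "`" in identifier:
        raise ValueError(f"unsupported identifier: {identifier!r}")
    return f"`{identifier}`"


def qualified_name(catalog, schema, obj):
    """Build the three-level name catalog.schema.object."""
    return ".".join(_quote(part) for part in (catalog, schema, obj))


def read_only_grants(catalog, schema, table, principal):
    """Return the GRANT statements for read access to a single table."""
    who = _quote(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {_quote(catalog)} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {_quote(catalog)}.{_quote(schema)} TO {who}",
        f"GRANT SELECT ON TABLE {qualified_name(catalog, schema, table)} TO {who}",
    ]


# Hypothetical catalog, schema, table and group names.
statements = read_only_grants("sales_prod", "silver", "orders", "data-analysts")
for statement in statements:
    print(statement)
```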

Delta Lake and the medallion architecture

Delta Lake brings ACID transactions, time travel, schema evolution and performance optimizations on top of Parquet files in Azure Data Lake Storage. The training covers advanced techniques: MERGE INTO for upserts, OPTIMIZE and Z-ordering for file compaction and data skipping, VACUUM to remove data files that are no longer referenced and are older than the retention threshold, and change data feed for change propagation. The medallion architecture (bronze / silver / gold) is presented as a reference pattern.
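
As a rough illustration of the commands named above, the sketch below builds the corresponding SQL text in Python. The table names (`silver.customers`, `bronze.customer_updates`) and the key column are hypothetical, and the helper functions are not part of any library. In a Databricks notebook each string would typically be passed to `spark.sql(...)`; here the statements are only printed so the example runs anywhere.

```python
def merge_sql(target, source, key):
    """Upsert: update rows that match on the key, insert the others."""
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        "WHEN MATCHED THEN UPDATE SET *\n"
        "WHEN NOT MATCHED THEN INSERT *"
    )


def optimize_sql(table, zorder_columns):
    """Compact small files and co-locate rows by the given columns."""
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_columns)})"


def vacuum_sql(table, retain_hours=168):
    """Remove unreferenced files older than the retention window (168 h = 7 days)."""
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


# Hypothetical table and column names.
maintenance = [
    merge_sql("silver.customers", "bronze.customer_updates", "customer_id"),
    optimize_sql("silver.customers", ["customer_id"]),
    vacuum_sql("silver.customers"),
]
for statement in maintenance:
    print(statement, end="\n\n")
```

Keeping the retention window at seven days or more preserves time travel over that period; shortening it trades history for storage.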

Delta Live Tables: declarative pipelines

Delta Live Tables (DLT), which appears in the course modules under its newer name Lakeflow Spark Declarative Pipelines, is a declarative framework for building reliable data pipelines. Instead of orchestrating individual notebooks, you declare transformations and DLT manages dependencies, retries, data quality (expectations) and monitoring. The training shows how to migrate existing pipelines to DLT and how to combine streaming and batch in the same pipeline.
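
The idea behind expectations can be sketched in plain Python. This is a simulation, not the DLT API: `apply_expectation`, its `on_violation` parameter and the sample rows are hypothetical. It mimics the three usual reactions to a failed data quality rule: keep the record and count the violation, drop the record, or fail the run.

```python
def apply_expectation(rows, name, predicate, on_violation="warn"):
    """Apply one data quality rule and return (kept_rows, metrics)."""
    if on_violation not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown on_violation: {on_violation!r}")
    kept, failed = [], 0
    for row in rows:
        if predicate(row):
            kept.append(row)
            continue
        failed += 1
        if on_violation == "fail":
            raise ValueError(f"expectation {name!r} violated by {row!r}")
        if on_violation == "warn":
            kept.append(row)  # invalid row is kept, only counted
        # on_violation == "drop": the invalid row is simply not kept
    return kept, {"expectation": name, "failed_records": failed}


def is_positive(row):
    return row["amount"] > 0


# Hypothetical input rows; the second one violates the rule.
orders = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": 3, "amount": 7.5},
]

warned, warn_metrics = apply_expectation(orders, "positive_amount", is_positive, "warn")
dropped, drop_metrics = apply_expectation(orders, "positive_amount", is_positive, "drop")
print(len(warned), warn_metrics)
print(len(dropped), drop_metrics)
```

In a declarative pipeline the framework collects these counts for you; the point of the sketch is only that the rule is declared once and the reaction is a configuration choice.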

Spark performance and optimization

Optimizing Spark requires understanding its internals: partitioning, shuffle, broadcast joins, AQE (Adaptive Query Execution) and Photon (the native Databricks engine written in C++). You learn to read the Spark UI, identify bottlenecks, adjust cluster configurations and choose the right APIs (DataFrame or SQL, avoiding the low-level RDD API where possible).
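
To show why a broadcast join avoids a shuffle, here is a toy model in plain Python. It is not Spark code: partitions are modeled as lists of dictionaries, and `broadcast_hash_join` and the sample data are hypothetical. The small table is turned into a hash table once, and every partition of the large table is joined where it already sits.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_table, key):
    """Inner-join each partition of the large side against a small lookup table."""
    # Build the hash table once from the small side. In Spark, this is the
    # copy that would be sent to every executor.
    lookup = defaultdict(list)
    for row in small_table:
        lookup[row[key]].append(row)

    joined = []
    for partition in large_partitions:
        # Each partition is processed independently: rows of the large
        # side never have to be regrouped by key.
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
    return joined


# Hypothetical data: two partitions of orders and a small country dimension.
orders = [
    [{"order_id": 1, "country": "CH"}, {"order_id": 2, "country": "FR"}],
    [{"order_id": 3, "country": "CH"}, {"order_id": 4, "country": "DE"}],
]
countries = [
    {"country": "CH", "name": "Switzerland"},
    {"country": "FR", "name": "France"},
]

result = broadcast_hash_join(orders, countries, "country")
for row in result:
    print(row)
```

The approach only pays off when the small side fits comfortably in memory on each worker, which is why Spark applies it automatically only below a configurable size threshold.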

Audience and prerequisites

The Implement data engineering solutions using Azure Databricks (DP-750) training targets data engineers, ETL engineers and data architects who will design production Databricks pipelines. Prerequisites: Python or Scala knowledge, Azure fundamentals (equivalent to AZ-900), SQL experience. Prior Spark knowledge is a plus but not required.

FAQ: Implement data engineering solutions using Azure Databricks (DP-750)

What’s the difference between Azure Databricks and Microsoft Fabric?

Microsoft Fabric provides a unified SaaS experience (Lakehouse, Data Warehouse, Real-Time Analytics, Power BI). Azure Databricks remains the leading platform for advanced Spark workloads, large-scale ML and multi-cloud architectures. The DP-750 training covers Azure Databricks in depth; DP-600 and DP-700 cover Microsoft Fabric.

Do I need to know Apache Spark before DP-750?

No. The training introduces Spark progressively. However, SQL experience and knowledge of at least one programming language (Python, Scala) are essential.

Does the DP-750 course lead to a Microsoft certification?

DP-750 is a Microsoft Applied Skill, with no formal exam associated. The Azure Data Engineer Associate certification (DP-203) included Databricks in its scope, but that exam was retired on 31 March 2025, so check the current Microsoft certification catalog for an option covering Azure Databricks.

Does the training cover real-time streaming workloads?

Yes. Structured Streaming and Delta Live Tables in continuous mode are covered through CDC (Change Data Capture) use cases and integration with Event Hubs and Kafka.