Azure Synapse + Python + Azure Databricks Training

Duration: 35 Days

Mode of Training: Online

Level: Advanced

What Is Azure Databricks?

Azure Databricks provides the latest versions of Apache Spark and allows you to seamlessly integrate with open source libraries. Spin up clusters and build quickly in a fully managed Apache Spark environment with the global scale and availability of Azure.

What Is Azure Synapse?

Azure Synapse is an enterprise analytics service that accelerates time to insight across data warehouses and big data systems. It brings together the best of SQL technologies used in enterprise data warehousing, Apache Spark technologies for big data, and Azure Data Explorer for log and time series analytics.

What you'll learn

Discover the key concepts covered in this course
Describe the features and concepts of Azure Databricks clusters
Describe the features of Azure Databricks
Describe the features and concepts of Azure Databricks jobs
Describe the concept of autoscaling local storage when configuring clusters

Why Visualpath?

Azure Synapse + Python + Azure Databricks Course Curriculum

Introduction to Big Data Concepts

Big Data introduction
OLTP vs OLAP
SQL vs NoSQL
Data Warehouses vs Data Lakes
Batch vs Streaming processing

Apache Spark Programming Essentials – Python Basics

Python fundamentals (syntax, variables, data types)
Control flow (if, loops)
Functions and modules
Collections (list, tuple, set, dictionary)
File handling basics
Python vs Pandas vs Spark overview
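
As a rough illustration of the Python basics listed above, the sketch below combines a function, a loop, an if/else branch, and the four core collections. The file names, sizes, and the `classify_sizes` helper are hypothetical and exist only for this example.

```python
def classify_sizes(file_sizes_mb, threshold_mb=100):
    """Split file names into 'small' and 'large' groups by size in MB."""
    result = {"small": [], "large": []}          # dictionary holding two lists
    for name, size in file_sizes_mb.items():     # loop over key/value pairs
        if size >= threshold_mb:                 # control flow
            result["large"].append(name)
        else:
            result["small"].append(name)
    return result


# Hypothetical sample data: file name -> size in MB
sizes = {"orders.csv": 250, "users.json": 12, "events.parquet": 980}

groups = classify_sizes(sizes)
formats = {name.rsplit(".", 1)[1] for name in sizes}    # set of file extensions
largest = max(sizes.items(), key=lambda kv: kv[1])      # (name, size) tuple

print(groups)
print(sorted(formats))
print(largest)
```

The same grouping logic could later be expressed with Pandas or Spark DataFrames; plain Python is used here so the example runs without extra libraries.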

Spark SQL & DataFrame Analytics – SQL Basics

Relational database concepts
SQL data types
SELECT, WHERE, ORDER BY
GROUP BY, HAVING
JOIN types (Inner, Left, Right, Full)
Subqueries & CTEs
Basic indexing concepts
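
The SQL topics above (JOIN, GROUP BY, HAVING, CTEs) can be tried out locally before moving to Spark SQL or Synapse. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine; the `customers` and `orders` tables and their rows are made up for this example, and SQL dialect details may differ on other engines.

```python
import sqlite3

# In-memory database with two small, hypothetical tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 40.0);
""")

# CTE + LEFT JOIN + GROUP BY + HAVING + ORDER BY in one query
query = """
WITH totals AS (
    SELECT c.name AS name, SUM(o.amount) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    HAVING SUM(o.amount) >= 30
)
SELECT name, total FROM totals ORDER BY total DESC;
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Chen has no orders, so the LEFT JOIN yields a NULL total for that customer, and the HAVING condition filters the row out because a comparison with NULL is not true.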

Spark SQL & DataFrame Analytics (Advanced)

DataFrame operations (select, filter, withColumn)
Aggregations & groupBy
Joins in Spark
Window functions
UDFs & performance considerations
Temporary & Global views
Data exploration using Spark SQL

Azure Data Lake & Cloud Storage Foundations

Azure Storage overview
Azure Data Lake Storage Gen1 vs Gen2
Blob Storage vs ADLS
Hierarchical namespace
Access control (ACLs & RBAC)
Storage account configuration
Data organization (Bronze / Silver / Gold)
Accessing ADLS using Databricks & ADF

Data Integration with Azure Data Factory

Understand Azure Data Factory
Describe data integration patterns
Explain the data factory process
Understand Azure Data Factory components
Azure Data Factory security
Set up Azure Data Factory
Create Linked Services
Create Datasets
Create Data Factory activities and pipelines
Manage Integration Runtimes
Data integration with Azure Data Factory
Code-free transformation at scale with Azure Data Factory

Data Transformation with Azure Data Factory

Transform data using Azure Data Factory
Execute code-free transformations at scale
Create pipelines to import poorly formatted CSV files
Create Mapping Data Flows
Data cleansing and standardization
Join, aggregate, derive and conditional transformations
Debugging & monitoring data flows

Azure Synapse & SQL Data Warehousing

Azure Synapse workspace overview
Synapse architecture
Serverless SQL Pool vs Dedicated SQL Pool
Data warehousing concepts
Star & Snowflake schema design
COPY INTO & PolyBase
Distribution, partitioning & performance tuning

Introduction to Azure Databricks & Lakehouse

Databricks workspace architecture
Lakehouse architecture concepts
Databricks vs ADF vs Synapse (use cases)
Databricks components overview
Cost and performance considerations

Databricks Workspace, Clusters & Notebooks

Databricks workspace UI deep dive
Cluster architecture
Cluster types & autoscaling
Notebook types (Python, SQL, Scala)
Job scheduling
Databricks REST API & CLI
Git integration using Databricks Repos

Data Ingestion Techniques for the Lakehouse

Ingesting CSV, JSON, XML, Parquet
Mounting ADLS & Blob storage
Auto Loader (cloudFiles)
Schema inference & evolution
Streaming ingestion basics
Optimizing ingestion for high-volume data

Data Management, Governance & Unity Catalog

DBFS vs External tables
Metastore concepts
Unity Catalog architecture
Catalogs, schemas, tables
Data access permissions
Lineage & auditing
Securing enterprise data access

Advanced Data Processing with Spark

Complex transformations
Handling nulls and corrupt records
Schema evolution strategies
Nested & semi-structured data
Exploding arrays and structs
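
To make "corrupt records" and "exploding arrays" concrete, here is a plain-Python analogy that uses only the standard library. It is not the Spark API: the `parse_and_explode` helper and the sample records are hypothetical, and in Spark the equivalent work is done with DataFrame readers and functions such as `explode`.

```python
import json

# Hypothetical raw input: one JSON document per line
raw_lines = [
    '{"order_id": 1, "items": ["pen", "ink"]}',
    '{"order_id": 2, "items": null}',
    '{"order_id": 3, "items": ["pad"',            # truncated, not valid JSON
]


def parse_and_explode(lines):
    """Return (rows, corrupt): one row per array element, plus unparseable lines."""
    rows, corrupt = [], []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            corrupt.append(line)                  # keep bad input for inspection
            continue
        # Treat a missing or null array as empty, so it produces no rows
        for item in record.get("items") or []:
            rows.append((record["order_id"], item))
    return rows, corrupt


rows, corrupt = parse_and_explode(raw_lines)
print(rows)
print(corrupt)
```

Order 2 disappears from the output because its array is null. Whether such rows should be dropped or kept with a null element is a design decision worth making explicitly in a real pipeline.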

Databricks Utilities, Widgets & Automation

dbutils (file system, secrets, jobs)
Secret scopes & Key Vault integration
Notebook widgets
Parameterized notebooks
Job orchestration
Operational best practices

Delta Lake Architecture & Operations

Delta Lake fundamentals
ACID transactions
Delta logs & versioning
Schema enforcement & evolution
Time travel & rollback
OPTIMIZE & ZORDER
VACUUM & retention management

LakeFlow & Modern Data Orchestration

LakeFlow overview
Delta Live Tables (DLT)
Auto Loader integration
Pipeline orchestration
Monitoring & data quality expectations
Event-driven architectures

Real-Time Streaming with Structured Streaming

Streaming fundamentals
Structured Streaming architecture
Event Hubs & Kafka integration
Stateful vs stateless processing
Watermarking & late data handling
Streaming Delta tables
Fault tolerance & checkpointing
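
The idea behind watermarking and late data handling can be sketched without a streaming engine. The simplified model below assumes integer event times in seconds, tumbling windows, and a watermark that trails the largest event time seen so far by a fixed allowed lateness. The `windowed_counts` function and its parameters are hypothetical; Structured Streaming advances its watermark per micro-batch rather than per event, so this is a teaching model and not a reproduction of its behaviour.

```python
from collections import defaultdict


def windowed_counts(events, window_s=10, lateness_s=5):
    """Count events per tumbling window; drop events whose window has
    already closed relative to the watermark."""
    counts = defaultdict(int)
    dropped = []
    max_event_time = None
    for event_time in events:                     # events in arrival order
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - lateness_s   # how far back we still accept
        window_start = (event_time // window_s) * window_s
        window_end = window_start + window_s
        if window_end <= watermark:               # window already finalized
            dropped.append(event_time)
            continue
        counts[window_start] += 1
    return dict(counts), dropped


# Arrival order differs from event-time order: 8 and 9 arrive late
counts, dropped = windowed_counts([1, 4, 12, 8, 27, 9, 21])
print(counts)
print(dropped)
```

In this run the event at time 8 arrives late but is still counted, because the watermark is at 7 and its window (0 to 10) is still open. The event at time 9 arrives after the watermark has moved to 22, so it is dropped.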

Power BI Integration

Connecting Power BI to Databricks SQL Warehouse
Import vs DirectQuery
Performance optimization for BI
Using Delta tables for analytics
Dataset refresh strategies
Publishing dashboards

Terraform for Databricks Automation

Infrastructure as Code fundamentals
Terraform basics
Azure & Databricks providers
Automating clusters, jobs, notebooks
State management
CI/CD best practices

Databricks Performance Optimization

Understanding Spark UI
Identifying performance bottlenecks
Avoiding data skew
Shuffle optimization
Caching & broadcast joins
Z-Ordering & file compaction
Cluster sizing & DBR selection
Cost optimization best practices