Databricks Fundamentals

Duration: 12 hours
Course type: Online
Language: English
Code: EAS-028
Price: € 750 *

Description

Databricks is an increasingly popular platform for big data processing and analysis, and our Databricks Fundamentals course is a good starting point if you want to build skills in this area. Over several modules you will gain practical experience with core Databricks tools and concepts, including writing queries in Scala, Python, and SQL, using Delta Lake and Parquet, and working with Notebooks.

One of the primary goals of the course is to make you more comfortable with Databricks Notebooks, the web-based interface for data analysis and collaboration. With guidance from our trainer, you’ll learn how to build, manage, and share notebooks efficiently so that you can tackle complex data challenges.

Another important topic is Apache Spark, the open-source engine that powers the data processing capabilities of Databricks. You will gain a deep understanding of Spark’s internal architecture, starting with the RDD (Resilient Distributed Dataset), which according to databricks.com “is an immutable distributed collection of elements of your data, partitioned across nodes in your cluster, that can be operated in parallel with a low-level API that offers transformations and actions.”
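To make the RDD definition above more concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the class name `ToyRDD` and its methods are hypothetical, and the partitions are processed one after another in a single process rather than in parallel across cluster nodes. It only illustrates the three ideas in the quote: the data is immutable, it is split into partitions, and transformations (`map`, `filter`) are merely recorded until an action (`collect`, `count`) triggers the work.

```python
class ToyRDD:
    """Toy stand-in for an RDD: immutable, partitioned, lazily transformed."""

    def __init__(self, partitions, ops=()):
        # Tuples keep both the data and the recorded operations immutable.
        self._partitions = tuple(tuple(p) for p in partitions)
        self._ops = tuple(ops)

    def map(self, fn):
        # Transformation: records the step and returns a NEW ToyRDD.
        return ToyRDD(self._partitions, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: nothing is computed here either.
        return ToyRDD(self._partitions, self._ops + (("filter", pred),))

    def _compute(self, partition):
        # Apply every recorded step to a single partition.
        items = list(partition)
        for kind, fn in self._ops:
            if kind == "map":
                items = [fn(x) for x in items]
            else:
                items = [x for x in items if fn(x)]
        return items

    def collect(self):
        # Action: triggers the computation, partition by partition.
        result = []
        for partition in self._partitions:
            result.extend(self._compute(partition))
        return result

    def count(self):
        # Action: number of elements after all recorded steps.
        return len(self.collect())


numbers = ToyRDD([[1, 2, 3], [4, 5, 6]])  # two partitions
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

print(even_squares.collect())  # [4, 16, 36]
print(numbers.collect())       # [1, 2, 3, 4, 5, 6] (the original is unchanged)
```

Because each transformation returns a new object and leaves the original untouched, the same source collection can feed several independent pipelines.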

To help you make sound project decisions and avoid architectural mistakes, the course covers the differences between Delta Lake and Parquet, two storage formats used on Databricks. Parquet is a columnar file format, while Delta Lake stores its data as Parquet files and adds a transaction log on top. Understanding the particularities of each will help you select the right one for your project. We will also cover a key topic for any big data environment: query writing. You’ll learn to write queries in Scala, Python, and SQL, giving you the flexibility to work with different languages and tools as needed.
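One way to picture the difference between plain Parquet files and a Delta Lake table is the sketch below, written in plain Python. It is a toy model under stated assumptions, not the Delta Lake API: the class `ToyDeltaTable`, its methods, and the file names are hypothetical. Real Delta Lake keeps its log as commit files in a `_delta_log` directory next to the Parquet data; here the log is just a Python list. The idea shown is that every commit records which data files were added or removed, so replaying the log up to a given version tells a reader which files make up the table at that point.

```python
class ToyDeltaTable:
    """Toy model: data files plus an append-only log of add/remove actions."""

    def __init__(self):
        self._log = []  # one entry per commit, index = table version

    def commit(self, add=(), remove=()):
        # Append a new commit; earlier entries are never modified.
        self._log.append({"add": list(add), "remove": list(remove)})
        return len(self._log) - 1  # version number of this commit

    def files_at(self, version=None):
        # Replay the log up to `version` to find the live data files.
        if version is None:
            version = len(self._log) - 1
        live = []
        for entry in self._log[: version + 1]:
            live = [f for f in live if f not in entry["remove"]]
            live.extend(entry["add"])
        return live


table = ToyDeltaTable()
v0 = table.commit(add=["part-000.parquet"])
v1 = table.commit(add=["part-001.parquet"])
# An update rewrites a file: the old one is removed, a new one is added.
v2 = table.commit(add=["part-002.parquet"], remove=["part-000.parquet"])

print(table.files_at())    # ['part-001.parquet', 'part-002.parquet']
print(table.files_at(v0))  # ['part-000.parquet']
```

A bare directory of Parquet files has no such log, which is why features like history and time travel belong to the Delta Lake part of the roadmap.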

You will also learn how to optimize your Databricks workflows for performance and how to use visualization tools to gain insights that support better project decisions.

Overall, the Databricks Fundamentals course is a detailed, practical introduction to Databricks. With guidance from our trainer, an experienced Data Engineer, you’ll develop the skills and confidence to handle complex data tasks.

Certificate
A certificate on the Luxoft Training form is issued upon completion of the course.

Objectives

  • Practice working with Notebooks
  • Understand Spark internal structures
  • Understand the differences between Delta Lake and Parquet
  • Write queries in Scala, Python, and SQL
  • Learn about optimization in Databricks
  • Explore data in depth with Databricks

Target Audience

  • Developers
  • Architects

Prerequisites

3 months of development experience in Scala, Java, Python, and SQL.

Roadmap

1. Introduction to Databricks

  • Creating Databricks Service
  • Databricks UI Overview
  • Databricks Architecture Overview
  • Databricks Notebooks

2. Databricks Cluster and Jobs

  • Cluster types and configuration
  • Databricks cluster pool
  • Databricks Job
  • Notebook workflows

3. DBFS

4. Databricks and Spark

  • Data Formats
  • Transformation
  • Joins, Aggregation
  • SQL

5. Delta Lake

  • Pitfalls of Data Lakes
  • Data Lakehouse Architecture
  • Read & Write to Delta Lake
  • Updates and Deletes on Delta Lake
  • Merge/Upsert to Delta Lake
  • History, Time Travel, Vacuum
  • Delta Lake Transaction Log
  • Convert from Parquet to Delta
  • Data Ingestion
  • Data Transformation - PySpark and Notebooks

6. Visualizations in Databricks

7. Collaboration in Databricks

8. Deploying Databricks on Azure

9. Deploying Databricks on the AWS Marketplace

10. Data Protection Use Cases

Schedule and prices
Code: EAS-028
Location: Online
Duration: 12 hours
Language: English
Time: 10:00-14:00
Trainer: Oleksandr Holota
Price: € 750 *