
Exam DP-203: Data Engineering on Microsoft Azure (beta)

This is a brief summary of my experience taking Exam DP-203: Data Engineering on Microsoft Azure (beta). Note that it’s still in beta, so the information here may change as Microsoft updates the exam.

Overall, I thought it was an okay exam. At times, some of the questions felt too easy: there were questions on basic T-SQL query construction and on Azure fundamentals such as storage types. On the other end of the spectrum, some of the case studies were a little convoluted and less straightforward.

If you’re looking to take DP-203, I’d recommend brushing up on:

Azure Synapse
  • What it is, intended audience
  • How to set up and read external data via PolyBase
  • Cost Management
  • When and how to increase performance
Data Modeling
  • Slowly changing dimensions
    • Type 0, 1, 2
  • Denormalized vs Normalized
    • Difference between the two
    • When to use which
  • Partitioning, partitioning, partitioning. A lot on this
  • Columnstore indexes
  • Hash vs round-robin distribution
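
As a study aid, the slowly changing dimension types listed above can be sketched in a few lines of Python. This is only a minimal illustration, not how you would implement it in Synapse: the `CustomerRow` class, its `valid_from`/`valid_to` columns, and the `apply_type*` helpers are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    # Hypothetical dimension row; column names are assumptions for this sketch.
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version of the row


def apply_type0(rows: List[CustomerRow], customer_id: int, new_city: str) -> List[CustomerRow]:
    """Type 0: the attribute is fixed, so the incoming change is ignored."""
    return rows


def apply_type1(rows: List[CustomerRow], customer_id: int, new_city: str) -> List[CustomerRow]:
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in rows:
        if row.customer_id == customer_id:
            row.city = new_city
    return rows


def apply_type2(rows: List[CustomerRow], customer_id: int, new_city: str,
                change_date: date) -> List[CustomerRow]:
    """Type 2: close the current row and append a new one, keeping history."""
    for row in rows:
        if row.customer_id == customer_id and row.valid_to is None:
            row.valid_to = change_date  # end-date the old version
    rows.append(CustomerRow(customer_id, new_city, change_date))
    return rows
```

The practical difference to remember: after a Type 1 change the table still has one row per customer, while after a Type 2 change it has one row per version of the customer, with the open-ended row being the current one.
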
Scripting Languages
  • SQL
  • R
Stream Analytics
  • Windowing functions... lots on this
    • Tumbling
    • Sliding
    • Hopping
    • Session
    • Snapshot
  • When and how to decrease latency
  • What it can/can’t do
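
To make the difference between tumbling and hopping windows concrete, here is a rough Python sketch that works out which windows a single event falls into. It is not Stream Analytics code: the function names are made up, timestamps are plain integer seconds, and the boundary convention (start inclusive, end exclusive) is a simplification chosen for the sketch, not necessarily how Stream Analytics treats events that land exactly on a boundary.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (start, end) in seconds


def tumbling_window(ts: int, size: int) -> Window:
    """Tumbling: fixed-size, non-overlapping, so each event is in exactly one window."""
    start = (ts // size) * size
    return (start, start + size)


def hopping_windows(ts: int, size: int, hop: int) -> List[Window]:
    """Hopping: fixed-size windows that start every `hop` seconds and may overlap,
    so one event can belong to several windows."""
    windows = []
    start = (ts // hop) * hop  # latest window start that is <= ts
    while start > ts - size:   # keep going back while the window still covers ts
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)


if __name__ == "__main__":
    print(tumbling_window(12, size=10))
    print(hopping_windows(12, size=10, hop=5))
```

With a 10-second window and a 5-second hop, an event at second 12 lands in two hopping windows but only one tumbling window. A hopping window whose hop equals its size behaves like a tumbling window, which is a handy way to remember the relationship.
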
Data Factory
  • Triggers/Schedules
  • Integration with Azure Synapse
  • Optimize data for batch processing
  • Best types of file formats for various activities
  • Types of Data Movements
    • Mapping
    • External
    • Copy
  • Integration Runtimes, when to use each
    • Azure
    • Self-hosted
    • Azure-SSIS
Storage Accounts
  • Cost tiers
    • Hot
    • Cool
    • Archive
  • Redundancy
    • Locally redundant (LRS)
    • Zone-redundant (ZRS)
    • Read-access geo-redundant (RA-GRS)
    • Geo-redundant (GRS)
  • How to read from Data Factory, Synapse, Databricks