Azure, Certifications, Professional

DP-420 Cosmos DB Specialty Study Guide

Introduction

This post covers the material I used to pass the DP-420 exam, Designing and Implementing Cloud-Native Applications Using Microsoft Azure Cosmos DB, to earn my Cosmos DB Specialty certification. This is not an exam dump, though I will call out specific areas that appeared on the exam and are covered across various MS Learn modules.

I like to level set on skills whenever I put together one of these study guides, as my professional background provides the lens through which I experienced this certification. My background was initially full stack development, then ETL, and most recently a combination of DevOps and cloud engineering.

Exam Resources

Unfortunately, at the time of this writing, Microsoft does not have an “official” practice assessment available for DP-420. Usually I will run through one of those once or twice before taking the exam, so I was forced to look at a few other resources.

Azure Cosmos DB Essentials Series was a very helpful YouTube series produced by program managers for Cosmos DB. At the time of this writing it consisted of 12 short (15 minutes or less) videos. It takes you through the basics of terminology and, most importantly, use cases. Season 3 of the series does a great job of breaking down how you would configure Cosmos DB within your application’s architecture using real-world examples at scale. This is extremely important, as the exam asks questions based on real-world scenarios.

Azure Cosmos DB Labs is a Git repo maintained by the product team that contains various material, including not only labs but also PowerPoints you can walk through. Though some of the individual material may be dated, it was still helpful for learning the concepts and seeing what is possible.

MS Learn: I did leverage the MS Learn modules; however, I would strongly encourage you not to rely on these trainings alone. Typically I find their question structure and content does not always relate to the exam; they serve more as a 101 than as enough preparation to pass an exam above the associate level.


Exam Outline

The official outline from Microsoft Learn:

  • Design and implement data models (35–40%)
  • Design and implement data distribution (5–10%)
  • Integrate an Azure Cosmos DB solution (5–10%)
  • Optimize an Azure Cosmos DB solution (15–20%)
  • Maintain an Azure Cosmos DB solution (25–30%)

My experience with the exam was pretty consistent with this outline. The only exception was that maintain felt slightly lower and data distribution perhaps slightly higher, but with no more than a 5-10% delta from what is listed. I’ll break down my experience with each section and some of the topics that are worth diving into further.

Design and Implement Data Models

This section was weighted very heavily. The expectation is that you know when and how to denormalize data. Additionally, you will need to know how to denormalize and structure data given a set of requirements around how the data is queried and what matters most to the application (read or write optimization).

If you know the basics this section isn’t too tricky. It’s important to remember when you would embed parent or child data, and what that looks like, as opposed to when you would rather maintain references to data in another container; the sketch below contrasts the two patterns.
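As a rough illustration of the two patterns, consider a hypothetical customer/order model (the shapes and property names here are mine, not from the exam):

```python
# Embedding: child data that is read together with the parent lives in one
# document, so a single point read returns everything (read-optimized).
# Best for bounded, rarely-changing child data.
customer_embedded = {
    "id": "customer-1",
    "customerId": "customer-1",  # partition key property
    "name": "Ada",
    "addresses": [
        {"type": "home", "city": "Seattle"},
    ],
}

# Referencing: unbounded or independently-updated child data goes into its
# own documents (often in another container), keyed back to the parent.
order_referenced = {
    "id": "order-42",
    "customerId": "customer-1",  # reference back to the parent
    "total": 19.99,
}
```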

Partitioning is very important in Cosmos DB, and as such it is prevalent on this exam. Be sure to brush up on what goes into selecting a partition key and when an aggregate (synthetic) partition key makes sense given a scenario and workload.
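To make that concrete, here is a minimal sketch using the Python SDK (azure-cosmos); the endpoint, key, and property names are placeholders, and the synthetic key format is just one illustrative choice:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("retail")

# A high-cardinality, evenly distributed property makes a good partition key.
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/partitionKey"),
)

# A synthetic (aggregate) key combines properties when no single property
# distributes reads and writes evenly enough on its own.
order = {"id": "order-42", "customerId": "customer-1", "orderDate": "2023-05"}
order["partitionKey"] = f"{order['customerId']}-{order['orderDate']}"
container.upsert_item(order)
```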

I believe the Cosmos DB gateway falls into this section as well, so be sure to understand the impact that gateway mode has on your RUs and application performance, and how an application connects through the gateway over HTTPS as opposed to communicating directly with the Cosmos DB replicas over TCP (direct mode).

Be sure to understand how to configure Time to Live (TTL) in Cosmos DB, including scenarios where TTL is configured on the container and different settings on incoming documents produce different results, as in the sketch below.
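A minimal sketch of those TTL interactions with the Python SDK; all names are placeholders:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("retail")

# A positive default_ttl is the container-wide expiry in seconds;
# default_ttl=-1 would enable TTL but expire nothing by default.
container = database.create_container_if_not_exists(
    id="sessions",
    partition_key=PartitionKey(path="/userId"),
    default_ttl=3600,  # items expire after an hour unless they override it
)

# Per-item override: this document expires after five minutes.
container.upsert_item({"id": "s1", "userId": "u1", "ttl": 300})

# ttl=-1 on an item means "never expire", despite the container default.
container.upsert_item({"id": "s2", "userId": "u1", "ttl": -1})
```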

Design and Implement Data Distribution

This section focused heavily on multi-region write scenarios and how data is distributed and presented by the application, as well as what changes you may have to account for on the client when given a multi-write scenario or varying consistency levels.
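As a sketch of the client-side knobs involved, assuming the keyword arguments exposed by the azure-cosmos 4.x Python SDK (endpoint, key, and regions are placeholders):

```python
from azure.cosmos import CosmosClient

client = CosmosClient(
    "https://<account>.documents.azure.com:443/",
    credential="<key>",
    consistency_level="Session",                   # weaker levels cost fewer RUs
    preferred_locations=["West US 2", "East US"],  # read-region preference order
    multiple_write_locations=True,                 # route writes to the nearest region
)
```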

Technically, manually executing a failover fell into this section, though I’d argue it belongs more in maintenance. Make sure you are familiar with how to manually fail over your Cosmos DB account via tools such as the Azure CLI or PowerShell, and understand the impact a failover may have on your application.
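For illustration, a failover can also be triggered programmatically; this rough sketch assumes the azure-mgmt-cosmosdb management SDK and hypothetical resource names (the Azure CLI equivalent is az cosmosdb failover-priority-change):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cosmosdb import CosmosDBManagementClient

mgmt = CosmosDBManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Promoting a different region to failover priority 0 triggers a manual
# failover; the remaining regions keep their relative ordering.
poller = mgmt.database_accounts.begin_failover_priority_change(
    resource_group_name="my-rg",           # hypothetical resource group
    account_name="my-cosmos-account",      # hypothetical account
    failover_parameters={
        "failoverPolicies": [
            {"locationName": "East US", "failoverPriority": 0},
            {"locationName": "West US 2", "failoverPriority": 1},
        ]
    },
)
poller.result()  # block until the failover completes
```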

Conflict resolution was a topic I was completely unprepared for. Be sure to understand the types of conflicts that can occur in your data and how you would address and configure resolution for them given various scenarios.
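For example, a last-writer-wins policy can be set when a container is created. A minimal sketch with the Python SDK, assuming the conflict_resolution_policy argument and placeholder names:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("retail")

# Last-writer-wins resolves insert/update conflicts by comparing a numeric
# path: the _ts timestamp by default, or a custom property such as /version.
container = database.create_container_if_not_exists(
    id="profiles",
    partition_key=PartitionKey(path="/userId"),
    conflict_resolution_policy={
        "mode": "LastWriterWins",
        "conflictResolutionPath": "/_ts",
    },
)
```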

Integrate an Azure Cosmos DB Solution

This section was a bit surprising, to be honest. There was content around Synapse Link and how to configure Cosmos DB to interact with it via Spark. On this topic, it is also good to know how Data Factory can interact with Cosmos DB.
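As an illustration of the Spark side, this is roughly what reading the analytical store through Synapse Link looks like in a Synapse PySpark notebook; the linked service and container names are placeholders, and spark is the notebook’s ambient session:

```python
# Read the Cosmos DB analytical store via Synapse Link; this never consumes
# RUs from the transactional store.
df = (
    spark.read.format("cosmos.olap")
    .option("spark.synapse.linkedService", "MyCosmosLinkedService")
    .option("spark.cosmos.container", "orders")
    .load()
)

# Analytical queries run against the column store copy of the data.
df.groupBy("customerId").count().show()
```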

Change feed also falls under this section, so have a good understanding of configuring, managing, and defining what the change feed is. This is already important when evaluating the features available in Cosmos DB; however, be sure you are familiar with some of the finer options that are available and configurable.
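A minimal sketch of the pull model with the Python SDK, assuming placeholder names:

```python
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("retail").get_container_client("orders")

# Read the change feed from the beginning of the container; each result is
# the latest version of an item that was created or updated.
for item in container.query_items_change_feed(is_start_from_beginning=True):
    print(item["id"])
```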

Optimize an Azure Cosmos DB Solution

Indexes, indexes, indexes. Know them inside and out: the different types of indexes, how to configure them, when to use them, and the impact they have on your queries. This is another section I was unprepared for and wish I had understood better.
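As a sketch of what an indexing policy looks like with the Python SDK (paths and names are hypothetical):

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("retail")

# Include only the paths you query on and exclude everything else: writes
# get cheaper (fewer index updates) while the indexed queries stay fast.
# The composite index supports ORDER BY across two properties.
indexing_policy = {
    "indexingMode": "consistent",
    "includedPaths": [{"path": "/customerId/?"}, {"path": "/orderDate/?"}],
    "excludedPaths": [{"path": "/*"}],
    "compositeIndexes": [
        [
            {"path": "/customerId", "order": "ascending"},
            {"path": "/orderDate", "order": "descending"},
        ]
    ],
}

container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
    indexing_policy=indexing_policy,
)
```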

Maintain an Azure Cosmos DB Solution

Understand how to identify throttling and what actions can be performed automatically when throttling is detected. On that note, be sure to understand provisioning RUs manually vs. autoscale, as well as when serverless is the best fit given various scenarios.
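A minimal sketch of both ideas with the Python SDK: autoscale throughput at container creation, and detecting a 429 once the SDK’s own retries are exhausted (names are placeholders):

```python
from azure.cosmos import CosmosClient, PartitionKey, ThroughputProperties
from azure.cosmos.exceptions import CosmosHttpResponseError

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.get_database_client("retail")

# Autoscale scales between 10% of the max and the max (400-4000 RU/s here);
# passing a plain int instead would provision fixed manual throughput.
container = database.create_container_if_not_exists(
    id="orders",
    partition_key=PartitionKey(path="/customerId"),
    offer_throughput=ThroughputProperties(auto_scale_max_throughput=4000),
)

# The SDK retries throttled requests itself, but once retries are exhausted
# the 429 surfaces as an exception you can detect and handle.
try:
    container.create_item({"id": "order-42", "customerId": "customer-1"})
except CosmosHttpResponseError as e:
    if e.status_code == 429:
        print("Throttled: request rate exceeded provisioned RU/s")
```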

Along the lines of identifying throttling, be able to identify when a record has been added, deleted, not found, etc., based on metrics. Speaking of metrics, be sure you can read the metrics Cosmos DB reports back for various queries; they provide information on indexing, cross-partition queries, and RUs consumed.
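A rough sketch of pulling those metrics back with the Python SDK, assuming the populate_query_metrics option and placeholder names:

```python
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("retail").get_container_client("orders")

# populate_query_metrics asks the service to return per-query statistics;
# the RU charge and metrics come back on the response headers.
items = list(container.query_items(
    query="SELECT * FROM c WHERE c.customerId = @id",
    parameters=[{"name": "@id", "value": "customer-1"}],
    partition_key="customer-1",  # scoped to one partition: no fan-out
    populate_query_metrics=True,
))

headers = container.client_connection.last_response_headers
print("RU charge:", headers.get("x-ms-request-charge"))
print("Metrics:", headers.get("x-ms-documentdb-query-metrics"))
```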

Understand periodic vs. continuous backup. Be sure you know how to read the configuration for each mode, as well as how backup and restore work for each.

There was some security content covering RBAC roles and the data plane vs. the management plane, as well as what access Cosmos DB might need to other resources, plus some content on encryption of data.
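As a small sketch of data-plane RBAC, the Python SDK accepts an Azure AD credential in place of an account key; the role assignment itself is configured separately on the account:

```python
from azure.identity import DefaultAzureCredential
from azure.cosmos import CosmosClient

# Data-plane RBAC: authenticate with Azure AD instead of account keys. The
# identity must hold a Cosmos DB data-plane role assignment (for example,
# the built-in "Cosmos DB Built-in Data Contributor") scoped to the account.
client = CosmosClient(
    "https://<account>.documents.azure.com:443/",
    credential=DefaultAzureCredential(),
)
```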

With regard to the change feed, be able to troubleshoot scenarios where the change feed is not capturing all events (for example, it does not expose deletes by default), and know what features and options are available to you.

Conclusion

I successfully passed the certification, not with the highest score, but with enough to validate my learning. Overall I’d say I studied around 8 hours. In addition to the resources above, I found that reading the Microsoft Docs on specific topics, like consistency, data modeling, and partitioning, really helped. As an added bonus, I was able to contribute back to Microsoft Docs!

Hopefully this overview is helpful in any attempt at DP-420 Designing and Implementing Cloud-Native Applications Using Microsoft Azure Cosmos DB. If this study guide was helpful, feel free to check out my other guides: AZ-700 Azure Network Engineer Study Guide, HashiCorp Terraform Associate, DP-203 Data Engineering, and AZ-204 Developing Solutions for Microsoft Azure.