Skills measured in DP-203

The following skills are measured in the DP-203 certification exam:

  • Design and implement data storage (15–20%)
  • Develop data processing (40–45%)
  • Secure, monitor, and optimize data storage and data processing (30–35%)

DP-203 Exam Questions

Q1. What is the recommended storage format to use with Spark?
2) XML
3) Apache Parquet

Apache Parquet. Parquet is a columnar storage format that is highly optimized for analytical workloads and is the recommended format to use with Spark.

Q2. You have an Azure Data Lake Storage Gen2 container that contains 200 TB of data. Which type of data redundancy should you use to ensure that the data in the container is available for read workloads in a secondary region if an outage occurs in the primary region? The solution must minimize costs.

·         A. locally-redundant storage (LRS)
·         B. read-access geo-redundant storage (RA-GRS)
·         C. zone-redundant storage (ZRS)
·         D. geo-redundant storage (GRS)

B. read-access geo-redundant storage (RA-GRS)

Q3. You build a data warehouse in a Synapse Analytics dedicated SQL pool. Analysts write a complex SELECT query that contains multiple JOIN and CASE statements to manipulate data for use in inventory reports.
You need to make the dataset available for the reports, which must be published daily. The solution must minimize query times. What should you implement?
·         A. result set caching
·         B. a replicated table
·         C. an ordered clustered columnstore index
·         D. a materialized view

Q4. You are designing a dimension table for a data warehouse. The table must track the value of the dimension attributes over time and preserve the history of the data by adding new rows as the data changes. Which type of slowly changing dimension (SCD) should you use?
·         A. Type 1
·         B. Type 2
·         C. Type 3
·         D. Type 6
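The behaviour this question describes, keeping history by adding a new row whenever an attribute changes, can be sketched in a few lines of Python. This is only an illustration: the column names valid_from, valid_to, and is_current, and the sample city values, are assumptions and do not come from the question.

```python
from datetime import date

def apply_change(rows, key, new_attrs, change_date):
    """Expire the current row for `key` and append a new current row."""
    updated = []
    for row in rows:
        if row["key"] == key and row["is_current"]:
            # Close the old version instead of overwriting it.
            row = {**row, "valid_to": change_date, "is_current": False}
        updated.append(row)
    # The new version becomes the only current row for this key.
    updated.append({"key": key, **new_attrs,
                    "valid_from": change_date, "valid_to": None,
                    "is_current": True})
    return updated

# Hypothetical dimension with one store that later moves to another city.
dim = [{"key": 1, "city": "Oslo", "valid_from": date(2020, 1, 1),
        "valid_to": None, "is_current": True}]
dim = apply_change(dim, 1, {"city": "Bergen"}, date(2023, 6, 1))
```

After the call, the table holds both versions of the row, so a report can still see which city applied on any past date.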

Q5. You need to trigger an Azure Data Factory pipeline when a file arrives in an Azure Data Lake Storage Gen2 container.
Which resource provider should you enable?
·         A. Microsoft.Sql
·         B. Microsoft.EventGrid
·         C. Microsoft.Automation
·         D. Microsoft.EventHub

Q6. You plan to perform batch processing in Azure Databricks once daily. Which type of Databricks cluster should you use?
·         A. Interactive
·         B. High Concurrency
·         C. Automated

Q7. You have an Azure Data Factory that contains 10 pipelines. You need to label each pipeline with its main purpose of either ingesting, transforming, or loading. The labels must be available for grouping and filtering when using the monitoring experience in Data Factory.
What should you add to each pipeline?
·         A. an annotation
·         B. a correlation ID
·         C. a resource tag
·         D. a run group ID

Q8. You are designing a statistical analysis solution that will use custom proprietary Python functions on near real-time data from Azure Event Hubs. What should you recommend?

·         A. Databricks
·         B. Synapse Analytics
·         C. Stream Analytics
·         D. SQL Database

Q9. You create an Azure Databricks cluster and specify an additional library to install.
When you attempt to load the library to a notebook, the library is not found.
What should you review to identify the cause of the issue?

·         A. global init scripts logs
·         B. workspace logs
·         C. notebook logs
·         D. cluster event logs

Q10. You need to examine the pipeline failures from the last 60 days in your Azure Data Factory.
What should you use?
·         A. the Activity log blade for the Data Factory resource
·         B. the Monitor & Manage app in Data Factory
·         C. the Resource health blade for the Data Factory resource
·         D. Azure Monitor

Q11. Which of the following terms refers to the scale of compute that is being used in an Azure Synapse Analytics server?
·         A. DTU
·         B. RTU
·         C. DWU

Q12. You have an Azure Synapse Analytics database that contains a dimension table named Stores, which holds store information. There are 263 stores nationwide. Store information is retrieved in more than half of the queries that are issued against this database. These queries include staff information per store, sales information per store, and finance information. You want to improve the performance of these queries by configuring the table geometry of the Stores table. Which is the appropriate table geometry to select for the Stores table?
·         A. Round Robin
·         B. Replicated table
·         C. Non Clustered

Q13. Which Azure Data Factory integration runtime would be used in a data copy activity to move data from an Azure Data Lake Gen2 store to Azure Synapse Analytics?
·         A. Azure IR
·         B. Azure-SSIS
·         C. Pipelines
·         D. Self-hosted

Q14. Is encrypted communication turned on automatically when you connect to Azure SQL Database or Azure Synapse Analytics?
·         False
·         True

Q15. You are designing an enterprise data warehouse in Azure Synapse Analytics. You plan to load millions of rows of data into the data warehouse each day.
You must ensure that staging tables are optimized for data loading.
You need to design the staging tables.

What type of tables should you recommend?
·         A. Hash-distributed table
·         B. Round-robin distributed table
·         C. External table
·         D. Replicated table
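Two of the options above differ in how rows are assigned to the 60 distributions of a dedicated SQL pool. The rough Python sketch below contrasts the two strategies. It uses CRC32 as a stand-in for the engine's internal hash function and a hypothetical StoreID column, both of which are assumptions made for illustration.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions

def round_robin(rows):
    """Assign rows to distributions in turn, ignoring their content."""
    return [i % NUM_DISTRIBUTIONS for i, _ in enumerate(rows)]

def hash_distributed(rows, key):
    """Assign each row by hashing one column (CRC32 is only a stand-in hash)."""
    return [zlib.crc32(str(row[key]).encode()) % NUM_DISTRIBUTIONS
            for row in rows]

# Hypothetical sample: 120 rows that share only three StoreID values.
rows = [{"StoreID": i % 3, "Amount": i} for i in range(120)]
rr_counts = Counter(round_robin(rows))
hash_counts = Counter(hash_distributed(rows, "StoreID"))

print("round-robin distributions used:", len(rr_counts))
print("hash distributions used:", len(hash_counts))
```

In this sample, round-robin spreads the rows evenly without inspecting them, while hashing sends every row with the same StoreID to the same distribution.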

Q16. You have an Azure Synapse Analytics dedicated SQL pool.
You need to ensure that data in the pool is encrypted at rest. The solution must NOT require modifying applications that query the data.

What should you do?
·         A. Use a customer-managed key to enable double encryption for the Azure Synapse workspace.
·         B. Create an Azure key vault in the Azure subscription and grant access to the pool.
·         C. Enable encryption at rest for the Azure Data Lake Storage Gen2 account.
·         D. Enable Transparent Data Encryption (TDE) for the pool.

Q17. You are designing an enterprise data warehouse in Azure Synapse Analytics that will contain a table named Customers. Customers will contain credit card information.
You need to recommend a solution to provide salespeople with the ability to view all the entries in Customers. The solution must prevent all the salespeople from viewing or inferring the credit card information.
What should you include in the recommendation?
·         A. column-level security
·         B. data masking
·         C. Always Encrypted
·         D. row-level security

Q18. You have an Azure Data Lake Storage Gen2 account named adls2 that is protected by a virtual network.
You are designing a SQL pool in Azure Synapse that will use adls2 as a source.
What should you use to authenticate to adls2?

·         A. a managed identity
·         B. a shared access signature (SAS)
·         C. an Azure Active Directory (Azure AD) user
·         D. a shared key

Q19. Users report slow performance when they run commonly used queries in an enterprise data warehouse in Azure Synapse Analytics. Users do not report performance changes for infrequently used queries.
Which metric should you monitor to determine the source of the performance issues?
·         A. DWU percentage
·         B. DWU limit
·         C. Data IO percentage
·         D. Cache hit percentage

Q20. By default, how many partitions will a new Event Hub have?
·         A. 1
·         B. 2
·         C. 4
·         D. 8

Q21. You are a data engineer for a company. You want to view key health metrics of your Stream Analytics jobs. Which tool in Stream Analytics should you use?
·         A. Dashboards
·         B. Alerts
·         C. Diagnostics
·         D. App

Q22. You are developing a solution that will stream to Azure Stream Analytics. The solution will have both streaming data and reference data. Which input type should you use for the reference data?

·         A. Azure IoT Hub
·         B. Azure Blob storage
·         C. Azure Cosmos DB
·         D. Azure Event Hubs

Q23. Authentication for an Event Hub is defined with a combination of an Event Publisher and which other component?
·         A. Storage Account Key
·         B. Shared Access Signature
·         C. Transport Layer Security v1.2

Q24. What is the maximum number of activities per pipeline in Azure Data Factory?
·         A. 60
·         B. 40
·         C. 160
·         D. 80

Q25. A company manages several on-premises Microsoft SQL Server databases.
Which data technology should you use to migrate the databases to Microsoft Azure by using a backup process of Microsoft SQL Server?
·         A. Azure SQL Data Warehouse
·         B. Azure SQL Database Managed Instance
·         C. Azure Cosmos DB
·         D. Azure SQL Database single database

Q26. You are implementing Azure Stream Analytics functions.
Which windowing function should you select for each requirement?

Box 1 – Tumbling
Box 2 – Hopping
Box 3 – Sliding
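To make the difference between the first two window types concrete, here is a minimal Python sketch of how an event timestamp maps to tumbling and hopping windows. It uses plain integer timestamps and half-open intervals [start, end) for simplicity. Boundary handling in Stream Analytics itself may differ, and sliding windows are not covered.

```python
def tumbling_window(ts, size):
    """Return the single (start, end) tumbling window that contains ts."""
    start = (ts // size) * size
    return (start, start + size)

def hopping_windows(ts, size, hop):
    """Return every (start, end) hopping window that contains ts."""
    windows = []
    start = (ts // hop) * hop  # latest window start at or before ts
    while start > ts - size:   # window still covers ts
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)

# An event at t=12 with 10-unit windows; the hopping windows advance every 5 units.
one_window = tumbling_window(12, 10)
overlapping = hopping_windows(12, 10, 5)
print(one_window, overlapping)
```

A tumbling window assigns each event to exactly one window, while a hopping window whose hop is smaller than its size places the same event in several overlapping windows.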

Q27. You are creating a managed data warehouse solution on Microsoft Azure. You need to configure Azure SQL Data Warehouse to receive the data.
You must use PolyBase to retrieve data from Azure Blob storage that resides in Parquet format and load the data into a large table called FactSalesOrderDetails.
Which four actions should you perform in sequence?

Step 1: Create a master key on the database
Step 2: Create an external data source for Azure Blob storage
Step 3: Create an external file format to map Parquet files
Step 4: Create the external table FactSalesOrderDetails
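As a rough illustration of the four steps, the Python snippet below holds one T-SQL statement per step, in order. The password, storage location, column list, and the object names AzureBlobSource and ParquetFormat are placeholders invented for this sketch. For non-public storage, a database scoped credential would typically also be created between steps 1 and 2.

```python
# Ordered (step, T-SQL) pairs. All names, secrets, and paths are placeholders.
POLYBASE_SETUP = [
    ("master key",
     "CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';"),
    ("external data source",
     "CREATE EXTERNAL DATA SOURCE AzureBlobSource\n"
     "WITH (TYPE = HADOOP,\n"
     "      LOCATION = 'wasbs://<container>@<account>.blob.core.windows.net');"),
    ("external file format",
     "CREATE EXTERNAL FILE FORMAT ParquetFormat\n"
     "WITH (FORMAT_TYPE = PARQUET);"),
    ("external table",
     "CREATE EXTERNAL TABLE dbo.FactSalesOrderDetails (\n"
     "    SalesOrderID INT,  -- hypothetical columns\n"
     "    OrderQty INT\n"
     ")\n"
     "WITH (LOCATION = '/<folder>/',\n"
     "      DATA_SOURCE = AzureBlobSource,\n"
     "      FILE_FORMAT = ParquetFormat);"),
]

# Print the script in execution order.
for number, (step, statement) in enumerate(POLYBASE_SETUP, start=1):
    print(f"-- Step {number}: {step}")
    print(statement)
```

The order matters because each later object refers to an earlier one: the external table names both the data source and the file format.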

Q28. You are processing streaming data from vehicles that pass through a toll booth.
How should you complete the query by using Azure Stream Analytics to return the license plate, vehicle make, and hour the last vehicle passed during each 10-minute window?

Box 1 – MAX
Box 2 – TumblingWindow
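The combination of a maximum over a tumbling window can be sketched in Python as follows. The field names plate, make, and time and the sample vehicles are assumptions made for illustration. The sketch also assumes the window length divides 60 evenly, so that windows align to the hour.

```python
from datetime import datetime

def last_vehicle_per_window(events, window_minutes=10):
    """Map each tumbling-window start to the latest event inside that window."""
    latest = {}
    for ev in events:
        t = ev["time"]
        # Truncate the timestamp to the start of its window.
        start = t.replace(minute=(t.minute // window_minutes) * window_minutes,
                          second=0, microsecond=0)
        # Keep only the event with the maximum time per window.
        if start not in latest or t > latest[start]["time"]:
            latest[start] = ev
    return {start: (ev["plate"], ev["make"], ev["time"].hour)
            for start, ev in latest.items()}

# Hypothetical toll booth readings.
events = [
    {"plate": "ABC123", "make": "Toyota", "time": datetime(2024, 1, 1, 8, 2)},
    {"plate": "XYZ789", "make": "Ford", "time": datetime(2024, 1, 1, 8, 9)},
    {"plate": "JKL456", "make": "Honda", "time": datetime(2024, 1, 1, 8, 14)},
]
result = last_vehicle_per_window(events)
```

Each 10-minute window yields exactly one row: the plate, make, and hour of the vehicle with the latest timestamp in that window.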

Q29. You need to store the data to support hourly incremental load pipelines that will vary for each Store ID. The solution must minimize storage costs. How should you complete the code? To answer, select the appropriate options in the answer area.

Box 1 – partitionBy
Box 2 – ("StoreID", "Year", "Month", "Day", "Hour")
Box 3 – parquet("/Purchases")
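Spark's partitionBy writes each distinct combination of partition values into its own column=value folder under the output path. The small Python sketch below reproduces only that folder naming, so the resulting layout is easy to see. The helper partition_path and the sample record are hypothetical, and no Spark installation is required to run it.

```python
def partition_path(base, record, partition_cols):
    """Build the column=value folder path for one record."""
    parts = [f"{col}={record[col]}" for col in partition_cols]
    return "/".join([base.rstrip("/")] + parts)

# Hypothetical purchase record.
record = {"StoreID": 42, "Year": 2024, "Month": 1, "Day": 5, "Hour": 13,
          "Amount": 9.99}
path = partition_path("/Purchases", record,
                      ["StoreID", "Year", "Month", "Day", "Hour"])
print(path)
```

With this layout, an hourly pipeline for one store only has to read the single folder for that store and hour instead of scanning the whole dataset.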

Q30. You need to calculate the difference in readings per sensor per hour. The solution is built on an Azure Stream Analytics query that will receive input data from Azure IoT Hub and write the results to Azure Blob storage. How should you complete the query?

Box 1 – LAG
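LAG returns the previous event's value, and partitioning by sensor makes each reading compare against the previous reading from the same sensor. A minimal Python sketch of that idea follows. The tuple layout and sample values are assumptions, and the sketch ignores the time limit that a Stream Analytics query would place on how far back LAG may look.

```python
def reading_deltas(readings):
    """Return (sensor_id, time, delta) with delta relative to the sensor's previous reading."""
    previous = {}  # last value seen per sensor
    out = []
    for sensor_id, t, value in readings:
        prev = previous.get(sensor_id)
        # The first reading of a sensor has no predecessor, so its delta is None.
        out.append((sensor_id, t, None if prev is None else value - prev))
        previous[sensor_id] = value
    return out

# Hypothetical hourly readings from two sensors: (sensor, hour, value).
readings = [("s1", 0, 10.0), ("s2", 0, 5.0), ("s1", 1, 12.5), ("s2", 1, 4.0)]
deltas = reading_deltas(readings)
```

Here sensor s1 rises by 2.5 and sensor s2 falls by 1.0 between the two hours, while the first reading from each sensor has no difference to report.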

