Pass4Future also provides interactive practice exam software for preparing for the Databricks Certified Data Engineer Associate Exam. You are welcome to explore the free sample exam questions below and to try the practice test software.
A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs.
Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?
Answer : D
Cron syntax can be used to represent and submit a complex run schedule programmatically. Databricks uses Quartz cron syntax: a string of six fields (seconds, minutes, hours, day of month, month, day of week), with an optional seventh field for the year. For example, the cron expression 0 0 12 * * ? means run the job at 12:00 PM every day. The data engineer can use the Databricks REST API to create or update a job with a cron schedule, or use the Databricks CLI with a JSON file that contains the cron expression. The other options are either invalid or unsuitable for representing and submitting a complex run schedule programmatically. Reference: Schedule a job, Jobs API, Databricks CLI, Cron expressions
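As a rough illustration of transferring a schedule between Jobs, the sketch below copies the schedule block from one job-settings dictionary to another. The field names `schedule`, `quartz_cron_expression`, `timezone_id`, and `pause_status` follow the Jobs API; the job names and both helper functions are hypothetical. No request is sent: the resulting dictionary is what would be submitted through the REST API or a CLI JSON file.

```python
import copy


def is_plausible_quartz_cron(expression: str) -> bool:
    """Shallow check only: Quartz cron has 6 fields plus an optional year."""
    return len(expression.split()) in (6, 7)


def copy_schedule(source_settings: dict, target_settings: dict) -> dict:
    """Return a copy of target_settings that carries the source schedule."""
    schedule = source_settings["schedule"]
    if not is_plausible_quartz_cron(schedule["quartz_cron_expression"]):
        raise ValueError("unexpected number of cron fields")
    updated = copy.deepcopy(target_settings)
    updated["schedule"] = copy.deepcopy(schedule)
    return updated


# Hypothetical job settings, shaped like a Jobs API payload.
source_job = {
    "name": "nightly-ingest",
    "schedule": {
        "quartz_cron_expression": "0 0 12 * * ?",  # 12:00 PM every day
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    },
}
target_job = {"name": "nightly-report"}

new_settings = copy_schedule(source_job, target_job)
print(new_settings["schedule"]["quartz_cron_expression"])
```

Because the schedule is a single string plus a time zone, it can be stored, reviewed, and reapplied to any number of Jobs without touching the scheduling form.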
A dataset has been defined using Delta Live Tables and includes an expectations clause:
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE
What is the expected behavior when a batch of data containing data that violates these constraints is processed?
Answer : B
When a batch containing data that violates the expectation is processed, the job will fail. This is because the expectation clause has the ON VIOLATION FAIL UPDATE option, which means that if any record in the batch does not meet the expectation, the entire batch will be rejected and the job will fail. This option is useful for enforcing strict data quality rules and preventing invalid data from entering the target dataset.
Option A is not correct, as the ON VIOLATION FAIL UPDATE option does not drop the records that violate the expectation, but fails the entire batch. To drop the records that violate the expectation and record them as invalid in the event log, the ON VIOLATION DROP ROW option should be used.
Option C is not correct, as the ON VIOLATION FAIL UPDATE option does not drop the records that violate the expectation, but fails the entire batch. Delta Live Tables has no built-in quarantine action; loading invalid records into a quarantine table is typically done with a separate dataset that selects the rows failing the rule.
Option D is not correct, as the ON VIOLATION FAIL UPDATE option does not add the records that violate the expectation, but fails the entire batch. To add the records that violate the expectation and record them as invalid in the event log, the expectation should be declared without an ON VIOLATION clause, which is the default behavior.
Option E is not correct, as the ON VIOLATION FAIL UPDATE option does not add the records that violate the expectation, but fails the entire batch. Delta Live Tables has no built-in action that flags invalid records in a field added to the target dataset; such a flag would have to be computed as an ordinary column in the query.
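The difference between the three built-in behaviors can be sketched in plain Python. This is a simulation for illustration only, not the Delta Live Tables API: the function `apply_expectation`, the exception class, and the sample batch are all hypothetical, and only the action names DROP ROW and FAIL UPDATE come from the expectation syntax.

```python
from datetime import datetime


class ExpectationFailed(Exception):
    """Raised to mimic an update that fails on a violated expectation."""


def apply_expectation(rows, predicate, on_violation=None):
    """Return (kept_rows, violation_count) for one batch.

    on_violation=None mimics the default: keep every row, count violations.
    "DROP ROW" removes violating rows; "FAIL UPDATE" rejects the whole batch.
    """
    violations = [row for row in rows if not predicate(row)]
    if on_violation == "FAIL UPDATE" and violations:
        raise ExpectationFailed(f"{len(violations)} record(s) violated the expectation")
    if on_violation == "DROP ROW":
        kept = [row for row in rows if predicate(row)]
    else:
        kept = list(rows)
    return kept, len(violations)


def valid_timestamp(row):
    # Mirrors the constraint: timestamp > '2020-01-01'
    return row["timestamp"] > datetime(2020, 1, 1)


batch = [
    {"id": 1, "timestamp": datetime(2021, 6, 1)},
    {"id": 2, "timestamp": datetime(2019, 6, 1)},  # violates the constraint
]

kept_default, bad_default = apply_expectation(batch, valid_timestamp)
kept_drop, bad_drop = apply_expectation(batch, valid_timestamp, "DROP ROW")

try:
    apply_expectation(batch, valid_timestamp, "FAIL UPDATE")
    update_failed = False
except ExpectationFailed:
    update_failed = True
```

With FAIL UPDATE, a single invalid record is enough to reject the batch, so nothing from that batch reaches the target dataset.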
Reference: Delta Live Tables Expectations, Databricks Data Engineer Professional Exam Guide
Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?
Answer : B
To send the Databricks Job owner an email when the Job fails, the best approach is to set up an Alert in the Job page, where the email address and the notification type for the Job failure event can be configured. The other options are either not feasible, not reliable, or not relevant for this task. Manually programming an alert system in each cell of the Notebook is tedious and error-prone. Setting up an Alert in the Notebook is not possible, because failure notifications are configured on the Job rather than in the Notebook. There is a way to notify the Job owner in the case of Job failure, so option D is incorrect. MLflow Model Registry Webhooks are used for model lifecycle events, not Job events, so option E is not applicable. Reference:
Add email and system notifications for job events
Alerts
MLflow Model Registry Webhooks
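The same failure notification can also be expressed in a job's settings rather than through the Job page. The sketch below adds a recipient to the `email_notifications.on_failure` list, a field of the Jobs API job settings; the helper `with_failure_email`, the job name, and the address are hypothetical, and no request is sent.

```python
import copy


def with_failure_email(job_settings: dict, addresses: list) -> dict:
    """Return a copy of job_settings that emails the addresses on failure."""
    updated = copy.deepcopy(job_settings)
    notifications = updated.setdefault("email_notifications", {})
    recipients = notifications.setdefault("on_failure", [])
    for address in addresses:
        if address not in recipients:  # avoid duplicate recipients
            recipients.append(address)
    return updated


# Hypothetical job settings and owner address.
job_settings = {"name": "nightly-ingest"}
notified = with_failure_email(job_settings, ["owner@example.com"])
print(notified["email_notifications"])
```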
Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
Answer : C
Option C is the correct answer because Parquet files have a well-defined schema that is embedded within the data itself. The column names and data types of the Parquet files are therefore detected and preserved automatically when an external table is created from them. CSV files, on the other hand, have no embedded schema: every value is stored as text, so the schema must be specified explicitly or inferred from the data. A CREATE TABLE AS SELECT statement takes its schema from the query result and leaves no place to declare one, which makes a self-describing format the safer source. With CSV, inference can produce wrong or inconsistent data types and column names, and it adds processing time and complexity.
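The CSV half of this contrast can be seen with the Python standard library alone. In the sketch below, the sample data and the `infer_type` helper are hypothetical; the helper is a deliberately naive stand-in for schema inference, not how any particular engine does it. The Parquet side is not shown because reading Parquet requires a third-party library.

```python
import csv
import io

# Hypothetical CSV content: a header row and two records.
csv_text = "id,amount,created\n1,9.50,2021-06-01\n2,12.00,2021-06-02\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Every value comes back as a string; the file itself carries no types.
value_types = {name: type(value).__name__ for name, value in rows[0].items()}
print(value_types)


def infer_type(values):
    """Naive inference: try int, then double, else fall back to string."""
    for caster, label in ((int, "int"), (float, "double")):
        try:
            for value in values:
                caster(value)
            return label
        except ValueError:
            continue
    return "string"


inferred = {name: infer_type([row[name] for row in rows]) for name in rows[0]}
print(inferred)
```

Here the `created` column falls back to a string because nothing in the file says it is a date, which is the kind of guesswork an embedded schema avoids.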
Which of the following can be used to simplify and unify siloed data architectures that are specialized for specific use cases?
Answer : E
A data lakehouse is a new paradigm that can be used to simplify and unify siloed data architectures that are specialized for specific use cases. A data lakehouse combines the best of both data lakes and data warehouses, providing a single platform that supports diverse data types, open standards, low-cost storage, high-performance queries, ACID transactions, schema enforcement, and governance. A data lakehouse enables data engineers to build reliable and scalable data pipelines that can serve various downstream applications and users, such as data science, machine learning, analytics, and reporting. A data lakehouse leverages the power of Delta Lake, a storage layer that brings reliability and performance to data lakes. Reference: What is a data lakehouse?, Delta Lake, Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics