
Free Practice Questions for Databricks Certified Data Engineer Associate Exam

Pass4Future also provides interactive practice exam software for preparing effectively for the Databricks Certified Data Engineer Associate exam. You are welcome to explore the free sample questions below and to try the practice test software.

Page:    1 / 14   
Total 109 questions

Question 1

A data engineer has a Job with a complex run schedule, and they want to transfer that schedule to other Jobs.

Rather than manually selecting each value in the scheduling form in Databricks, which of the following tools can the data engineer use to represent and submit the schedule programmatically?
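Although the answer options are not reproduced here, complex Job schedules are typically represented programmatically as a Quartz cron expression submitted through the Jobs API. A minimal sketch of such a payload fragment (the field names follow Jobs API 2.1 and are an assumption here, not taken from this page):

```python
import json

# Hypothetical job-settings fragment (field names per Databricks Jobs API 2.1,
# an assumption): the run schedule is a Quartz cron expression, so it can be
# copied between Jobs instead of re-entered in the scheduling form.
schedule = {
    "schedule": {
        "quartz_cron_expression": "0 30 7 * * ?",  # every day at 07:30
        "timezone_id": "UTC",
        "pause_status": "UNPAUSED",
    }
}

payload = json.dumps(schedule)
print(payload)
```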



Question 2

A dataset has been defined using Delta Live Tables and includes an expectations clause:

CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE

What is the expected behavior when a batch of data containing data that violates these constraints is processed?



Answer : B

The expected behavior when a batch containing records that violate the expectation is processed is that the update fails. The expectation clause uses the ON VIOLATION FAIL UPDATE action, which means that if any record in the batch fails the expectation, the transaction is rolled back and the pipeline update fails. This action is useful for enforcing strict data quality rules and preventing invalid data from entering the target dataset.

Option A is not correct: ON VIOLATION FAIL UPDATE does not drop the records that violate the expectation; it fails the entire update. To drop violating records and report them as invalid in the event log, the ON VIOLATION DROP ROW action should be used.

Option C is not correct for the same reason, and also because there is no built-in ON VIOLATION action that loads violating records into a quarantine table; a quarantine pattern has to be implemented separately (for example, with a second table whose expectation inverts the rule).

Option D is not correct: retaining violating records while recording the violations in the event log is the default behavior of an EXPECT clause with no ON VIOLATION action, whereas FAIL UPDATE fails the whole update.

Option E is not correct: there is no ON VIOLATION action that keeps violating records and flags them in a field added to the target dataset; such flagging would also have to be implemented manually.


Delta Live Tables Expectations

[Databricks Data Engineer Professional Exam Guide]
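The two ON VIOLATION actions can be sketched in plain Python (a simplified model, not the DLT runtime; timestamps are compared as strings for brevity):

```python
# Simplified model of DLT expectation handling (not the actual DLT runtime).
# 'fail' mimics ON VIOLATION FAIL UPDATE: any bad record aborts the whole batch.
# 'drop' mimics ON VIOLATION DROP ROW: bad records are filtered out.
def apply_expectation(records, predicate, on_violation="fail"):
    violations = [r for r in records if not predicate(r)]
    if on_violation == "fail" and violations:
        raise ValueError(
            f"{len(violations)} record(s) violated the expectation; update failed"
        )
    if on_violation == "drop":
        return [r for r in records if predicate(r)]
    return records

batch = [
    {"id": 1, "timestamp": "2021-06-01"},
    {"id": 2, "timestamp": "2019-12-31"},  # violates timestamp > '2020-01-01'
]
valid_timestamp = lambda r: r["timestamp"] > "2020-01-01"

kept = apply_expectation(batch, valid_timestamp, on_violation="drop")
print(len(kept))  # 1 -- the violating record was dropped
```

With `on_violation="fail"`, the same batch raises instead of returning anything, which is the behavior the question describes.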

Question 3

Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?



Answer : B

To send the Databricks Job owner an email in the case that the Job fails, the best approach is to set up an Alert in the Job page. This way, the Job owner can configure the email address and the notification type for the Job failure event. The other options are either not feasible, not reliable, or not relevant for this task. Manually programming an alert system in each cell of the Notebook is tedious and error-prone. Setting up an Alert in the Notebook is not possible, as Alerts are only available for Jobs and Clusters. There is a way to notify the Job owner in the case of Job failure, so option D is incorrect. MLflow Model Registry Webhooks are used for model lifecycle events, not Job events, so option E is not applicable.

Reference:

Add email and system notifications for job events

Alerts

MLflow Model Registry Webhooks
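Failure notifications can also be set programmatically when a Job is created or updated; a sketch of the relevant settings fragment (field names follow the Jobs API 2.1 `email_notifications` object and are an assumption here):

```python
import json

# Hypothetical job-settings fragment (field names per Databricks Jobs API 2.1,
# an assumption): on_failure lists the addresses emailed when a run fails.
job_settings = {
    "name": "nightly-etl",  # illustrative job name
    "email_notifications": {
        "on_failure": ["owner@example.com"],
    },
}

print(json.dumps(job_settings))
```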


Question 4

Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?



Answer : C

Option C is the correct answer because Parquet files have a well-defined schema that is embedded within the data itself. This means that the data types and column names of the Parquet files are automatically detected and preserved when creating an external table from them. This also enables the use of SQL and other structured query languages to access and analyze the data. CSV files, on the other hand, do not have a schema embedded in them, and require specifying the schema explicitly or inferring it from the data when creating an external table from them. This can lead to errors or inconsistencies in the data types and column names, and also increase the processing time and complexity.
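The contrast is easy to see with CSV alone: because CSV stores no schema, every value round-trips as text, which is exactly the type information Parquet embeds alongside the data. A small stdlib-only illustration:

```python
import csv
import io

# Write a row containing an int and a float, then read it back:
# CSV carries no schema, so both values come back as strings.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "price"])
writer.writerow([1, 2.5])

buf.seek(0)
rows = list(csv.DictReader(buf))
print(rows[0])  # {'id': '1', 'price': '2.5'} -- the types were lost
```

A Parquet file written from the same data would preserve the column names and types in its own metadata, which is why a CREATE TABLE AS SELECT over Parquet needs no explicit schema.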


Question 5
