Free Practice Questions for Databricks Machine Learning Associate Exam

Pass4Future also provides interactive practice exam software for preparing for the Databricks Certified Machine Learning Associate exam. You are welcome to explore the free sample questions below and to try the practice test software.

Page:    1 / 14   
Total 74 questions

Question 1

A machine learning engineer is converting a decision tree from sklearn to Spark ML. They notice that they are receiving different results despite all of their data and manually specified hyperparameter values being identical.

Which of the following describes a reason that the single-node sklearn decision tree and the Spark ML decision tree can differ?



Answer : E

One reason the results can differ, despite identical data and hyperparameters, is that Spark ML decision trees test binned feature values as representative split candidates. For continuous features, Spark ML computes an approximate set of split candidates from quantiles of the data, grouping the values into at most maxBins bins, so only the bin boundaries are evaluated as thresholds. sklearn's default splitter instead evaluates thresholds between the distinct sorted values of each feature. Because the two libraries consider different candidate thresholds, the fitted trees can differ.

Reference

Spark MLlib Documentation (Decision Trees and Quantile Binning).
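
The effect of binning on the candidate thresholds can be illustrated with a small sketch. It is a simplification written for this explanation, not Spark's or sklearn's actual implementation: the function names and the `max_bins` argument (named after Spark ML's maxBins parameter) are hypothetical.

```python
def exhaustive_candidates(values):
    """Midpoints between consecutive distinct sorted values (single-node style)."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def binned_candidates(values, max_bins):
    """Approximate candidates: boundaries of at most max_bins quantile-style bins."""
    ordered = sorted(values)
    n = len(ordered)
    # max_bins bins need at most max_bins - 1 interior boundaries
    cuts = {ordered[(i * n) // max_bins] for i in range(1, max_bins)}
    return sorted(cuts)


feature = [float(x) for x in range(1, 101)]  # made-up feature column
exact = exhaustive_candidates(feature)
approx = binned_candidates(feature, max_bins=4)

print(len(exact), "exhaustive candidates;", len(approx), "binned candidates")
```

With far fewer thresholds to choose from, the binned search can settle on a different split than the exhaustive one, and every later split in the tree then depends on that choice.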


Question 2

A data scientist is using MLflow to track their machine learning experiment. As a part of each of their MLflow runs, they are performing hyperparameter tuning. The data scientist would like to have one parent run for the tuning process with a child run for each unique combination of hyperparameter values. All parent and child runs are being manually started with mlflow.start_run.

Which of the following approaches can the data scientist use to accomplish this MLflow run organization?



Answer : B

To organize MLflow runs with one parent run for the tuning process and a child run for each unique combination of hyperparameter values, the data scientist can pass nested=True to mlflow.start_run when starting each child run while the parent run is still active. Each child run is then recorded under the parent run, which keeps a clear hierarchy in the experiment and makes it easier to compare hyperparameter combinations within the same tuning process.

Reference

MLflow Documentation (Managing Nested Runs).
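
A minimal sketch of this run organization follows, assuming MLflow is installed and a tracking location is configured. The helper names `expand_grid` and `log_tuning_runs`, the run names, and the example hyperparameters are hypothetical; `mlflow.start_run(nested=True)` and `mlflow.log_params` are the MLflow calls being illustrated.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per unique combination of hyperparameter values."""
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def log_tuning_runs(combinations, parent_name="tuning"):
    """Open one parent run, then one nested child run per combination."""
    import mlflow  # imported here so expand_grid works without MLflow installed

    with mlflow.start_run(run_name=parent_name):
        for params in combinations:
            # nested=True attaches this run to the currently active parent run
            with mlflow.start_run(run_name=str(params), nested=True):
                mlflow.log_params(params)
                # train and evaluate the model here, then log its metrics


# Example grid with made-up hyperparameters
combinations = expand_grid({"max_depth": [2, 4], "n_estimators": [50, 100]})
```

Calling `log_tuning_runs(combinations)` would create one parent run with four child runs. Without nested=True, MLflow raises an error when a second run is started while another run is still active.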


Question 3

Which of the following approaches can be used to view the notebook that was run to create an MLflow run?



Answer : C

To view the notebook that was run to create an MLflow run, click the 'Source' link in the row corresponding to the run on the MLflow experiment page. The link points to the source notebook or script that initiated the run, so you can review the code used in the experiment. This is particularly useful for reproducibility and for understanding the context of the experiment.

Reference

MLflow Documentation (Viewing Run Sources and Notebooks).
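
The source of a run is also recorded in the run's tags, so it can be read programmatically as well as through the UI. The sketch below is an illustration only: the helper `notebook_source` and the example tag values are hypothetical, and it assumes the run was logged with the standard `mlflow.source.name` tag.

```python
def notebook_source(tags):
    """Return the recorded source of a run from its tag dictionary, or None."""
    # "mlflow.source.name" is the tag MLflow uses for the originating notebook or script
    return tags.get("mlflow.source.name")


# With MLflow available, the tags could be fetched with:
#   tags = mlflow.get_run(run_id).data.tags
example_tags = {  # hypothetical values for illustration
    "mlflow.source.name": "/Users/someone@example.com/churn-model",
    "mlflow.source.type": "NOTEBOOK",
}
source = notebook_source(example_tags)
print(source)
```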


Question 4

A data scientist is developing a machine learning pipeline using AutoML on Databricks Machine Learning.

Which of the following steps will the data scientist need to perform outside of their AutoML experiment?



Answer : D

AutoML platforms, such as the one available in Databricks Machine Learning, streamline various stages of the machine learning pipeline including feature engineering, model selection, hyperparameter tuning, and model evaluation. However, exploratory data analysis (EDA) is typically performed outside the AutoML process. EDA involves understanding the dataset, visualizing distributions, identifying anomalies, and gaining insights into data before feeding it into a machine learning pipeline. This step is crucial for ensuring that the data is clean and suitable for model training but is generally done manually by the data scientist.

Reference

Databricks documentation on AutoML: https://docs.databricks.com/applications/machine-learning/automl.html
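
As an illustration of the kind of manual check the explanation above describes, the following sketch summarizes one numeric column before the data is handed to an AutoML experiment. The helper `summarize_column` and the sample rows are hypothetical; a real analysis would normally use a DataFrame library and plots.

```python
from statistics import mean


def summarize_column(rows, column):
    """Count missing values and report basic statistics for one numeric column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": mean(present) if present else None,
    }


rows = [{"age": 34}, {"age": None}, {"age": 51}, {"age": 29}]  # made-up sample
summary = summarize_column(rows, "age")
print(summary)
```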


Question 5

A machine learning engineer has grown tired of needing to install the MLflow Python library on each of their clusters. They ask a senior machine learning engineer how their notebooks can load the MLflow library without installing it each time. The senior machine learning engineer suggests that they use Databricks Runtime for Machine Learning.

Which of the following approaches describes how the machine learning engineer can begin using Databricks Runtime for Machine Learning?



Answer : C

The Databricks Runtime for Machine Learning includes pre-installed packages and libraries essential for machine learning and deep learning, including MLflow. To use it, the machine learning engineer can simply select an appropriate Databricks Runtime ML version from the 'Databricks Runtime Version' dropdown menu while creating their cluster. This selection ensures that all necessary machine learning libraries, including MLflow, are pre-installed and ready for use, avoiding the need to manually install them each time.

Reference

Databricks documentation on creating clusters: https://docs.databricks.com/clusters/create.html
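
When a cluster is defined in code rather than through the dropdown, the same choice appears as the runtime version key in the cluster specification. The sketch below is an assumption-laden illustration: the field values are placeholders, the helper `is_ml_runtime` is hypothetical, and the version key shown should be checked against the versions actually offered in your workspace.

```python
def is_ml_runtime(spark_version):
    """Heuristic check: Runtime ML version keys contain an '-ml-' segment."""
    return "-ml-" in spark_version


# Hypothetical cluster specification; every value below is a placeholder.
cluster_spec = {
    "cluster_name": "ml-experiments",
    "spark_version": "14.3.x-cpu-ml-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}

print(is_ml_runtime(cluster_spec["spark_version"]))
```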

