Pass4Future also provides interactive practice exam software for preparing for the Amazon AWS Certified Machine Learning - Specialty (MLS-C01) exam effectively. You are welcome to explore the free sample Amazon MLS-C01 exam questions below and to try the Amazon MLS-C01 exam practice test software.
Did you know that you can access more real Amazon MLS-C01 exam questions via Premium Access?
A company wants to use machine learning (ML) to improve its customer churn prediction model. The company stores data in an Amazon Redshift data warehouse.
A data science team wants to use Amazon Redshift machine learning (Amazon Redshift ML) to build a model and run predictions for new data directly within the data warehouse.
Which combination of steps should the company take to use Amazon Redshift ML to meet these requirements? (Select THREE.)
Answer : A, C, F
Amazon Redshift ML enables in-database machine learning model creation and predictions, allowing data scientists to leverage Redshift for model training without needing to export data.
To create and run a model for customer churn prediction in Amazon Redshift ML:
Define the feature variables and target variable: Identify the columns to use as features (predictors) and the target variable (outcome) for the churn prediction model.
Create the model: Write a CREATE MODEL SQL statement, which trains the model using Amazon Redshift's integration with Amazon SageMaker and stores the model directly in Redshift.
Run predictions: Use the SQL PREDICT function to generate predictions on new data directly within Redshift.
Options B, D, and E are not required as Redshift ML handles model creation and prediction without manual data export to Amazon S3 or additional Spectrum integration.
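The steps above can be sketched as the two SQL statements a data scientist would run in Redshift. This is a minimal illustration only: the table, column, function, role, and bucket names (customer_activity, churned, predict_churn, and so on) are hypothetical placeholders, not part of the question.

```python
# Hypothetical sketch of the Redshift ML workflow described above. The SQL
# would normally be run in a Redshift SQL client; here the statements are
# shown as Python strings. All identifiers and ARNs are placeholders.

# Steps 1-2: define features and target, then train a model in-database.
# Redshift hands training off to SageMaker behind the scenes and exposes
# the trained model as a SQL function.
create_model_sql = """
CREATE MODEL customer_churn_model
FROM (SELECT age, tenure_months, monthly_charges, churned
      FROM customer_activity)          -- feature columns plus the target
TARGET churned                         -- the outcome to predict
FUNCTION predict_churn                 -- SQL function name for inference
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftMLRole'
SETTINGS (S3_BUCKET 'redshift-ml-artifacts');
"""

# Step 3: run predictions on new rows directly within the warehouse.
predict_sql = """
SELECT customer_id,
       predict_churn(age, tenure_months, monthly_charges) AS churn_prediction
FROM new_customers;
"""

print(create_model_sql)
print(predict_sql)
```

No data ever leaves the warehouse in this flow, which is why manual exports to Amazon S3 or Spectrum integration are unnecessary.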
A company is building a predictive maintenance model for its warehouse equipment. The model must predict the probability of failure of all machines in the warehouse. The company has collected 10,000 event samples within 3 months. The event samples include 100 failure cases that are evenly distributed across 50 different machine types.
How should the company prepare the data for the model to improve the model's accuracy?
Answer : B
In predictive maintenance, when a dataset is imbalanced (with far fewer failure cases than non-failure cases), oversampling the minority class helps the model learn from the minority class effectively. The Synthetic Minority Oversampling Technique (SMOTE) generates synthetic samples for the minority class by creating data points between existing minority class instances. This can enhance the model's ability to recognize failure patterns, particularly in imbalanced datasets.
SMOTE increases the effective presence of failure cases in the dataset, providing a balanced learning environment for the model. This is more effective than undersampling, which would risk losing important non-failure data.
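The interpolation idea behind SMOTE can be shown in a few lines of NumPy. This is a simplified sketch of the technique, not the full imbalanced-learn implementation, and the toy failure data below is synthetic, mirroring the 100-failure / 9,900-non-failure split from the question.

```python
import numpy as np

def smote(minority, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen sample and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # Distances from x to every minority point; index 0 of the sort
        # is x itself, so skip it when picking neighbors.
        d = np.linalg.norm(minority - x, axis=1)
        neighbors = np.argsort(d)[1:k + 1]
        j = rng.choice(neighbors)
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append(x + gap * (minority[j] - x))
    return np.array(synthetic)

# Toy imbalanced data: 100 failure cases with 4 features each.
rng = np.random.default_rng(0)
failures = rng.normal(loc=1.0, scale=0.2, size=(100, 4))

# Oversample the failure class so it matches the 9,900 non-failure cases.
new_samples = smote(failures, n_new=9800, rng=1)
balanced_failures = np.vstack([failures, new_samples])
print(balanced_failures.shape)  # (9900, 4)
```

Because each synthetic point lies on a line segment between two real failure cases, the new samples stay inside the region of feature space the minority class actually occupies, unlike naive duplication.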
An ecommerce company has observed that customers who use the company's website rarely view items that the website recommends to customers. The company wants to recommend items to customers that customers are more likely to want to purchase.
Which solution will meet this requirement in the SHORTEST amount of time?
Answer : C
Amazon Personalize is a managed AWS service specifically designed to deliver personalized recommendations with minimal development time. It uses machine learning algorithms tailored for recommendation systems, making it highly suitable for applications where quick integration is essential. By using Amazon Personalize, the company can leverage existing customer data to generate real-time, personalized product recommendations that align better with customer preferences, enhancing the likelihood of customer engagement with recommended items.
Options involving EC2 instances with GPU or accelerated computing primarily enhance computational performance but do not inherently improve recommendation relevance, while Amazon SageMaker would require more development effort to achieve similar results.
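Once a Personalize campaign is trained and deployed, fetching recommendations is a single runtime API call. The sketch below builds the parameters for that call; the campaign ARN and user ID are placeholders, and the boto3 invocation is shown as a comment since it requires AWS credentials and a live campaign.

```python
# Hypothetical sketch: the request an application sends to an Amazon
# Personalize campaign at runtime. ARN and user ID are placeholders.

def build_recommendation_request(campaign_arn, user_id, num_results=10):
    """Build the parameters for the Personalize GetRecommendations API."""
    return {
        "campaignArn": campaign_arn,
        "userId": user_id,
        "numResults": num_results,
    }

params = build_recommendation_request(
    "arn:aws:personalize:us-east-1:123456789012:campaign/product-recs",
    user_id="user-42",
)
# With boto3 this would be invoked as:
#   boto3.client("personalize-runtime").get_recommendations(**params)
# The response's itemList carries ranked itemIds with relevance scores.
print(params["numResults"])  # 10
```

The heavy lifting (feature engineering, model selection, retraining) stays inside the managed service, which is why this path is faster to ship than a custom SageMaker model.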
A data scientist uses Amazon SageMaker Data Wrangler to analyze and visualize data. The data scientist wants to refine a training dataset by selecting predictor variables that are strongly predictive of the target variable. The target variable correlates with other predictor variables.
The data scientist wants to understand the variance in the data along various directions in the feature space.
Which solution will meet these requirements?
Answer : C
Principal Component Analysis (PCA) is a dimensionality reduction technique that captures the variance within the feature space, helping to understand the directions in which data varies most. In SageMaker Data Wrangler, the multicollinearity measurement and PCA features allow the data scientist to analyze interdependencies between predictor variables while reducing redundancy. PCA transforms correlated features into a set of uncorrelated components, helping to simplify the dataset without significant loss of information, making it ideal for refining features based on variance.
Options A and D offer methods to understand feature relevance but are less effective for managing multicollinearity and variance representation in the data.
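The variance-concentration effect that makes PCA useful here can be demonstrated with a small NumPy example. The data is synthetic: two of the three features are deliberately made nearly collinear to mimic the multicollinearity described in the question.

```python
import numpy as np

# Synthetic dataset with multicollinearity: x2 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=500)   # strongly correlated with x1
x3 = rng.normal(size=500)                     # independent feature
X = np.column_stack([x1, x2, x3])

# PCA by eigendecomposition of the covariance matrix of centered data.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, _ = np.linalg.eigh(cov)
eigvals = eigvals[::-1]                       # sort descending

# Fraction of total variance explained by each principal component.
explained = eigvals / eigvals.sum()
print(np.round(explained, 3))
```

Because x1 and x2 are nearly collinear, most of the variance collapses onto the first component and the last component carries almost nothing, so it can be dropped with little information loss.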
A cybersecurity company is collecting on-premises server logs, mobile app logs, and IoT sensor data. The company backs up the ingested data in an Amazon S3 bucket and sends the ingested data to Amazon OpenSearch Service for further analysis. Currently, the company has a custom ingestion pipeline that is running on Amazon EC2 instances. The company needs to implement a new serverless ingestion pipeline that can automatically scale to handle sudden changes in the data flow.
Which solution will meet these requirements MOST cost-effectively?
Answer : B
To build a scalable, serverless, and cost-effective data ingestion pipeline, this solution uses a Kinesis data stream to handle fluctuations in data flow, buffering and distributing incoming data in real time. By connecting two Amazon Kinesis Data Firehose delivery streams to the Kinesis data stream, the company can simultaneously route data to Amazon S3 for backup and Amazon OpenSearch Service for analysis.
This approach meets all requirements by providing automatic scaling, reducing operational overhead, and ensuring data storage and analysis without duplicating efforts or needing additional infrastructure.
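The fan-out described above can be sketched as the configuration for the two Firehose delivery streams, both consuming the same Kinesis data stream. All ARNs, names, and the parameter shapes are illustrative placeholders; in practice each dict would be passed to boto3's firehose create_delivery_stream call, and the exact destination-configuration fields should be checked against the current API reference.

```python
# Hypothetical sketch of two Firehose delivery streams fed by one Kinesis
# data stream: one backs up to S3, the other forwards to OpenSearch Service.

STREAM_ARN = "arn:aws:kinesis:us-east-1:123456789012:stream/ingest-stream"
ROLE_ARN = "arn:aws:iam::123456789012:role/FirehoseRole"

def firehose_from_kinesis(name, destination_key, destination_config):
    """Build a delivery-stream config that uses the Kinesis stream as source."""
    return {
        "DeliveryStreamName": name,
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": STREAM_ARN,
            "RoleARN": ROLE_ARN,
        },
        destination_key: destination_config,
    }

# Delivery stream 1: durable backup of raw data in S3.
s3_backup = firehose_from_kinesis(
    "logs-to-s3",
    "S3DestinationConfiguration",
    {"RoleARN": ROLE_ARN, "BucketARN": "arn:aws:s3:::ingest-backup-bucket"},
)

# Delivery stream 2: same records routed to OpenSearch for analysis.
opensearch = firehose_from_kinesis(
    "logs-to-opensearch",
    "AmazonopensearchserviceDestinationConfiguration",
    {"RoleARN": ROLE_ARN, "IndexName": "ingested-logs",
     "DomainARN": "arn:aws:es:us-east-1:123456789012:domain/analysis"},
)

print(s3_backup["DeliveryStreamName"], opensearch["DeliveryStreamName"])
```

Both delivery streams scale automatically with the throughput of the source Kinesis stream, so no EC2 capacity management is needed.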