
Free Practice Questions for Amazon DBS-C01 Exam

Pass4Future also provides interactive practice exam software for preparing for the Amazon AWS Certified Database - Specialty (DBS-C01) exam effectively. You are welcome to explore the free sample Amazon DBS-C01 exam questions below and to try the Amazon DBS-C01 practice test software.


Question 1

A company uses an Amazon RDS for PostgreSQL database in the us-east-2 Region. The company wants to have a copy of the database available in the us-west-2 Region as part of a new disaster recovery strategy.

A database architect needs to create the new database. There must be little to no downtime for the source database. The database architect has decided to use AWS Database Migration Service (AWS DMS) to replicate the database across Regions. The database architect will use full load mode and then switch to change data capture (CDC) mode.

Which parameters must the database architect configure to support CDC mode for the RDS for PostgreSQL database? (Choose three.)

A. Set wal_level = logical.
B. Set wal_level = replica.
C. Set max_replication_slots to 1 or more, depending on the number of DMS tasks.
D. Set max_replication_slots to 0 to support dynamic allocation of slots.
E. Set wal_sender_timeout to 20,000 milliseconds.
F. Set wal_sender_timeout to 5,000 milliseconds.
Correct Answer: A, C, E

Explanation from Amazon documentation:

To enable CDC mode for an RDS for PostgreSQL database, the database architect needs to configure the following parameters:

Set wal_level = logical. This parameter determines how much information is written to the write-ahead log (WAL). For CDC mode, wal_level must be set to logical, which enables logical decoding of the WAL and allows AWS DMS to read changes from the source database.

Set max_replication_slots to 1 or more, depending on the number of DMS tasks. This parameter specifies the maximum number of replication slots that the source database can support. A replication slot is a data structure that records the state of a replication stream. AWS DMS uses replication slots to set up logical replication and track changes in the source database. The max_replication_slots parameter must be equal to or greater than the number of DMS tasks that use CDC mode for the source database.

Set wal_sender_timeout to 20,000 milliseconds. This parameter specifies the amount of time that a WAL sender process waits for feedback from a WAL receiver process before terminating the connection. A WAL sender process is a background process that streams WAL data from the source database to AWS DMS. A WAL receiver process is a background process that receives WAL data from a WAL sender process and writes it to a local file. The wal_sender_timeout parameter must be set to a value greater than 10,000 milliseconds (10 seconds) to prevent connection timeouts during CDC mode.

Therefore, options A, C, and E are the correct parameters to support CDC mode for an RDS for PostgreSQL database. Option B is incorrect because wal_level = replica is not sufficient for logical decoding and CDC mode. Option D is incorrect because max_replication_slots must be a positive integer, not zero. Option F is incorrect because wal_sender_timeout = 5,000 milliseconds is too low and may cause connection timeouts during CDC mode.
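As a minimal sketch of how these settings could be applied with boto3 (the parameter group name is hypothetical, and on RDS for PostgreSQL wal_level = logical is enabled indirectly through the rds.logical_replication parameter):

```python
import boto3

rds = boto3.client("rds", region_name="us-east-2")

# Apply the CDC-related settings to a custom DB parameter group that is
# attached to the source instance. The group name is hypothetical.
rds.modify_db_parameter_group(
    DBParameterGroupName="dms-cdc-source-pg",
    Parameters=[
        # Static parameter: turns on wal_level = logical after a reboot.
        {"ParameterName": "rds.logical_replication",
         "ParameterValue": "1",
         "ApplyMethod": "pending-reboot"},
        # Static parameter: must be >= the number of DMS tasks that use CDC.
        {"ParameterName": "max_replication_slots",
         "ParameterValue": "5",
         "ApplyMethod": "pending-reboot"},
        # Dynamic parameter: keep above 10,000 ms to avoid CDC connection timeouts.
        {"ParameterName": "wal_sender_timeout",
         "ParameterValue": "20000",
         "ApplyMethod": "immediate"},
    ],
)
```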


Question 2

A company needs to deploy an Amazon Aurora PostgreSQL DB instance into multiple accounts. The company will initiate each DB instance from an existing Aurora PostgreSQL DB instance that runs in a shared account. The company wants the process to be repeatable in case the company adds additional accounts in the future. The company also wants to be able to verify if manual changes have been made to the DB instance configurations after the company deploys the DB instances.

A database specialist has determined that the company needs to create an AWS CloudFormation template with the necessary configuration to create a DB instance in an account by using a snapshot of the existing DB instance to initialize the DB instance. The company will also use the CloudFormation template's parameters to provide key values for the DB instance creation (account ID, etc.).

Which final step will meet these requirements in the MOST operationally efficient way?



Correct Answer: B

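As a rough sketch of the deployment step the scenario describes (the template URL, stack name, parameter names, and snapshot ARN are all hypothetical), each account's stack could be launched with boto3; CloudFormation drift detection can then surface manual changes to the deployed configuration:

```python
import boto3

# The boto3 session is assumed to hold credentials for the target account.
cfn = boto3.client("cloudformation")

# Launch the shared template, passing the snapshot of the existing DB
# instance (shared from the central account) as a parameter.
cfn.create_stack(
    StackName="aurora-pg-from-snapshot",
    TemplateURL="https://example-bucket.s3.amazonaws.com/aurora-db.yaml",
    Parameters=[
        {"ParameterKey": "SnapshotIdentifier",
         "ParameterValue": "arn:aws:rds:us-east-2:111122223333:snapshot:shared-snap"},
        {"ParameterKey": "DBInstanceClass", "ParameterValue": "db.r6g.large"},
    ],
)

# Later: flag any manual changes made to the stack's resources.
cfn.detect_stack_drift(StackName="aurora-pg-from-snapshot")
```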

Question 3

A database specialist is launching a test graph database using Amazon Neptune for the first time. The database specialist needs to insert millions of rows of test observations from a .csv file that is stored in Amazon S3. The database specialist has been using a series of API calls to upload the data to the Neptune DB instance.

Which combination of steps would allow the database specialist to upload the data faster? (Choose three.)

A. Ensure Amazon Cognito returns the proper AWS STS tokens to authenticate the Neptune DB instance to the S3 bucket that hosts the .csv file.
B. Ensure the vertices and edges are specified in different .csv files with proper header column formatting.
C. Use AWS DMS to move data from Amazon S3 to the Neptune Loader.
D. Curl the S3 URI while inside the Neptune DB instance and then run the addVertex or addEdge commands.
E. Ensure an IAM role for the Neptune DB instance is configured with the appropriate permissions to allow access to the file in the S3 bucket.
F. Create an S3 VPC endpoint and issue an HTTP POST to the database's loader endpoint.
Correct Answer: B, E, F

Explanation from Amazon documentation:

To upload data faster to a Neptune DB instance from a .csv file stored in Amazon S3, the database specialist should use the Neptune Bulk Loader, which is a feature that allows you to load data from external files directly into a Neptune DB instance. The Neptune Bulk Loader is faster and has less overhead than API calls such as SPARQL INSERT statements or Gremlin addV and addE steps. The Neptune Bulk Loader supports both RDF and Gremlin data formats.

To use the Neptune Bulk Loader, the database specialist needs to do the following:

Ensure the vertices and edges are specified in different .csv files with proper header column formatting. This is required for the Gremlin data format, which uses two .csv files: one for vertices and one for edges. The first row of each file must contain the column names, which must match the property names of the graph elements. The files must also have a column named ~id for vertices and columns named ~from and ~to for edges, which specify the unique identifiers of the graph elements.

Ensure an IAM role for the Neptune DB instance is configured with the appropriate permissions to allow access to the file in the S3 bucket. This is required for the Neptune DB instance to read the data from the S3 bucket. The IAM role must have a trust policy that allows Neptune to assume the role, and a permissions policy that allows access to the S3 bucket and objects.

Create an S3 VPC endpoint and issue an HTTP POST to the database's loader endpoint. This is required for the Neptune DB instance to communicate with the S3 bucket without going through the public internet. The S3 VPC endpoint must be in the same VPC as the Neptune DB instance. The HTTP POST request must specify the source parameter as the S3 URI of the .csv file, and optionally other parameters such as format, failOnError, and parallelism.

Therefore, options B, E, and F are the correct steps to upload the data faster. Option A is not necessary because Amazon Cognito is not used for authenticating the Neptune DB instance to the S3 bucket. Option C is not suitable because AWS DMS is not designed for loading graph data into Neptune. Option D is not efficient because curling the S3 URI and running the addVertex or addEdge commands will be slower and more costly than using the Neptune Bulk Loader.
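A minimal sketch of that HTTP POST (the cluster endpoint, bucket, and role ARN are placeholders; this assumes IAM database authentication is disabled, otherwise the request must be SigV4-signed):

```python
import requests

# Neptune bulk loader endpoint (placeholder cluster address).
loader_endpoint = (
    "https://my-neptune.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/loader"
)

response = requests.post(
    loader_endpoint,
    json={
        "source": "s3://example-bucket/observations/",  # vertex and edge .csv files
        "format": "csv",                                # Gremlin CSV load format
        "iamRoleArn": "arn:aws:iam::111122223333:role/NeptuneLoadFromS3",
        "region": "us-east-1",
        "failOnError": "FALSE",
        "parallelism": "MEDIUM",
    },
)
print(response.json())  # includes a loadId that can be polled for load status
```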


Question 4

A company is using Amazon Redshift. A database specialist needs to allow an existing Redshift cluster to access data from other Redshift clusters, Amazon RDS for PostgreSQL databases, and AWS Glue Data Catalog tables.

Which combination of steps will meet these requirements with the MOST operational efficiency? (Choose three.)

A. Take snapshots of the other Redshift clusters. Restore the snapshots into the existing Redshift cluster.
B. Create external tables in the existing Redshift database to connect to the AWS Glue Data Catalog tables.
C. Unload the data from the other Redshift clusters and the RDS databases to Amazon S3. Load the data into the existing Redshift cluster.
D. Use federated queries to access data in Amazon RDS.
E. Use data sharing to access data from the other Redshift clusters.
F. Use AWS Glue jobs to transfer the AWS Glue Data Catalog tables into Amazon S3. Create external tables in the existing Redshift database to access the data.
Correct Answer: B, D, E

Explanation from Amazon documentation:

To allow an existing Redshift cluster to access data from other Redshift clusters, Amazon RDS for PostgreSQL databases, and AWS Glue Data Catalog tables, the database specialist should use the following features:

Create external tables in the existing Redshift database to connect to the AWS Glue Data Catalog tables. This feature allows you to query data stored in Amazon S3 using the AWS Glue Data Catalog as the metadata store. You can create external tables in your Redshift database that reference the data catalog tables and use SQL to query the data in S3. This feature is operationally efficient because it does not require moving or copying the data from S3 to Redshift.

Use federated queries to access data in Amazon RDS. This feature allows you to query and join data from one or more Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL databases with data already in your Amazon Redshift cluster. You can use SQL to query the RDS databases directly from your Redshift cluster without having to load or unload any data. This feature is operationally efficient because it reduces data movement and storage costs, and simplifies data access and analysis.

Use data sharing to access data from the other Redshift clusters. This feature allows you to securely share live data across different Redshift clusters without the complexity and delays associated with data copies and data movement. You can share data within or across AWS accounts using a producer-consumer model. The producer cluster creates a datashare containing one or more schemas and other database objects and grants the consumer clusters access to it. The consumer clusters can then query the shared data in the producer cluster as if it were stored in local tables. This feature is operationally efficient because it enables real-time and transactionally consistent data access, and it eliminates data duplication and stale-data issues.

Therefore, options B, D, and E are the correct steps to meet the requirements with the most operational efficiency. Option A is not efficient because it involves taking and restoring snapshots, which can be time-consuming and costly. Option C is not efficient because it involves unloading and loading data between S3 and Redshift, which also incurs additional time and cost. Option F is not necessary because it involves transferring the AWS Glue Data Catalog tables into S3, which can be avoided by using external tables to connect to the data catalog tables directly.
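As a sketch of the three access paths (all cluster identifiers, role ARNs, and the secret are hypothetical), the corresponding SQL could be issued through the Redshift Data API:

```python
import boto3

rsd = boto3.client("redshift-data")

statements = [
    # External schema over the AWS Glue Data Catalog (Redshift Spectrum).
    """CREATE EXTERNAL SCHEMA glue_schema
       FROM DATA CATALOG DATABASE 'my_glue_db'
       IAM_ROLE 'arn:aws:iam::111122223333:role/SpectrumRole';""",
    # Federated query schema over an RDS for PostgreSQL database.
    """CREATE EXTERNAL SCHEMA pg_schema
       FROM POSTGRES DATABASE 'appdb' SCHEMA 'public'
       URI 'my-rds.abc123.us-east-1.rds.amazonaws.com'
       IAM_ROLE 'arn:aws:iam::111122223333:role/FederatedRole'
       SECRET_ARN 'arn:aws:secretsmanager:us-east-1:111122223333:secret:pg-creds';""",
    # Local database backed by a datashare from another Redshift cluster.
    """CREATE DATABASE shared_db
       FROM DATASHARE sales_share OF NAMESPACE 'producer-namespace-guid';""",
]

for sql in statements:
    rsd.execute_statement(
        ClusterIdentifier="my-redshift-cluster",
        Database="dev",
        DbUser="admin",
        Sql=sql,
    )
```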


Question 5

A company migrated an on-premises Oracle database to Amazon RDS for Oracle. A database specialist needs to monitor the latency of the database.

Which solution will meet this requirement with the LEAST operational overhead?

A. Publish RDS Performance Insights metrics to Amazon CloudWatch. Add AWS CloudTrail filters to monitor database performance.
B. Install Oracle Statspack. Enable the performance statistics feature to collect, store, and display performance data.
C. Enable RDS Performance Insights and Enhanced Monitoring for the RDS for Oracle DB instance.
D. Create a new DB parameter group that includes the AllocatedStorage, DBInstanceClassMemory, and DBInstanceVCPU variables. Enable RDS Performance Insights.
Correct Answer: C

Explanation from Amazon documentation:

Amazon RDS for Oracle is a fully managed relational database service that supports Oracle Database. Amazon RDS for Oracle provides several features to monitor the performance and health of your database, such as RDS Performance Insights, Enhanced Monitoring, Amazon CloudWatch, and AWS CloudTrail.

RDS Performance Insights is a feature that helps you quickly assess the load on your database and determine when and where to take action. RDS Performance Insights displays a dashboard that shows the database load in terms of average active sessions (AAS), which is the average number of sessions that are actively running SQL statements at any given time. RDS Performance Insights also shows the top SQL statements, waits, hosts, and users that are contributing to the database load.

Enhanced Monitoring is a feature that provides metrics in real time for the operating system (OS) that your DB instance runs on. Enhanced Monitoring metrics include CPU utilization, memory, file system, disk I/O, network I/O, process list, and thread count. Enhanced Monitoring allows you to view how different threads use the CPU and how much memory each thread consumes.

By enabling RDS Performance Insights and Enhanced Monitoring for the RDS for Oracle DB instance, the database specialist can monitor the latency of the database with the least operational overhead. This solution will allow the database specialist to use the RDS console or API to enable these features and view the metrics and dashboards without installing any additional software or tools. This solution will also provide comprehensive and granular information about the database load and resource utilization.

Therefore, option C is the correct solution to meet the requirement.

Option A is not optimal because publishing RDS Performance Insights metrics to Amazon CloudWatch and adding AWS CloudTrail filters to monitor database performance will incur additional operational overhead and cost. Amazon CloudWatch is a service that collects monitoring and operational data in the form of logs, metrics, and events. AWS CloudTrail is a service that records AWS API calls for your account and delivers log files to you. These services are useful for monitoring performance trends and auditing activities, but they are not necessary for monitoring latency in real time.

Option B is not optimal because installing Oracle Statspack and enabling the performance statistics feature require manual intervention and configuration on the RDS for Oracle DB instance. Oracle Statspack is a tool that collects, stores, and displays performance data for Oracle Database. The performance statistics feature is an option that enables Statspack to collect additional statistics such as wait events, latches, SQL statements, segments, and rollback segments. These tools are useful for performance tuning and troubleshooting, but they are not as easy to use as RDS Performance Insights and Enhanced Monitoring.

Option D is not relevant because creating a new DB parameter group that includes the AllocatedStorage, DBInstanceClassMemory, and DBInstanceVCPU variables will not help monitor the latency of the database. A DB parameter group is a collection of DB engine configuration values that define how a DB instance operates. The AllocatedStorage parameter specifies the allocated storage size in gibibytes (GiB). The DBInstanceClassMemory parameter specifies the amount of memory available to an instance class in bytes. The DBInstanceVCPU parameter specifies the number of virtual CPUs available to an instance class. These parameters configure the capacity and performance of a DB instance, but they do not provide any monitoring or metrics information. In addition, enabling RDS Performance Insights alone will not provide OS-level metrics such as CPU utilization or memory usage.
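A minimal sketch of enabling both features on an existing instance with boto3 (the instance identifier and monitoring role ARN are placeholders):

```python
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="my-oracle-instance",
    EnablePerformanceInsights=True,
    PerformanceInsightsRetentionPeriod=7,  # days; 7 is the free retention tier
    MonitoringInterval=60,                 # seconds; a nonzero value enables Enhanced Monitoring
    MonitoringRoleArn="arn:aws:iam::111122223333:role/rds-monitoring-role",
    ApplyImmediately=True,
)
```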

