Prepare DEA-C01 Question Answers - DEA-C01 Exam Dumps [Q29-Q50]

Real Snowflake DEA-C01 Exam Questions [Updated 2024]

Snowflake DEA-C01 Exam Syllabus Topics:

Topic	Details
Topic 1	Security: The Security topic of the DEA-C01 test covers the principles of Snowflake security, including the management of system roles and data governance. It measures the ability to secure data and ensure compliance with policies, crucial for maintaining secure data environments for Snowflake Data Engineers and Software Engineers.
Topic 2	Storage and Data Protection: The topic tests the implementation of data recovery features and the understanding of Snowflake's Time Travel and micro-partitions. Engineers are evaluated on their ability to create new environments through cloning and ensure data protection, highlighting essential skills for maintaining Snowflake data integrity and accessibility.
Topic 3	Data Movement: Snowflake Data Engineers and Software Engineers are assessed on their proficiency to load, ingest, and troubleshoot data in Snowflake. It evaluates skills in building continuous data pipelines, configuring connectors, and designing data sharing solutions.
Topic 4	Data Transformation: The SnowPro Advanced: Data Engineer exam evaluates skills in using User-Defined Functions (UDFs), external functions, and stored procedures. It assesses the ability to handle semi-structured data and utilize Snowpark for transformations. This section ensures Snowflake engineers can effectively transform data within Snowflake environments, critical for data manipulation tasks.
Topic 5	Performance Optimization: This topic assesses the ability to optimize and troubleshoot underperforming queries in Snowflake. Candidates must demonstrate knowledge in configuring optimal solutions, utilizing caching, and monitoring data pipelines. It focuses on ensuring engineers can enhance performance based on specific scenarios, crucial for Snowflake Data Engineers and Software Engineers.

NEW QUESTION # 29
Which of the following best describes the type of data found in traditional relational databases?

A. Unstructured data
B. Structured data
C. Free-form data
D. Semi-structured data

Answer: B

NEW QUESTION # 30
Ira, a Data Engineer with TESLA IT systems, wants to compare traditional partitioning with Snowflake micro-partitions for a Snowflake project implementation. Which one of the following reflects an incorrect understanding of micro-partitioning on Ira's part?

A. Snowflake stores metadata about all rows stored in a micro-partition, including number of distinct columns.
B. All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage, whereas traditional partitioning requires specialized DDL.
C. The micro-partition metadata maintained by Snowflake enables precise pruning of columns in micro-partitions at query run-time, including columns containing semi-structured data.
D. In Snowflake, as data is inserted/loaded into a table, clustering metadata is collected and recorded for each micro-partition created during the process.
E. All DML operations (e.g. DELETE, UPDATE, MERGE) take advantage of the underlying micro-partition metadata to facilitate and simplify table maintenance.

Answer: A

Explanation:
What are Micro-partitions?
All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage. Each micro-partition contains between 50 MB and 500 MB of uncompressed data (note that the actual size in Snowflake is smaller because data is always stored compressed). Groups of rows in tables are mapped into individual micro-partitions, organized in a columnar fashion. This size and structure allow for extremely granular pruning of very large tables, which can be comprised of millions, or even hundreds of millions, of micro-partitions.
Snowflake stores metadata about all rows stored in a micro-partition, including:
The range of values for each of the columns in the micro-partition.
The number of distinct values.
Additional properties used for both optimization and efficient query processing.
Snowflake never stores the number of distinct columns as part of this metadata, so option A is the incorrect statement.
The rest of the statements are correct.
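
The role of the per-partition metadata listed above can be sketched with a toy Python model. This is not Snowflake's implementation: `MicroPartition` and `prune` are hypothetical names, and each partition holds a single column of values. The sketch only shows how a stored value range lets a query skip partitions that cannot contain the value it is looking for.

```python
from dataclasses import dataclass


@dataclass
class MicroPartition:
    """Toy stand-in for a micro-partition holding one column of values."""
    values: list

    @property
    def min_value(self):
        # Lower bound of the range of values kept as metadata.
        return min(self.values)

    @property
    def max_value(self):
        # Upper bound of the range of values kept as metadata.
        return max(self.values)

    @property
    def distinct_count(self):
        # Number of distinct values kept as metadata.
        return len(set(self.values))


def prune(partitions, target):
    """Keep only partitions whose [min, max] range could contain target."""
    return [p for p in partitions if p.min_value <= target <= p.max_value]


partitions = [
    MicroPartition([1, 3, 5]),
    MicroPartition([4, 8, 8, 9]),
    MicroPartition([20, 25]),
]

survivors = prune(partitions, 4)
print(len(survivors), "of", len(partitions), "partitions need to be scanned")
```

Note that the first two partitions overlap in their value ranges, so both survive pruning for the value 4, while the third is skipped without reading its rows.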

NEW QUESTION # 31
A Data Engineer wants to centralize grant management to maximize security. A user needs ownership on a table in a new schema. However, this user should not have the ability to make grant decisions. What is the correct way to do this?

A. Add the WITH MANAGED ACCESS parameter on the schema
B. Revoke grant decisions from the user on the schema.
C. Grant ownership to the user on the table
D. Revoke grant decisions from the user on the table

Answer: A

Explanation:
The WITH MANAGED ACCESS parameter on the schema centralizes privilege management with the schema owner: only the schema owner (or a role with the MANAGE GRANTS privilege) can grant privileges on objects within the schema. The user who owns the table therefore cannot make grant decisions. This is the best way to centralize grant management and maximize security.
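
The difference between a regular schema and a managed access schema can be sketched as a small decision function. This is a simplified Python model under stated assumptions, not Snowflake's access control engine: `can_grant`, the dictionary layout, and the role names are all hypothetical, and the only rules modeled are object ownership, schema ownership, and holding MANAGE GRANTS.

```python
def can_grant(role, table, schema):
    """Return True if role may make grant decisions on table in this toy model."""
    # A role holding MANAGE GRANTS can grant in either kind of schema.
    if role in schema["manage_grants_roles"]:
        return True
    # In a managed access schema, object owners lose grant decisions;
    # they are centralized with the schema owner.
    if schema["managed_access"]:
        return role == schema["owner"]
    # In a regular schema, the object owner makes grant decisions.
    return role == table["owner"]


managed = {
    "owner": "SCHEMA_ADMIN",
    "managed_access": True,
    "manage_grants_roles": {"SECURITYADMIN"},
}
regular = dict(managed, managed_access=False)
table = {"owner": "ANALYST"}

print(can_grant("ANALYST", table, managed))       # table owner in a managed schema
print(can_grant("SCHEMA_ADMIN", table, managed))  # schema owner in a managed schema
print(can_grant("ANALYST", table, regular))       # table owner in a regular schema
```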

NEW QUESTION # 32
To view/monitor the clustering metadata for a table, Snowflake provides which of the following system functions?

A. SYSTEM$CLUSTERING_KEY_INFORMATION (including clustering depth)
B. SYSTEM$CLUSTERING_INFORMATION (including clustering depth)
C. SYSTEM$CLUSTERING_DEPTH_KEY
D. SYSTEM$CLUSTERING_DEPTH

Answer: B,D

Explanation:
SYSTEM$CLUSTERING_DEPTH:
Computes the average depth of the table according to the specified columns (or the clustering key defined for the table). The average depth of a populated table (i.e. a table containing data) is always 1 or more. The smaller the average depth, the better clustered the table is with regard to the specified columns.
Calculate the clustering depth for a table using two columns in the table:
SELECT SYSTEM$CLUSTERING_DEPTH('TPCH_PRODUCT', '(C2, C9)');
SYSTEM$CLUSTERING_INFORMATION:
Returns clustering information, including average clustering depth, for a table based on one or more columns in the table.
SELECT SYSTEM$CLUSTERING_INFORMATION('SAMPLE_TABLE', '(col1, col3)');
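
To build intuition for why a smaller depth means better clustering, here is a deliberately simplified Python sketch. It is not the formula Snowflake uses: `average_overlap_depth` is a hypothetical helper that, for each partition, counts how many partitions (itself included) overlap its value range and then averages those counts. It does share one property with the description above: a populated table never scores below 1.

```python
def overlaps(a, b):
    """True when two closed (min, max) ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def average_overlap_depth(ranges):
    """Average number of partitions overlapping each partition (itself included)."""
    if not ranges:
        return 0.0
    counts = [sum(overlaps(r, other) for other in ranges) for r in ranges]
    return sum(counts) / len(ranges)


# Each tuple is the (min, max) of the clustering column in one partition.
well_clustered = [(1, 10), (11, 20), (21, 30)]
poorly_clustered = [(1, 30), (5, 25), (10, 20)]

print(average_overlap_depth(well_clustered))
print(average_overlap_depth(poorly_clustered))
```

With disjoint ranges a lookup touches one partition; with fully overlapping ranges every partition may have to be read.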

NEW QUESTION # 33
A Data Engineer needs to ingest invoice data in PDF format into Snowflake so that the data can be queried and used in a forecasting solution.
..... recommended way to ingest this data?

A. Create an external table on the PDF files that are stored in a stage and parse the data into structured data
B. Use Snowpipe to ingest the files that land in an external stage into a Snowflake table
C. Create a Java User-Defined Function (UDF) that leverages Java-based PDF parser libraries to parse PDF data into structured data
D. Use a COPY INTO command to ingest the PDF files in an external stage into a Snowflake table with a VARIANT column.

Answer: C

Explanation:
The recommended way to ingest invoice data in PDF format into Snowflake is to create a Java User-Defined Function (UDF) that leverages Java-based PDF parser libraries to parse PDF data into structured data. This option allows for more flexibility and control over how the PDF data is extracted and transformed. The other options are not suitable for ingesting PDF data into Snowflake.
Options B and D are incorrect because Snowpipe and the COPY INTO command can only load files that are in supported file formats, such as CSV, JSON, and XML. PDF is not one of these formats, so loading PDF files this way will cause errors or unexpected results.
Option A is incorrect because external tables can also only query files that are in supported file formats. PDF files cannot be parsed by external tables.

NEW QUESTION # 34
A company stores data from an application in an Amazon DynamoDB table that operates in provisioned capacity mode. The workloads of the application have predictable throughput load on a regular schedule. Every Monday, there is an immediate increase in activity early in the morning.
The application has very low usage during weekends.
The company must ensure that the application performs consistently during peak usage times.
Which solution will meet these requirements in the MOST cost-effective way?

A. Increase the provisioned capacity to the maximum capacity that is currently present during peak load times.
B. Use AWS Application Auto Scaling to schedule higher provisioned capacity for peak usage times. Schedule lower capacity during off-peak times.
C. Divide the table into two tables. Provision each table with half of the provisioned capacity of the original table. Spread queries evenly across both tables.
D. Change the capacity mode from provisioned to on-demand. Configure the table to scale up and scale down based on the load on the table.

Answer: B

NEW QUESTION # 35
A Data Engineer needs to know the details regarding the micro-partition layout for a table named invoice using a built-in function.
Which query will provide this information?

A. SELECT SYSTEM$CLUSTERING_INFORMATION('Invoice');
B. SELECT $CLUSTERING_INFORMATION('Invoice');
C. CALL $CLUSTERING_INFORMATION('Invoice');
D. CALL SYSTEM$CLUSTERING_INFORMATION('Invoice');

Answer: A

Explanation:
The query that provides information about the micro-partition layout for a table named invoice using a built-in function is SELECT SYSTEM$CLUSTERING_INFORMATION('Invoice');. The SYSTEM$CLUSTERING_INFORMATION function returns clustering information for a table, such as the clustering key, the total partition count, the average overlaps, and the average clustering depth. The function takes the table name, in qualified or unqualified form, and optionally a list of columns. In this case the table name Invoice is unqualified, so it is resolved against the current database and schema. The other options are incorrect because they do not invoke a valid built-in function in this way.
Option B is incorrect because it uses $CLUSTERING_INFORMATION instead of SYSTEM$CLUSTERING_INFORMATION, which is not a valid function name.
Option C is incorrect because it uses CALL instead of SELECT and $CLUSTERING_INFORMATION instead of SYSTEM$CLUSTERING_INFORMATION.
Option D is incorrect because it uses CALL instead of SELECT to invoke the function.

NEW QUESTION # 36
Which stages support external tables?

A. Internal stages only; within a single Snowflake account
B. External stages only, only on the same region and cloud provider as the Snowflake account
C. Internal stages only from any Snowflake account in the organization
D. External stages only from any region, and any cloud provider

Answer: D

Explanation:
Only external stages support external tables, and those stages can be in any region and on any cloud provider. External tables are virtual tables that can query data from files stored in external stages without loading them into Snowflake tables. External stages are references to locations outside of Snowflake, such as Amazon S3 buckets, Azure Blob Storage containers, or Google Cloud Storage buckets. External stages can be created for any region and any cloud provider, as long as they have a valid URL and credentials. The other options are incorrect because internal stages do not support external tables. Internal stages are locations within Snowflake that can store files for loading or unloading data. Internal stages can be user stages, table stages, or named stages.

NEW QUESTION # 37
Which one is not a core benefit of micro-partitioning?

A. Micro-partitions can overlap in their range of values, which helps data skewing.
B. Snowflake micro-partitions are derived automatically; they do not need to be explicitly defined up-front or maintained by users.
C. Columns are also compressed individually within micro-partitions.
D. Enables extremely efficient DML and fine-grained pruning for faster queries.
E. Columns are stored independently within micro-partitions, often referred to as columnar storage.

Answer: A

Explanation:
The benefits of Snowflake's approach to partitioning table data include:
In contrast to traditional static partitioning, Snowflake micro-partitions are derived automatically; they don't need to be explicitly defined up-front or maintained by users.
As the name suggests, micro-partitions are small in size (50 to 500 MB, before compression), which enables extremely efficient DML and fine-grained pruning for faster queries.
Micro-partitions can overlap in their range of values, which, combined with their uniformly small size, helps prevent skew.
Columns are stored independently within micro-partitions, often referred to as columnar storage. This enables efficient scanning of individual columns; only the columns referenced by a query are scanned.
Columns are also compressed individually within micro-partitions. Snowflake automatically determines the most efficient compression algorithm for the columns in each micro-partition.

NEW QUESTION # 38
Robert, a Data Engineer, found that a pipe became stale because it was paused for longer than the limited retention period for event messages received for the pipe (14 days by default). In addition, the previous pipe owner transferred ownership of the pipe to Robert's role while the pipe was paused. How can Robert resume this stale pipe?

A. The pipe needs to be recreated in this scenario, as it is already past the 14-day period and stale.
B. select system$pipe_force_resume('mydb.myschema.stalepipe','staleness_check_override, ownership_transfer_check_override');
C. He can apply the system function SYSTEM$PIPE_STALE_RESUME with an ALTER PIPE statement.
D. An ALTER PIPES ... RESUME statement will resume the pipe.
E. Robert can use the SYSTEM$PIPE_FORCE_RESUME function to resume this stale pipe.

Answer: B

Explanation:
When a pipe is paused, event messages received for the pipe enter a limited retention period. The period is 14 days by default. If a pipe is paused for longer than 14 days, it is considered stale.
To resume a stale pipe, a qualified role must call the SYSTEM$PIPE_FORCE_RESUME function and input the STALENESS_CHECK_OVERRIDE argument. This argument indicates an understanding that the role is resuming a stale pipe.
For example, resume the stale stalepipe1 pipe in the mydb.myschema database and schema:
SELECT SYSTEM$PIPE_FORCE_RESUME('mydb.myschema.stalepipe1','staleness_check_override');
While the stale pipe was paused, if ownership of the pipe was transferred to another role, then resuming the pipe requires the additional OWNERSHIP_TRANSFER_CHECK_OVERRIDE argument. For example, resume the stale stalepipe2 pipe in the mydb.myschema database and schema, which was transferred to a new role:
SELECT SYSTEM$PIPE_FORCE_RESUME('mydb.myschema.stalepipe2','staleness_check_override, ownership_transfer_check_override');

NEW QUESTION # 39
Elon, a Data Engineer, needs to split semi-structured elements from the source files and load them as arrays into separate columns.
Source File:
+----------------------------------------------------------------------+
| $1                                                                   |
|----------------------------------------------------------------------|
| {"mac_address": {"host1": "197.128.1.1","host2": "197.168.0.1"}},    |
| {"mac_address": {"host1": "197.168.2.1","host2": "197.168.3.1"}}     |
+----------------------------------------------------------------------+
Output: Splitting the machine address as below.
| COL1     | COL2     |
|----------+----------|
| [        | [        |
| "197",   | "197",   |
| "128",   | "168",   |
| "1",     | "0",     |
| "1"      | "1"      |
| ]        | ]        |
| [        | [        |
| "197",   | "197",   |
| "168",   | "168",   |
| "2",     | "3",     |
| "1"      | "1"      |
| ]        | ]        |
Which Snowflake function can Elon use to transform this semi-structured data into the output format?

A. CONVERT_TO_ARRAY
B. NEST
C. SPLIT
D. GROUP_BY_CONNECT

Answer: C
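
The transformation asked for in this question can be imitated outside Snowflake to check the expected output. The Python sketch below is only an analogy: it uses `str.split` where Snowflake would use SPLIT with a '.' separator, and `split_hosts` is a hypothetical helper name. The two input rows are copied from the source file shown in the question.

```python
import json

# The two source rows from the question, one JSON document per row.
rows = [
    '{"mac_address": {"host1": "197.128.1.1", "host2": "197.168.0.1"}}',
    '{"mac_address": {"host1": "197.168.2.1", "host2": "197.168.3.1"}}',
]


def split_hosts(raw_row):
    """Parse one row and split each host address on '.' into a list."""
    hosts = json.loads(raw_row)["mac_address"]
    return hosts["host1"].split("."), hosts["host2"].split(".")


result = [split_hosts(row) for row in rows]
for col1, col2 in result:
    print(col1, col2)
```

Each address becomes an array of four string elements, which matches the COL1 and COL2 arrays in the expected output.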

NEW QUESTION # 40
A marketing company collects clickstream data. The company sends the clickstream data to Amazon Kinesis Data Firehose and stores the clickstream data in Amazon S3. The company wants to build a series of dashboards that hundreds of users from multiple departments will use.
The company will use Amazon QuickSight to develop the dashboards. The company wants a solution that can scale and provide daily updates about clickstream activity.
Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)

A. Use Amazon Redshift to store and query the clickstream data.
B. Use Amazon Athena to query the clickstream data
C. Access the query data through QuickSight SPICE (Super-fast, Parallel, In-memory Calculation Engine). Configure a daily refresh for the dataset.
D. Use Amazon S3 analytics to query the clickstream data.
E. Access the query data through a QuickSight direct SQL query.

Answer: B,C

Explanation:
Athena would be cheaper than Redshift. S3 analytics is irrelevant. The functionality in SPICE should be more cost effective than direct SQL by reducing the frequency and volume of queries.

NEW QUESTION # 41
A company stores data in a data lake that is in Amazon S3. Some data that the company stores in the data lake contains personally identifiable information (PII). Multiple user groups need to access the raw data. The company must ensure that user groups can access only the PII that they require.
Which solution will meet these requirements with the LEAST effort?

A. Create IAM roles that have different levels of granular access. Assign the IAM roles to IAM user groups. Use an identity-based policy to assign access levels to user groups at the column level.
B. Use Amazon Athena to query the data. Set up AWS Lake Formation and create data filters to establish levels of access for the company's IAM roles. Assign each user to the IAM role that matches the user's PII access requirements.
C. Use Amazon QuickSight to access the data. Use column-level security features in QuickSight to limit the PII that users can retrieve from Amazon S3 by using Amazon Athena. Define QuickSight access levels based on the PII access requirements of the users.
D. Build a custom query builder UI that will run Athena queries in the background to access the data. Create user groups in Amazon Cognito. Assign access levels to the user groups based on the PII access requirements of the users.

Answer: B

Explanation:
https://aws.amazon.com/blogs/big-data/anonymize-and-manage-data-in-your-data-lake-with-amazon-athena-and-aws-lake-formation/

NEW QUESTION # 42
Suppose you have a list of 50 source files that need to be loaded into a Snowflake internal stage. All of these source files are already Brotli-compressed. Which statement is correct with respect to compression of staged files?

A. Even though the source files are already compressed, Snowflake applies default gzip2 compression to optimize the storage cost.
B. When staging 50 compressed files in a Snowflake stage, the files are automatically compressed using gzip.
C. Auto-detection is not yet supported for Brotli-compressed files; when staging or loading Brotli-compressed files, you must explicitly specify the compression method that was used.
D. Snowflake automatically detects Brotli compression and will skip further compression of all 50 files.

Answer: C

Explanation:
Auto-detection is not yet supported for Brotli-compressed files; when staging or loading Brotli-compressed files, you must explicitly specify the compression method that was used.
To learn more about compression of staged files, refer to:
https://docs.snowflake.com/en/user-guide/intro-summary-loading.html#compression-of-staged-files

NEW QUESTION # 43
A Data Engineer executes a complex query and wants to make use of Snowflake's query results caching capabilities to reuse the results.
Which conditions must be met? (Select THREE).

A. The micro-partitions cannot have changed due to changes to other data in the table
B. The new query must have the same syntax as the previously executed query.
C. The results must be reused within 72 hours.
D. The USED_CACHED_RESULT parameter must be included in the query.
E. The query must be executed using the same virtual warehouse.
F. The table structure contributing to the query result cannot have changed

Answer: A,B,F

Explanation:
Snowflake's query result cache lets users reuse the results of previously executed queries without re-executing them. The options compare with the actual behavior as follows:
The results must be reused within 24 hours (not 72 hours), which is the default retention period for cached results. Each reuse resets this period, up to a maximum of 31 days from the first execution.
The query can be executed using any virtual warehouse (not necessarily the same one), because the result cache is maintained at the account level rather than by an individual warehouse.
No parameter needs to be included in the query. Result reuse is controlled by the USE_CACHED_RESULT parameter, which is enabled by default and can be changed at the account, user, or session level.
The table structure contributing to the query result cannot have changed, such as adding or dropping columns, changing data types, or altering constraints.
The new query must have the same syntax as the previously executed query, including whitespace and case sensitivity.
The micro-partitions cannot have changed due to changes to other data in the table, such as inserting, updating, deleting, or merging rows.
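
The conditions above can be summarized as a lookup with three checks. The Python sketch below is a toy model under stated assumptions, not Snowflake's cache: `ToyResultCache` is a hypothetical class, a single integer `table_version` stands in for "the table structure and micro-partitions have not changed", and the 24-hour period is taken from the explanation above.

```python
from datetime import datetime, timedelta

RESULT_TTL = timedelta(hours=24)  # retention period described in the explanation


class ToyResultCache:
    """Toy result cache keyed by the exact query text."""

    def __init__(self):
        self._entries = {}

    def store(self, query_text, table_version, result, now):
        self._entries[query_text] = (table_version, result, now)

    def lookup(self, query_text, table_version, now):
        entry = self._entries.get(query_text)
        if entry is None:
            return None  # query text differs, even by case or whitespace
        cached_version, result, stored_at = entry
        if cached_version != table_version:
            return None  # underlying table changed since the result was stored
        if now - stored_at > RESULT_TTL:
            return None  # result is older than the retention period
        return result


cache = ToyResultCache()
t0 = datetime(2024, 1, 1, 9, 0)
cache.store("SELECT COUNT(*) FROM invoice", 1, 42, t0)

hit = cache.lookup("SELECT COUNT(*) FROM invoice", 1, t0 + timedelta(hours=1))
miss_case = cache.lookup("select count(*) from invoice", 1, t0 + timedelta(hours=1))
miss_changed = cache.lookup("SELECT COUNT(*) FROM invoice", 2, t0 + timedelta(hours=1))
miss_expired = cache.lookup("SELECT COUNT(*) FROM invoice", 1, t0 + timedelta(hours=25))
print(hit, miss_case, miss_changed, miss_expired)
```

The model has no notion of a warehouse at all, which mirrors the point that reuse does not depend on running the query on the same virtual warehouse.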

NEW QUESTION # 44
To troubleshoot a data load failure in a COPY statement, a Data Engineer executed a COPY statement with the VALIDATION_MODE copy option set to RETURN_ALL_ERRORS, referencing the set of files he had attempted to load. Which of the functions below can facilitate analysis of the problematic records on top of the results produced? [Select 2]

A. LOAD_ERROR
B. RESULT_SCAN
C. LAST_QUERY_ID
D. Rejected_record

Answer: B,C

Explanation:
LAST_QUERY_ID() Function
Returns the ID of a specified query in the current session. If no query is specified, the most recently executed query is returned.
RESULT_SCAN() Function
Returns the result set of a previous command (within 24 hours of when you executed the query) as if the result was a table.
The following example validates a set of files (SFfile.csv.gz) that contain errors. To facilitate analysis of the errors, a COPY INTO <location> statement then unloads the problematic records into a text file so they can be analyzed and fixed in the original data files. The statement queries the RESULT_SCAN table function.
copy into Snowtable
from @SFstage/SFfile.csv.gz
validation_mode=return_all_errors;
set qid=last_query_id();
copy into @SFstage/errors/load_errors.txt from (select rejected_record from table(result_scan($qid)));
Note: The other options are not valid functions.

NEW QUESTION # 45
You can execute zero, one, or more transactions inside a stored procedure?

A. TRUE
B. FALSE

Answer: A

NEW QUESTION # 46
John, a Data Engineer, has a technical requirement to refresh external table metadata periodically or automatically. Which approaches can John take to meet this requirement?

A. External table refreshes cannot be scheduled via Snowflake tasks; third-party tools or scripts provided by the external cloud storage provider need to be used.
B. John can use the AUTO_REFRESH parameter if the underlying external cloud host supports this for external tables.
C. He can create a task that executes an ALTER EXTERNAL TABLE ... REFRESH statement every 5 minutes.
D. Snowflake implicitly takes care of this infrastructure need, as the underlying warehouse layer internally manages the refresh. No action is needed from John.

Answer: B,C

Explanation:
Both options B and C are correct.
To refresh external table metadata automatically, John can use the AUTO_REFRESH parameter of the external table. When an external table is created, the AUTO_REFRESH parameter is set to TRUE by default.
Snowflake recommends that you accept this default value for external tables that reference data files in either Amazon S3 or Microsoft Azure stages.
However, the automatic refresh option is not available currently for external tables that reference Google Cloud Storage stages.
For these external tables, manually refreshing the metadata on a schedule can be useful.
The following example refreshes the metadata for an external table named snowdb.snowschema.snow_ext_table (using ALTER EXTERNAL TABLE ... REFRESH) on a schedule.
-- Create a task that executes an ALTER EXTERNAL TABLE ... REFRESH statement every 5 minutes.
CREATE TASK snow_ext_table_refresh_task
WAREHOUSE=mywh
SCHEDULE='5 minutes'
AS
ALTER EXTERNAL TABLE snowdb.snowschema.snow_ext_table REFRESH;

NEW QUESTION # 47
Which property can be used with ALTER USER command to temporarily disable MFA for the user so that they can log in?

A. SECS_TO_BYPASS_MFA
B. MINS_TO_BYPASS_MFA
C. HOURS_TO_BYPASS_MFA
D. MINS_TO_SKIP_MFA

Answer: B

Explanation:
You can use the following properties for the ALTER USER command to perform these tasks:
MINS_TO_BYPASS_MFA
Specifies the number of minutes to temporarily disable MFA for the user so that they can log in. After the time passes, MFA is enforced and the user cannot log in without the temporary token generated by the Duo Mobile application.

NEW QUESTION # 48
Company DEF has a strict security policy that mandates that all data at rest in Amazon S3 must be encrypted. They want to ensure that the encryption keys are managed by AWS, but they also want the flexibility to change the encryption keys when required.
Which of the following encryption methods best meets Company DEF's requirements?

A. Client-Side Encryption with a client-side master key.
B. Server-Side Encryption with Customer-Provided Keys (SSE-C).
C. Server-Side Encryption with AWS Key Management Service (SSE-KMS).
D. Server-Side Encryption with Amazon S3 Managed Keys (SSE-S3).

Answer: C

NEW QUESTION # 49
Assuming that the session parameter USE_CACHED_RESULT is set to false, what are characteristics of Snowflake virtual warehouses in terms of the use of Snowpark?

A. Creating a DataFrame from a staged file with the read() method will start a virtual warehouse
B. Calling a Snowpark stored procedure to query the database with session.call() will start a virtual warehouse
C. Transforming a DataFrame with methods like replace() will start a virtual warehouse
D. Creating a DataFrame from a table will start a virtual warehouse

Answer: D

Explanation:
Creating a DataFrame from a table will start a virtual warehouse because it requires reading data from Snowflake. The other options will not start a virtual warehouse because they either operate on local data or use an existing session to query Snowflake.

NEW QUESTION # 50
......

DEA-C01 Exam Dumps Pass with Updated 2024: https://simplilearn.lead1pass.com/Snowflake/DEA-C01-practice-exam-dumps.html
