Foundations for Big Data Analysis with SQL

In this post you will find the Foundations for Big Data Analysis with SQL quiz answers. The course is offered by Cloudera on Coursera.

Foundations for Big Data Analysis with SQL Coursera Quiz Answer

Offered by Cloudera

Data and Databases

Question 1
Which of the following are examples of digital data? Check all that apply.

  • A song played live by an orchestra
  • A cassette tape recording of a song played live by an orchestra (Hint: consider how a cassette tape stores data.)
  • A printed schedule of the agenda for a full-day business meeting
  • A PDF with biographies of the presenters at a full-day business meeting
  • A recording on a smartphone of a song played live by a solo guitarist
  • A downloaded recording of a song played in a studio by a solo guitarist

Question 2
Which of the following are benefits of organizing data? Check all that apply.

  • Copying the entire data contents to another storage area
  • Deleting all the data
  • Easier lookups for particular information within the data
  • Choosing a random piece of data
  • Faster counting of records that fit a particular category
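
Sorting is one simple way to organize data. Here is a minimal Python sketch, assuming a small hypothetical list of (category, item) records: once the records are sorted by category, the standard-library `bisect` module finds the block for one category with binary search, so both a lookup and a count avoid scanning every record.

```python
from bisect import bisect_left, bisect_right

# Hypothetical records: (category, item), in the order they happened to arrive.
records = [
    ("fruit", "apple"),
    ("vegetable", "carrot"),
    ("fruit", "banana"),
    ("dairy", "milk"),
    ("fruit", "cherry"),
    ("vegetable", "leek"),
]

# Organizing step: sort once by category (ties broken by item).
organized = sorted(records)
categories = [category for category, _ in organized]


def count_in_category(category: str) -> int:
    """Count the records in one category with two binary searches."""
    return bisect_right(categories, category) - bisect_left(categories, category)


def items_in_category(category: str) -> list[str]:
    """Look up every item in a category by slicing its sorted block."""
    start = bisect_left(categories, category)
    end = bisect_right(categories, category)
    return [item for _, item in organized[start:end]]


print(count_in_category("fruit"))      # 3
print(items_in_category("vegetable"))  # ['carrot', 'leek']
```

Sorting costs work once, up front; the payoff is that later lookups and counts take logarithmic rather than linear time. Database indexes rely on a similar trade-off.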

Question 3
Why is the DML (Data Manipulation Language) category needed in a good database system?

  • Displaying data
  • Collecting data
  • Organizing places for different types of data
  • Keeping the current values in a database system up to date

Question 4
Which list of SQL statements below contains DQL (Data Query Language), or data retrieval, statements?

  • SELECT
  • CREATE, ALTER, DROP
  • INSERT, UPDATE, DELETE
  • GRANT, REVOKE

Question 5
Which list of SQL statements below contains DDL (Data Definition Language) statements?

  • CREATE, ALTER, DROP
  • INSERT, UPDATE, DELETE
  • SELECT
  • GRANT, REVOKE
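
The DDL, DML, and DQL categories can be seen side by side in a short sketch using Python's built-in `sqlite3` module with a throwaway in-memory database. The table and column names are hypothetical; the student names come from the course's sample data and the GPA change is illustrative. GRANT and REVOKE are not shown because SQLite has no user accounts to grant privileges to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: define (and change) the structure where data will live.
cur.execute("CREATE TABLE students (student_id TEXT PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("ALTER TABLE students ADD COLUMN gpa REAL")

# DML: keep the stored values current.
cur.execute("INSERT INTO students VALUES ('930', 'Olufunmilayo Ayton', 3.9)")
cur.execute("INSERT INTO students VALUES ('667', 'Vincent Michaelson', 2.53)")
cur.execute("UPDATE students SET gpa = 4.0 WHERE student_id = '930'")
cur.execute("DELETE FROM students WHERE student_id = '667'")
conn.commit()

# DQL: retrieve data without changing it.
rows = cur.execute("SELECT student_id, name, gpa FROM students").fetchall()
print(rows)  # [('930', 'Olufunmilayo Ayton', 4.0)]

# DROP removes the table definition itself (DDL again).
cur.execute("DROP TABLE students")
conn.close()
```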

Question 6
Why have relational databases and SQL been so successful for the last 35 years or more? Choose two.

  • The mathematical rigor that spawned the beginning of RDBMSs supplied a strong, robust foundation for many different database systems and applications.
  • SQL—so closely tied to relational databases—is easy to learn and use, and so we have had an explosion of analysts, programmers and application tools that can use SQL.
  • SQL is isolated from other languages for greater protection from hackers and other privacy threats.
  • The details of how SQL is implemented within a program are strongly tied to how the user will use SQL and are evident in the user experience, providing a vast array of dialects that can be tailored to an industry’s needs.

Question 7
Which of the following applications would best be supported by an operational database? Check all that apply.

  • Reports on rentals and movie ratings made by all customers of a movie rental business over the last five years.
  • A school enrollment program, scheduling which students go in which sections of which classes.
  • A government census, taken once every 10 years.
  • A bicycle assembly plant, identifying assembly parts that need to be ordered to replenish supplies as bicycles are produced.

Question 8
Which are true statements about how operational and analytic database systems are different? Check all that apply.

  • Operational databases are more likely to receive frequent DML commands than analytic databases are.
  • Operational databases are more commonly used to discover how operations within a company can be improved based on past performance than analytic databases are.
  • Operational databases tend to store more data than analytic databases do.
  • Operational databases are more likely to receive frequent lookup or search commands than analytic databases are.

Relational Databases

Question 1
Consider this data:

Student ID | Name               | Grade Level | GPA
930        | Olufunmilayo Ayton | 11          | 4.00
667        | Vincent Michaelson | 10          | 2.53
907        | Asa Quigg          | 10          | 3.57
168        | Kiran Patil        | 11          | 3.28

Which of these tables would accept this data? Check all that apply.

(Note: This isn’t asking which are good table definitions; it’s only asking which would accept the data for storage.)


Question 2
Here is a table definition:

Column      | Data Type    | Notes
student_id  | STRING       | PK
name        | STRING       | NOT NULL
grade_level | STRING
gpa         | DECIMAL(2,1)

Which rows can be stored in this table? Choose all that apply.

{student_id : '392', name : 'Kamalani Hale', gpa : 4.0}

{name : 'Sandalio Abascal', grade_level : '10', gpa : 3.2}

{student_id : '93', name : 'Tilly Sokolowski', grade_level : 'New Student'}

{student_id : '732', name : 'Sanjiv Chaudhari', gpa : '3.9'}

{name : 'Qiu Yuen'}
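
One way to reason through this question is to check each row against the definition mechanically. The Python sketch below does that under stated assumptions: a missing column is NULL, the primary key and NOT NULL columns reject NULL, DECIMAL(2,1) allows at most one digit before and one after the decimal point, and no implicit type conversion happens, so the quoted '3.9' counts as a string. `SCHEMA`, `fits_decimal`, and `can_store` are hypothetical helpers, not part of any database API.

```python
from decimal import Decimal

# Table definition from Question 2. Assumption: strict typing, so the quoted
# value '3.9' is a string and is NOT implicitly cast to DECIMAL.
SCHEMA = {
    "student_id": {"type": str, "primary_key": True},
    "name": {"type": str, "not_null": True},
    "grade_level": {"type": str},
    "gpa": {"type": Decimal, "precision": 2, "scale": 1},
}

ROWS = [
    {"student_id": "392", "name": "Kamalani Hale", "gpa": Decimal("4.0")},
    {"name": "Sandalio Abascal", "grade_level": "10", "gpa": Decimal("3.2")},
    {"student_id": "93", "name": "Tilly Sokolowski", "grade_level": "New Student"},
    {"student_id": "732", "name": "Sanjiv Chaudhari", "gpa": "3.9"},
    {"name": "Qiu Yuen"},
]


def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    """DECIMAL(p, s): at most s digits after the point and p - s digits before it."""
    step = Decimal(1).scaleb(-scale)  # 0.1 when scale is 1
    return value == value.quantize(step) and abs(value) < Decimal(10) ** (precision - scale)


def can_store(row: dict) -> bool:
    """Check one row against SCHEMA; a missing column is treated as NULL."""
    if not set(row) <= set(SCHEMA):
        return False  # the row mentions a column the table does not have
    for column, rules in SCHEMA.items():
        value = row.get(column)
        if value is None:
            if rules.get("primary_key") or rules.get("not_null"):
                return False  # a primary key or NOT NULL column cannot be NULL
            continue
        if not isinstance(value, rules["type"]):
            return False  # strict typing: no implicit casts
        if rules["type"] is Decimal and not fits_decimal(value, rules["precision"], rules["scale"]):
            return False  # value does not fit the declared precision and scale
    return True


results = [can_store(row) for row in ROWS]
print(results)  # [True, False, True, False, False]
```

In a system that silently casts the string '3.9' to a decimal, the fourth row would pass as well, so the verdict on that row depends on how strictly the target database types its input. The sketch also does not check that primary key values are unique across rows.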

Question 3
What is database normalization?

  • Using well known table names and column names for common sorts of records, like customers, stores, and items.
  • Designing the tables in your relational database so that redundant storage is minimized and the chance of inconsistencies in the data is also reduced.
  • Combining data from different tables into one larger table, so that records from different tables don’t need to be joined together to give complete reports.
  • Tidying-up the data so that bad records, records with important values missing, and erroneous outliers are removed.

Question 4
Which of the following rules are well-known conditions that help define third normal form? (Note: we are stating the rules a bit informally.) Choose all that apply.

  • A table must always hold all known information about a key.
  • Every table in your database must have a primary key.
  • There must be no repeating groups in any table. For example, you will not have a column that can contain one or more phone numbers.
  • You must store everything as efficiently as possible.
  • The non-key columns of a table must be dependent on the key only. For example, if you have an employee table with employee id as the key, then you might have a department id column for the employee, but not department name also (because the department name would be dependent on the department id, which is not your table’s primary key).
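
The employee and department example in the last option can be sketched with Python's built-in `sqlite3` module. The table names, column names, and rows are hypothetical. Because the department name is stored once, keyed by department_id, renaming a department updates a single row, and every employee picks up the change through a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# department_name depends on department_id, so it lives in its own table.
conn.execute(
    "CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT NOT NULL)"
)
# The employee table keeps only columns that depend on employee_id.
conn.execute(
    "CREATE TABLE employees ("
    " employee_id INTEGER PRIMARY KEY,"
    " employee_name TEXT NOT NULL,"
    " department_id INTEGER REFERENCES departments(department_id))"
)
conn.executemany("INSERT INTO departments VALUES (?, ?)", [(1, "Sales"), (2, "Research")])
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(101, "Ana", 1), (102, "Ben", 2), (103, "Chen", 1)],
)

# Renaming a department touches exactly one row...
conn.execute("UPDATE departments SET department_name = 'Field Sales' WHERE department_id = 1")

# ...and every employee sees the change through a join.
report = conn.execute(
    "SELECT e.employee_name, d.department_name"
    " FROM employees e JOIN departments d ON e.department_id = d.department_id"
    " ORDER BY e.employee_id"
).fetchall()
print(report)  # [('Ana', 'Field Sales'), ('Ben', 'Research'), ('Chen', 'Field Sales')]
```

The same sketch shows the cost side of normalization: answering "which department is each employee in?" now requires a join.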

Question 5
Which of the following are costs of normalization? Choose all that apply.

  • Normalizing a database design generally will make your queries run less efficiently.
  • Normalizing a database requires more complex queries on your data to answer many questions.
  • Normalizing a database can degrade the integrity of your data.
  • Normalizing a database design generally will make the total storage of your data smaller.

Question 6
Why might you find it helpful to denormalize your database design? Choose all that apply.

  • When using an operational database, database denormalization keeps important information about a key in one row so it’s easier to maintain accuracy.
  • If your company is approaching its maximum storage capacity, and obtaining more is not an option for the near future, denormalizing allows you to reduce the storage needs in the short term.
  • Denormalizing will “pre-join” your previously normalized tables and store them that way, so fewer joins are needed in your queries.
  • In a system where join processing is slower, denormalizing can improve the runtime speed of many queries and reports.
  • If you frequently query some summary data, like store daily sales totals, keeping a summary table reduces the need to recompute summaries.
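
The summary-table idea in the last option can be sketched with SQLite through Python's `sqlite3` module, assuming a hypothetical `sales` table with made-up rows. `CREATE TABLE ... AS SELECT` stores the daily totals once, so reports can read the precomputed rows instead of re-aggregating every sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2024-05-01", 20.0),
        (1, "2024-05-01", 5.5),
        (2, "2024-05-01", 12.0),
        (1, "2024-05-02", 8.0),
    ],
)

# Denormalized summary table: computed once, then read many times.
conn.execute(
    "CREATE TABLE daily_store_sales AS"
    " SELECT store_id, sale_date, SUM(amount) AS total_sales, COUNT(*) AS num_sales"
    " FROM sales GROUP BY store_id, sale_date"
)

rows = conn.execute(
    "SELECT store_id, sale_date, total_sales FROM daily_store_sales"
    " ORDER BY store_id, sale_date"
).fetchall()
print(rows)  # [(1, '2024-05-01', 25.5), (1, '2024-05-02', 8.0), (2, '2024-05-01', 12.0)]
```

The trade-off is that the summary table is a redundant copy: if rows in `sales` change, `daily_store_sales` has to be rebuilt or updated to stay consistent.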

Question 7
Which of these accurately describe why features of operational databases are not needed for analytic databases?

  • Analytic databases update infrequently so ETL (extract, transform, and load) utilities can replace many of the DML features of operational databases.
  • Analytic databases are used for complex queries; since triggers cause queries to run much more slowly, they should be handled in a different way.
  • Analytic databases are more focused on CRUD-type activities.
  • Analytic databases only handle very simple data, so there are more efficient methods to correct inaccurate data than enforcing business rules.
  • Analytic databases often use data collected from other sources (including other operational databases), so enforcing business rules is typically not needed.

Big Data

Question 1
Suppose you want to store a petabyte of data, and you want to run a report that requires reading and processing 250 terabytes of that data. What is a key difference in the technology you’ll use for this, compared with storing and processing one or two megabytes of data?

  • Extremely powerful computer processors
  • Cost
  • Distributed storage and processing
  • Cloud storage
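
The core idea behind distributed storage and processing is that each machine works on its own slice of the data and sends back only a small partial result. Here is a single-machine sketch of that pattern, assuming hypothetical in-memory partitions that stand in for file blocks on different nodes; a thread pool plays the role of the cluster, and `process_partition` and `combine` are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: in a real cluster, each would be a block on a different node.
partitions = [
    [3, 8, 1],
    [10, 2],
    [7, 7, 7, 4],
]


def process_partition(values: list[int]) -> tuple[int, int]:
    """Map step: reduce one partition to a small (sum, count) partial result."""
    return sum(values), len(values)


def combine(partials: list[tuple[int, int]]) -> float:
    """Reduce step: merge the partial results into one global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


with ThreadPoolExecutor() as pool:
    partials = list(pool.map(process_partition, partitions))

print(partials)           # [(12, 3), (12, 2), (25, 4)]
print(combine(partials))  # 49 / 9, about 5.44
```

Each partition returns (sum, count) rather than its own average; averaging the per-partition averages would give the wrong answer when partitions differ in size, while sums and counts merge exactly.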

Question 2
The following are records in a contact list.

{'name': 'Étienne', 'email': 'etienne@example.com', 'mobile': '555-8372'}

{'name': 'Brayden', 'home': '555-2202', 'work': '555-2800'}

{'name': 'Diana', 'mobile': '555-6575', 'email': 'dprince@example.com'}

Is this contact list an example of structured, semi-structured, or unstructured data?

  • Structured
  • Semi-structured
  • Unstructured
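
Listing the keys of the three contact records shows their shape: each record labels its own values, but the set of fields varies from record to record. A short Python sketch, with the records transcribed as dictionaries:

```python
# The three contact records from Question 2, as Python dictionaries.
contacts = [
    {"name": "Étienne", "email": "etienne@example.com", "mobile": "555-8372"},
    {"name": "Brayden", "home": "555-2202", "work": "555-2800"},
    {"name": "Diana", "mobile": "555-6575", "email": "dprince@example.com"},
]

# Every record names its own fields, but the set of fields differs per record.
all_fields = sorted(set().union(*contacts))
print(all_fields)  # ['email', 'home', 'mobile', 'name', 'work']

# Forcing the records into fixed columns leaves gaps (None) wherever a key is absent.
for contact in contacts:
    print([contact.get(field) for field in all_fields])
```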

Question 3
Is an online credit card or bank account statement an example of structured, semi-structured, or unstructured data?

  • Structured
  • Semi-structured
  • Unstructured

Question 4
You plan to gather data from various sources. Which of the following sources do you think will definitely give you structured data?

  • Tables you capture from another relational database system
  • A set of XML documents delivered from a public data source
  • A news article downloaded from the web
  • A survey in which every question is a rating from 1 to 5
  • A CSV (comma separated value) file taken from a spreadsheet
  • A collection of photographs taken with a smartphone

Question 5
Which of the following describe a reason why RDBMSs are a poor choice for big data? Check all that apply.

  • Because RDBMSs verify data on write, each row must be INSERTed separately, which is prohibitive when you have millions of rows
  • Because a large amount of unstructured data would need to be stored as a BLOB or CLOB, RDBMSs provide little to no support for working with such data.
  • The structured nature of RDBMSs imposes costs in terms of storage and processing, which becomes prohibitive with really big amounts of data.

Question 6
Look at the following data:

id  | name               | grade_level | gpa  | age
930 | Olufunmilayo Ayton | 11          | 4.00 | 16
667 | Vincent Michaelson | 10          | 2.53 | 15
907 | Asa Quigg          | 10          | 3.57 |
168 | Kiran Patil        | 11          | 3.28 | 17

Now imagine that, instead of four rows, you have 4000 rows, and all are similar to the rows you see here. Which of the following questions can you answer from this data? Check all that apply.

  • What is the home address of a student with id ‘930’?
  • What is the number of students in the table?
  • What are the names of all the students at the same grade level as Kiran Patil?
  • What is the highest allowable value of age?
  • What is the number of students in each grade level?
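
Here is a sketch of how the answerable questions map to SQL, loading the four sample rows into SQLite with Python's `sqlite3` module (the column types are assumptions). The total count, the per-grade counts, and "same grade as Kiran Patil" are simple queries, and the missing age for Asa Quigg becomes NULL. Nothing in the columns records a home address or a rule about allowed ages, so no query over this data can answer those two questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (id TEXT, name TEXT, grade_level INTEGER, gpa REAL, age INTEGER)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [
        ("930", "Olufunmilayo Ayton", 11, 4.00, 16),
        ("667", "Vincent Michaelson", 10, 2.53, 15),
        ("907", "Asa Quigg", 10, 3.57, None),  # age is missing in the sample data
        ("168", "Kiran Patil", 11, 3.28, 17),
    ],
)

total = conn.execute("SELECT COUNT(*) FROM students").fetchone()[0]
per_grade = conn.execute(
    "SELECT grade_level, COUNT(*) FROM students GROUP BY grade_level ORDER BY grade_level"
).fetchall()
same_grade = conn.execute(
    "SELECT name FROM students WHERE grade_level ="
    " (SELECT grade_level FROM students WHERE name = 'Kiran Patil') ORDER BY name"
).fetchall()

print(total)       # 4
print(per_grade)   # [(10, 2), (11, 2)]
print(same_grade)  # [('Kiran Patil',), ('Olufunmilayo Ayton',)]
```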

Question 7
Consider the following data (in this case, a list of JSON objects):

{"shop": "Dicey", "game": "Monopoly", "qty": 7, "aisle": 3, "price": 17.99}

{"shop": "Dicey", "game": "Clue", "qty": 3, "price": 9.99}

{"shop": "Board Em", "game": "Monopoly", "qty": 11, "aisle": 2, "price": 25.00}

{"shop": "Board Em", "game": "Candy Land", "qty": 4, "aisle": 2}

{"shop": "Board Em", "game": "Risk", "qty": 7, "aisle": 3, "price": 35.00}

{"shop": "Board Em", "game": "Stratego", "qty": "low stock"}

Which of the following questions can you definitely answer from this data? (Hint: Take note of missing values and inconsistent data types, which would make the answers unknown or uncertain.)

  • Which games are in aisle 3 at the Dicey shop?
  • What is the price of Risk at the Board Em shop?
  • What is the average quantity of all the games in all the stores?
  • Which shop has a higher average price for its games?
  • What are the games that start with the letter C?
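
The hint can be checked mechanically. This Python sketch parses the six records (transcribed as one JSON object per line) and lists the missing prices, the missing aisles, and the non-numeric quantity; the variable names are illustrative.

```python
import json

# The six records from Question 7, one JSON object per line.
raw = """
{"shop": "Dicey", "game": "Monopoly", "qty": 7, "aisle": 3, "price": 17.99}
{"shop": "Dicey", "game": "Clue", "qty": 3, "price": 9.99}
{"shop": "Board Em", "game": "Monopoly", "qty": 11, "aisle": 2, "price": 25.00}
{"shop": "Board Em", "game": "Candy Land", "qty": 4, "aisle": 2}
{"shop": "Board Em", "game": "Risk", "qty": 7, "aisle": 3, "price": 35.00}
{"shop": "Board Em", "game": "Stratego", "qty": "low stock"}
"""
records = [json.loads(line) for line in raw.strip().splitlines()]

# Missing keys: which records lack a price or an aisle?
no_price = [r["game"] for r in records if "price" not in r]
no_aisle = [(r["shop"], r["game"]) for r in records if "aisle" not in r]

# Inconsistent types: qty is usually a number, but not always.
bad_qty = [r["game"] for r in records if not isinstance(r["qty"], (int, float))]

# A question that only uses game names is unaffected by those gaps.
starts_with_c = [r["game"] for r in records if r["game"].startswith("C")]

print(no_price)       # ['Candy Land', 'Stratego']
print(no_aisle)       # [('Dicey', 'Clue'), ('Board Em', 'Stratego')]
print(bad_qty)        # ['Stratego']
print(starts_with_c)  # ['Clue', 'Candy Land']
```

Any question that depends on price, aisle, or a numeric qty inherits these gaps.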

Question 8
Which of the following questions could be answered quickly and easily by treating the complete plays of Shakespeare as data, separated by title and type (tragedy, comedy, or history)? Check two answers.

  • Which of the plays are considered the most important?
  • Which of the tragedies includes, or mentions, someone named Lucilius?
  • How many plays are histories?
  • How many people are mentioned or appear in the plays?

SQL Tools for Big Data Analysis

Question 1
You need a database system for a company with a massive physical warehouse operation. They want to keep track of their inventory, recording every shipment in and every shipment out, with well-defined descriptions of each item and where it is stored. They also want to be able to analyze their operations to answer questions such as whether certain items should be stored closer together, or how often a particular item sells out. Which of the following would be the best choice for this?

  • An ACID-compliant RDBMS for big data, such as Splice Machine or Apache Phoenix
  • A search system, such as Cloudera Search or Elasticsearch
  • A non-transactional operational system designed for structured data, such as Apache Kudu
  • A non-transactional operational system designed for unstructured or semi-structured data, such as Apache HBase or MongoDB
  • An analytic system (data warehouse) such as Apache Impala

Question 2
You need a database system for a library of millions of large text documents, to help users find the documents that contain the information they need. Which of the following would be the best choice for this?

  • An analytic system (data warehouse) such as Apache Impala
  • A search system, such as Cloudera Search or Elasticsearch
  • An ACID-compliant RDBMS for big data, such as Splice Machine or Apache Phoenix
  • A non-transactional operational system designed for structured data, such as Apache Kudu
  • A non-transactional operational system designed for unstructured or semi-structured data, such as Apache HBase or MongoDB

Question 3
Which of the following are features of SQL on RDBMSs that are also kept for working with big data systems? Check all that apply.

  • Unique values within columns
  • Synchronized indexes
  • SELECT statements
  • Seeing data as tables with column names
  • Support for many file formats

Question 4
Which of the following is the reason why we lose many features of SQL when moving from traditional RDBMSs to big data systems?

  • Many of the lost features do not work well with the variety of data available in big data stores
  • Many of the lost features are rarely used and implementation has been a low priority for big data systems
  • Many of the lost features are useful for relatively small amounts of data, but they become irrelevant for large volumes of data
  • Many of the lost features require transactions, which are notoriously challenging for big data systems and not typically implemented

Question 5
Which of the following are features of SQL for working with traditional RDBMSs that we lose when moving to working with big data systems? Check all that apply.

  • Foreign key constraints
  • GRANT and REVOKE statements
  • Seeing data as tables with column names
  • Support for many file formats
  • Database triggers and stored procedures
  • Complex data types
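
For comparison with the lost features, this is what a foreign key constraint does in a traditional RDBMS, sketched with Python's built-in `sqlite3` module and hypothetical `stores` and `sales` tables. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; with it enabled, the database itself refuses a sale that points at a store that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on

conn.execute("CREATE TABLE stores (store_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY,"
    " store_id INTEGER REFERENCES stores(store_id))"
)
conn.execute("INSERT INTO stores VALUES (1)")
conn.execute("INSERT INTO sales VALUES (100, 1)")  # accepted: store 1 exists

try:
    conn.execute("INSERT INTO sales VALUES (101, 99)")  # store 99 does not exist
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)  # e.g. FOREIGN KEY constraint failed

remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(remaining)  # 1
```

In a big data system that does not enforce this constraint, the same check would typically have to live in the ETL process or in the queries themselves.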

Question 6
Which of the following are features of SQL for working with big data systems that are not typically found in SQL for traditional RDBMSs? Check all that apply.

  • CREATE and ALTER statements
  • UPDATE and DELETE statements
  • GRANT and REVOKE statements
  • Primary key constraints
  • Complex data types

Question 7
A company has a small on-premises cluster that they are rapidly outgrowing, and they are considering switching to cloud storage, or maintaining a hybrid solution. The following describes some factors going into their decision. Which are reasons that potentially support using a cloud cluster rather than an on-premises cluster? (Note that a hybrid option might still be best!)

  • The company expects to continue to maintain and expand their data store for several years.
  • The company hopes some upcoming new products will drastically increase their storage needs, though their processing needs probably will increase less dramatically.
  • The company’s analytics team processes queries nearly constantly, some ad-hoc during business hours and some bulk processes that run every day during off-hours; they do not experience significant periods of inactivity.
  • The company’s assets and budgets afford extra room for operating expenditures, but they are trying to keep capital expenditures to a minimum.

Question 8
In a traditional RDBMS, the data dictionary is tightly coupled to the data; in a big data system, the table definitions are loosely coupled to the files. Which of the following accurately describe this difference? Check all that apply.

  • The contents of the database are compressed for optimal storage space in an RDBMS, while the files in a big data system can be compressed if desired using the file format settings
  • Each table in an RDBMS has its own data dictionary, so they come in pairs, while a table definition in a big data system can be applied to different data files (a “one-to-many” coupling)
  • The contents of the data dictionary accurately describe every table in an RDBMS, while the table definitions in a big data system describe what is expected in some files, but even those files may not match exactly
  • The data dictionary governs what is stored as data in an RDBMS, while the files in a big data system are completely ungoverned
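
Loose coupling is often called schema-on-read: the table definition is applied only when files are read, and files are not checked against it when they are written. Here is a minimal Python sketch, assuming a hypothetical definition (`TABLE_DEFINITION`) and two hypothetical CSV files, one of which does not match exactly; `read_with_definition` is an illustrative helper, not an engine API.

```python
import csv
import io

# Hypothetical table definition: (column name, converter) pairs.
TABLE_DEFINITION = [("id", int), ("name", str), ("gpa", float)]

# Two hypothetical files read through the same definition; the second does not match exactly.
FILES = {
    "part-0001.csv": "930,Olufunmilayo Ayton,4.00\n667,Vincent Michaelson,2.53\n",
    "part-0002.csv": "907,Asa Quigg,not recorded\n168,Kiran Patil\n",
}


def read_with_definition(text: str) -> list[dict]:
    """Apply the definition at read time; anything that does not fit becomes None."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        row = {}
        for i, (column, convert) in enumerate(TABLE_DEFINITION):
            try:
                row[column] = convert(fields[i])
            except (IndexError, ValueError):
                row[column] = None  # missing or unparseable value is read as NULL
        rows.append(row)
    return rows


table = [row for text in FILES.values() for row in read_with_definition(text)]
for row in table:
    print(row)
```

Here a field that is missing or cannot be converted is read as None (NULL) instead of the whole file being rejected. This roughly mirrors how engines such as Apache Impala typically treat text files, though the exact behaviour depends on the engine and the file format.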

Peer-graded Assignment: Database Overview
