A Crash Course in Data Science Quiz Answers. In this post, you will find the quiz answers for A Crash Course in Data Science.
A Crash Course in Data Science Quiz
Offered by Johns Hopkins University
What is data science?
Question 1
Data science is:
- Database management
- Applied machine learning
- Applied statistics
- Deep learning
- Answering specific questions with data
What is statistics good for?
Question 1
We covered four broad example areas of statistics. These were (check all that apply):
- Gut instincts
- Descriptive
- Prediction
- Inference
- Experimental design
Question 2
Descriptive analysis includes which activities (check all that apply)?
- Sample size calculations
- Exploratory data analysis
- Basic summary tables
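Basic summary tables are the simplest form of descriptive analysis. As a minimal sketch, assuming a small made-up list of (group, value) records and a hypothetical helper named `summary_table`, the snippet below counts the records in each group and reports the group mean using only the Python standard library.

```python
from collections import defaultdict
import statistics

def summary_table(records):
    """Group (label, value) pairs and return {label: (count, mean)} sorted by label."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    return {label: (len(vals), statistics.mean(vals)) for label, vals in sorted(groups.items())}

# Hypothetical records: (treatment arm, measured response)
records = [("control", 4.0), ("treated", 6.0), ("control", 5.0), ("treated", 8.0), ("control", 6.0)]
table = summary_table(records)

# Print a plain-text table with one row per group
print(f"{'group':<8} {'n':>3} {'mean':>6}")
for label, (n, mean) in table.items():
    print(f"{label:<8} {n:>3} {mean:>6.2f}")
```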
Question 3
Statistical inference is defined as:
- The process of adding randomization to an experimental design.
- The process of performing unsupervised clustering.
- The process of evaluating predictions using cross-validation.
- The process of drawing conclusions about populations from a sample.
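Statistical inference uses a sample to draw a conclusion about a population. A minimal sketch follows. It assumes a simulated population and a normal-approximation 95% interval (z = 1.96), and `mean_confidence_interval` is a hypothetical helper, not something provided by the course. The code estimates a population mean from a random sample of 200 values.

```python
import random
import statistics

def mean_confidence_interval(sample, z=1.96):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    center = statistics.mean(sample)
    std_error = statistics.stdev(sample) / n ** 0.5
    return center - z * std_error, center + z * std_error

rng = random.Random(0)
# Simulated population; in a real study only the sample would be observed.
population = [rng.gauss(50, 10) for _ in range(100_000)]
sample = rng.sample(population, 200)

low, high = mean_confidence_interval(sample)
print(f"sample mean: {statistics.mean(sample):.2f}")
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
print(f"actual population mean: {statistics.mean(population):.2f}")
```

The interval, rather than the single sample mean, is the inferential statement: it expresses how uncertain the conclusion about the population is, given only 200 observations.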
Question 4
Predictions are typically evaluated by:
- A measure of prediction performance.
- Whether randomization was included in the design.
- Model simplicity.
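Two common measures of prediction performance are accuracy, used for class labels, and mean squared error, used for numeric predictions. The sketch below is a minimal hand-rolled version of each. The function names are illustrative only; libraries such as scikit-learn ship their own implementations.

```python
def accuracy(y_true, y_pred):
    """Share of predicted labels that match the observed labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared gap between observed and predicted numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))        # 0.75
print(mean_squared_error([3.0, 5.0], [2.0, 7.0]))  # 2.5
```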
Question 5
Randomization of a treatment in a design is used for:
- Balancing observed and unobserved covariates that may contaminate our results.
- Obtaining good predictions.
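Random assignment balances covariates because it ignores them. As a result, it balances covariates you never recorded as well as the ones you did. The simulation below is a minimal sketch of this effect. It uses a hypothetical "age" covariate and compares a randomized split with a self-selected split in which the oldest half ends up treated.

```python
import random
import statistics

def randomize(units, rng):
    """Randomly split units into a treated half and a control half."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(1)
# Hypothetical covariate (e.g. age) that could also affect the outcome.
ages = [rng.uniform(20, 80) for _ in range(2000)]

treated, control = randomize(ages, rng)
random_gap = statistics.mean(treated) - statistics.mean(control)

# For contrast: a self-selected design where the oldest half is treated.
ordered = sorted(ages)
selected_gap = statistics.mean(ordered[1000:]) - statistics.mean(ordered[:1000])

print(f"mean-age gap, randomized assignment:    {random_gap:.2f}")
print(f"mean-age gap, self-selected assignment: {selected_gap:.2f}")
```

Under randomization the age gap between groups stays small. Under self-selection, any difference in outcomes would be confounded with a large age difference.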
Machine learning
Question 1
The lecture discussed two broad categories of machine learning (check all that apply):
- Support vector machines
- Unsupervised learning
- Supervised learning
Question 2
Supervised machine learning algorithms focus on:
- Clustering without an outcome.
- Prediction, assessed through prediction performance.
- Principal components.
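The difference between the two categories shows up in what each algorithm is given. A supervised method learns from labelled examples and predicts an outcome. An unsupervised method only sees the inputs and looks for structure such as clusters. In the minimal sketch below, both functions are hypothetical illustrations on one-dimensional data. The first is a 1-nearest-neighbour classifier. The second is a simple two-cluster version of k-means.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Supervised: copy the label of the closest labelled training point."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]

def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two clusters without any labels."""
    low_center, high_center = min(values), max(values)
    for _ in range(iterations):
        low = [v for v in values if abs(v - low_center) <= abs(v - high_center)]
        high = [v for v in values if abs(v - low_center) > abs(v - high_center)]
        if not low or not high:  # all points identical; nothing to split
            break
        low_center, high_center = sum(low) / len(low), sum(high) / len(high)
    return low, high

print(nearest_neighbor_predict([1.0, 1.2, 5.0, 5.3], ["low", "low", "high", "high"], 4.5))  # high
print(two_means([1.0, 1.2, 0.8, 5.0, 5.3, 4.9]))  # ([1.0, 1.2, 0.8], [5.0, 5.3, 4.9])
```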
Question 3
A way to obtain generalizability of an ML algorithm is to:
- Use the same data for testing that was used to build the algorithm.
- Test it on novel datasets.
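Testing on the data used to build a model can badly overstate how well the model generalizes. The minimal sketch below uses hypothetical noisy data in which 20% of the labels are flipped. Its model is a 1-nearest-neighbour rule, which memorizes its training set. The rule therefore scores perfectly on the data it was built from and noticeably worse on a novel dataset drawn the same way.

```python
import random

def make_data(n, rng, noise=0.2):
    """Hypothetical data: label is 1 when x > 0.5, flipped with probability `noise`."""
    xs = [rng.random() for _ in range(n)]
    ys = [int(x > 0.5) for x in xs]
    ys = [1 - y if rng.random() < noise else y for y in ys]
    return xs, ys

def predict_1nn(train_x, train_y, x):
    """1-nearest-neighbour rule: reproduces the training labels exactly."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]

def accuracy_on(train_x, train_y, xs, ys):
    """Accuracy of the 1-NN model built on (train_x, train_y) when scored on (xs, ys)."""
    hits = sum(predict_1nn(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)

rng = random.Random(42)
train_x, train_y = make_data(300, rng)
test_x, test_y = make_data(300, rng)  # novel data, not used to build the model

train_acc = accuracy_on(train_x, train_y, train_x, train_y)
test_acc = accuracy_on(train_x, train_y, test_x, test_y)
print(f"accuracy on the data used to build the model: {train_acc:.2f}")
print(f"accuracy on a novel dataset:                 {test_acc:.2f}")
```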
Question 4
Traditional statistical approaches often differ from ML approaches by (check all that apply):
- Focusing on superpopulation models.
- Often placing a higher priority on parameter interpretability and simplicity than on prediction performance.
- Focusing on deep learning.
Software Engineering
Question 1
What role does software engineering play in data science?
- Software engineering is used to increase the speed of machine learning algorithms.
- Software engineering is used to generalize data analyses into software so that they can be applied in different situations.
- Software engineering’s role is to build computing infrastructure to support complex data analyses.
Question 2
Which is a benefit of building software packages for data analysis?
- Software packages are always smaller than regular code files.
- Software provides a well-defined interface that can abstract low-level technical details of data analysis routines.
- Software packages are generally faster than simple code.
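The "well-defined interface" point can be made concrete with a small function. Its callers only supply a data source and a column name, while the low-level details stay inside it. In this minimal sketch, `summarize_column` is a hypothetical example rather than an existing package API. It hides CSV parsing, numeric conversion and the skipping of blank cells.

```python
import csv
import io
import statistics
from typing import Dict, TextIO

def summarize_column(source: TextIO, column: str) -> Dict[str, float]:
    """Summary statistics for one numeric column of a CSV stream.

    Callers supply an open text stream and a column name; CSV parsing,
    numeric conversion and skipping of blank cells stay hidden in here.
    Raises statistics.StatisticsError if the column has no numeric values.
    """
    values = []
    for row in csv.DictReader(source):
        cell = (row.get(column) or "").strip()
        if cell:
            values.append(float(cell))
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

# Usage: the caller never touches csv parsing or type conversion.
data = io.StringIO("id,score\n1,10\n2,\n3,20\n4,30\n")
result = summarize_column(data, "score")
print(result)  # {'n': 3, 'mean': 20.0, 'median': 20.0}
```

If the storage format changed later, only the function body would need to change. Every analysis that calls it would keep working unchanged.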
Question 3
When should you consider developing a software package? Select all that apply.
- When an analysis or a part of an analysis must be done more than once or twice.
- Any time you analyze data.
- If members of another team/group wish to apply your same analysis to their own datasets.
Structure of a Data Science Project
Question 1
What are the two stages in which a data science project might start? Select all that apply.
- Interpretation
- Report writing
- Defining/stating the question
- Exploratory data analysis
Question 2
Which of the following is NOT part of the data analysis process?
- Exploratory data analysis
- Decision-making
- Formal modeling
- Communication
Question 3
What are the two goals of exploratory data analysis? Select all that apply.
- Determine if the data are suitable for the question.
- Sketch an answer to your question.
- Assess the totality of the evidence regarding your question.
- Build presentations for communicating results to people outside your organization.
Question 4
An analyst on your team engages in exploratory data analysis of a dataset. The EDA inspires him to ask a new question about the data, so he begins the data analysis process on this same dataset and goes through all five phases.
What is wrong with this approach?
- The development of the question and the development of the answer to the question were conducted with the same dataset.
- Exploratory data analysis should never come before defining the question.
- Exploratory data analysis was used to generate a new question.
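A question suggested by one dataset will tend to look well supported by that same dataset, even when no real relationship exists. The minimal simulation below shows this. It assumes pure-noise data, so every variable is unrelated to the outcome, and uses a hypothetical `pearson` helper. "Exploration" picks the variable most correlated with the outcome in dataset A. That same correlation is then checked again in A and in an independent dataset B.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def noise_dataset(rng, n_rows=50, n_vars=200):
    """Hypothetical data: an outcome and many candidate variables, all pure noise."""
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    variables = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_vars)]
    return outcome, variables

rng = random.Random(7)
y_a, vars_a = noise_dataset(rng)
# Exploration on dataset A suggests a question: "does variable `best` drive the outcome?"
best = max(range(len(vars_a)), key=lambda j: abs(pearson(vars_a[j], y_a)))
r_same = pearson(vars_a[best], y_a)

# Answering the question on an independent dataset B
y_b, vars_b = noise_dataset(rng)
r_new = pearson(vars_b[best], y_b)
print(f"variable {best}: r on exploration data = {r_same:.2f}, r on new data = {r_new:.2f}")
```

The correlation found during exploration is inflated by the search over 200 candidates. Only the fresh dataset gives an honest check of the question.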
The outputs of a data science experiment
Question 1
The outputs of a data science experiment often include (check all that apply):
- Interactive web pages and apps
- Presentations
- Reports
Question 2
Reproducibility tools for reports, like knitr, help with (check all that apply):
- Reproducibility
- Documenting the analysis
- Getting the data scientist to think about the report during analysis
- Version control
Question 3
For maintainability of a data science app, the following are useful (check all that apply):
- Version control
- Good code documentation
Question 4
Example tools for reproducible report writing are (check all that apply):
- dplyr
- knitr
- IPython notebooks
Question 5
A good report practice is:
- Being clearly written, with concise conclusions
- Documenting every blind alley and bit of minutiae from the analysis
- Cramming in as much detail as possible
Defining Success in Data Science
Question 1
Some ways we can declare success in data science include (check all that apply):
- Decisions are made based on the data analysis.
- The results of the analysis are uncertain and conclusions are not clear.
- New knowledge about the phenomena under study is created.
Question 2
Learning that the data in question can’t answer the question being posed is a useful result of a data science experiment.
- Not true
- True
Question 3
Data products and apps are useful for creating impact from a data science experiment.
- True
- False
Question 4
A negative outcome from a data science experiment would include:
- The data is ignored despite having clear evidence.
- New knowledge is created.
- A high impact app is made.
- Policy is enacted based on new data.
Data scientist toolbox
Question 1
What are some examples of languages designed for data analysis?
- Scalable computing infrastructure
- The Python programming language
- Literate programming tools
- The Postgres programming language
- The MongoDB programming language
Question 2
Why are chat tools like Slack part of the data scientist’s toolbox?
- Chat tools like Slack are good for communicating results to a broad audience.
- Chat tools are good for downtime between long focused periods working on data science projects.
- Data science tools are constantly updating, so keeping in touch with your data science colleagues is essential for success.
- General purpose tools for chatting are not part of the data scientist’s toolbox.
- Data scientists aren’t typically good at communication, so a chat tool lets introverts work with others.
Question 3
Which of the following is not a tool in the data scientist’s toolbox?
- Data programming languages like R
- Chat tools like Slack
- Databases like MongoDB
- Data journalism websites like FiveThirtyEight
- Help websites like Stack Overflow
Question 4
A data scientist must know how to pull data from every database.
- TRUE
- FALSE
Separating hype from value
Question 1
Joe proposes a data science project applying neural networks to all the data stored in her company’s internet logs. Why might this project be hype?
- The project isn’t designed to answer a concrete question.
- There may be outliers, since some of the company’s users are power users.
- Neural networks are known not to work on databases.
- Internet log files are notoriously messy and hard to analyze.
Question 2
This is an interesting article about the end of theory due to data collection:
http://archive.wired.com/science/discoveries/magazine/16-07/pb_theory/
One quote from the story is:
“There is now a better way. Petabytes allow us to say: ‘Correlation is enough.’ We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.”
Which hype vs. reality question does this paragraph fail?
- What is the question you are trying to answer with data?
- If you could answer the question, could you use the answer?
- Do you have the data to answer that question?
Question 3
The Netflix Prize offered participants $1 million to improve Netflix’s algorithm by a specified amount. Several teams succeeded. This is an interesting article on why the Netflix Prize solution was never implemented:
http://www.wired.com/2012/04/netflix-prize-costs/
Which of the hype versus reality questions did this project fail?
- Do you have the data to answer that question?
- What is the question you are trying to answer with data?
- If you could answer the question, could you use the answer?
Question 4
This article describes some problems with the Google Flu Trends algorithm:
https://www.newscientist.com/article/dn25217-google-flu-trends-gets-it-wrong-three-years-running/
Which of the hype versus reality questions did Google Flu Trends fail?
- If you could answer the question, could you use the answer?
- What is the question you are trying to answer with data?
- Do you have the data to answer that question?