Which of the following are sources of data that can be used for machine learning
1.
Question 1
Which of the following are sources of data that can be used for machine learning? (click all that apply)
1 point
Government data, such as census results
Personal data collected without permission
Data collected by a business about its own operations
Data collected by a business about its customers
Text data from the Internet, such as Amazon reviews or Wikipedia
Data handwritten in a notebook
Government archives
Data purchased from third-party data “brokers”
========================================
2.
Question 2
Which of the following are issues of ethics and responsibility in machine learning? (click all that apply)
1 point
The security of the data, so that it isn’t easily lost or stolen
The proper consent of the original owners of the data
The anonymization of the data, as much as is possible
The representativeness of the data
The fair treatment of the people collecting and processing the data
========================================
3.
Question 3
How can data be biased? (click all that apply)
1 point
It can’t; data is data and it reflects the real world
It might include data collected under different conditions, and so not reflect operational data
It might not include enough training data on a range of gender and ethnic groups, and so not reflect operational data
It might not include data from underrepresented socioeconomic groups, and so not reflect real-world data
========================================
4.
Question 4
What is the batch effect?
1 point
When data from different sources have variations that aren’t meaningful, but that the algorithm treats as meaningful
When you train your QuAM several times in different batches
When hospitals don’t have the same scan results
When data collected at different times include measurements of different things
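
To make the idea of source-level variation concrete, here is a minimal Python sketch, not taken from the course material. It assumes two hypothetical sources (`scanner_a` and `scanner_b`) that measure the same quantity, with one reading a constant 10 units higher. The helper `center_by_source` is an illustrative name, and per-source mean centering is only one simple way to remove such an offset; it would also erase any real difference between the sources.

```python
from statistics import mean


def center_by_source(readings):
    """Subtract each source's own mean so constant source-level offsets are removed."""
    centered = {}
    for source, values in readings.items():
        source_mean = mean(values)
        centered[source] = [v - source_mean for v in values]
    return centered


# Hypothetical readings: same underlying quantity, but scanner_b reads 10 units higher.
readings = {
    "scanner_a": [1.0, 2.0, 3.0],
    "scanner_b": [11.0, 12.0, 13.0],
}

centered = center_by_source(readings)
print(centered)
```

Before centering, a model could separate the two sources by the offset alone; after centering, the two lists are identical, so the offset no longer carries any signal.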
========================================
5.
Question 5
Which of the following statements are true about data and data pipelines? (click all that apply)
1 point
Learning data and operational data need to be in the same format
Long term data storage is never a concern
Automating data retrieval is a straightforward process
Transformed data will need to be accessible to your QuAM
Machine learning is an ongoing process, so new, incoming data is important
Features that were used in the learning data must be present in operational data
Integrating data from multiple sources can cause formatting issues
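
As an illustration of the statement about learning-data features and operational data, here is a minimal Python sketch of a check that an operational record carries every feature used in the learning data. The function name `missing_features` and the feature names are hypothetical and chosen only for this example; a real pipeline would likely also check types and units.

```python
def missing_features(learning_features, operational_record):
    """Return the learning-data features absent from an operational record, sorted by name."""
    return sorted(set(learning_features) - set(operational_record))


# Hypothetical feature names, chosen only for illustration.
learning_features = ["age", "income", "region"]
operational_record = {"age": 41, "region": "north"}

# Lists any feature the model was trained on that this record does not provide.
print(missing_features(learning_features, operational_record))
```

Running a check like this before a record reaches the model surfaces format mismatches early, rather than letting them appear as silent prediction errors.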