# Text Mining and Analytics Quiz

## Orientation Quiz

1.
Question 1
This course lasts for ___ weeks.

1 point

• 3
• 6
• 2
• 4

=================================================

2.
Question 2
I am required to read a textbook for this course.

1 point

• False
• True

=================================================

3.
Question 3
Which of the following activities are required each week? Check all that apply.

1 point

• Programming Assignments
• Quizzes
• Forum Assignments

=================================================

4.
Question 4
The following tools will help me use the discussion forums:

1 point

• “Up-voting” posts that are thoughtful, interesting, or helpful.
• “Searching” for your question or a topic before you post a new thread.
• “Following” any forums that are particularly interesting to me.

=================================================

5.
Question 5
If I have a problem in the course I should:

1 point

• Email the instructor
• Drop the class
• Report it to the Learner Help Center (if the problem is technical) or to the Content Issues forum (if the problem is an error in the course materials).
• Call the instructor

## Week 1 Quiz

1.
Question 1
True or false? A paradigmatic relation is a relation between two words that tend to co-occur with each other, while a syntagmatic relation is between two words that tend to occur in a similar context.

1 point

False

True

=================================================

2.
Question 2
In a collection of English news articles, which word do you expect to have a higher IDF?

1 point

“the”

“learning”
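
IDF rewards words that occur in few documents and penalizes words that occur almost everywhere. The sketch below is minimal and rests on two assumptions. It uses the smoothed variant $\mathrm{IDF}(w) = \log\frac{N+1}{\mathrm{df}(w)}$, where $N$ is the number of documents and $\mathrm{df}(w)$ is how many of them contain $w$. It also uses a hypothetical four-document corpus in place of a real news collection.

```python
import math

# Hypothetical toy corpus standing in for a collection of news articles.
docs = [
    "the market fell today",
    "the team won the final",
    "machine learning helps the newsroom",
    "the weather stays cold",
]

def idf(word, docs):
    """IDF(w) = log((N + 1) / df(w)), where df(w) counts documents containing w."""
    df = sum(word in doc.split() for doc in docs)
    return math.log((len(docs) + 1) / df)

for w in ("the", "learning"):
    print(w, round(idf(w, docs), 3))
```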

=================================================

3.
Question 3
Suppose the pseudo-document representations for the contexts of the terms A and B in the vector space model are given as follows:

dA = (0.10, 0.50, 0.00, 0.40, 0.00, 0.00)

dB = (0.20, 0.40, 0.30, 0.00, 0.10, 0.00)

What is the EOWC similarity score?

1 point

1

0.02

0.20

0.22
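
EOWC (Expected Overlap of Words in Context) scores two context vectors by their dot product. When each vector is a word distribution, this equals the probability that two context words, one drawn from each, are identical. Here is a minimal sketch using the two vectors given above:

```python
dA = (0.10, 0.50, 0.00, 0.40, 0.00, 0.00)
dB = (0.20, 0.40, 0.30, 0.00, 0.10, 0.00)

def eowc(x, y):
    """Expected overlap of words in context: sum_i x_i * y_i."""
    return sum(a * b for a, b in zip(x, y))

print(round(eowc(dA, dB), 2))  # 0.10*0.20 + 0.50*0.40 = 0.22
```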

=================================================

4.
Question 4
True or false? Syntactic analysis (parsing) is an easier task than lexical analysis (part-of-speech tagging).

1 point

False

True

=================================================

5.
Question 5
“A man saw a boy with a telescope.” What kind of ambiguity does the sentence have?

1 point

Word-level ambiguity

Syntactic ambiguity

=================================================

6.
Question 6
In an online text mining application where response time is the key factor to consider, what kind of NLP features can be used? Check all that apply.

1 point

POS-tagging

Word tokenization

Syntactic parsing

Relation extraction

=================================================

7.
Question 7
True or false? Deeper NLP requires more human effort and is usually less accurate.

1 point

False

True

=================================================

8.
Question 8
True or false? Word-based representation is not powerful.

1 point

False

True

=================================================

9.
Question 9
Which of the following is correct about paradigmatic and syntagmatic word relations?

1 point

“Monday” and “Tuesday” have a paradigmatic relation.

Syntagmatically related words have high context similarity.

Paradigmatically related words have high co-occurrence.

=================================================

10.
Question 10
Why does EOWC not work well?

1 point

It favors matching rare words.

It treats words unequally.

It favors matching frequent terms.

## Week 2 Quiz

1.
Question 1
You are given a unigram language model $\theta$ distributed over a vocabulary set $V$ composed of only 4 words: “the”, “global”, “warming”, and “effects”. The distribution of $\theta$ is given in the table below:

| $w$ | $P(w\mid\theta)$ |
|---|---|
| the | 0.3 |
| global | 0.2 |
| warming | 0.2 |
| effects | X |

What is X, i.e., $P(\text{“effects”}\mid\theta)$?

1 point

0.1

0.2

0.3

0

=================================================

2.
Question 2
Assume you are given the same unigram language model as in Question 1. Which of the following is not true?

1 point

$P(\text{“global warming”}\mid\theta) > P(\text{“warming global”}\mid\theta)$

$P(\text{“global warming”}\mid\theta) = 0.04$

$P(\text{“the global warming effects”}\mid\theta) < P(\text{“global warming effects”}\mid\theta)$

$P(\text{“text mining”}\mid\theta) = 0$
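
A unigram model draws each word independently. A sequence's probability is therefore the product of its word probabilities, and any word outside $V$ has probability 0. This minimal sketch checks the statements above. It takes X as $1 - 0.3 - 0.2 - 0.2$, because a distribution must sum to 1:

```python
import math

theta = {"the": 0.3, "global": 0.2, "warming": 0.2}
theta["effects"] = 1.0 - sum(theta.values())  # X: the probabilities must sum to 1

def sequence_prob(text, model):
    """P(text | model) under a unigram model: the product of word probabilities."""
    return math.prod(model.get(w, 0.0) for w in text.split())

for phrase in ("global warming", "warming global",
               "the global warming effects", "global warming effects",
               "text mining"):
    print(f"{phrase!r}: {sequence_prob(phrase, theta):.4f}")
```

Word order never changes a unigram probability. Every extra word multiplies in another factor below 1.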

=================================================

3.
Question 3
Assume that words are being generated by a mixture of two unigram language models, $\theta_1$ and $\theta_2$, where $P(\theta_1)=0.5$ and $P(\theta_2)=0.5$. The distributions of the two models are given in the table below:

| $w$ | $P(w\mid\theta_1)$ | $P(w\mid\theta_2)$ |
|---|---|---|
| sports | 0.35 | 0.05 |
| fast | 0.3 | 0.3 |
| computer | 0.1 | 0.4 |
| smartphone | 0.05 | 0.2 |

Then the probability of observing “computer” from this mixture model is $P(\text{“computer”}) =$

1 point

0.45

0.4

0.05

0.25

=================================================

4.
Question 4
Assume the same setup as in Question 3. We now want to infer which of the two word distributions, $\theta_1$ or $\theta_2$, has been used to generate “computer”. We would thus like to compute the probabilities that it has been generated using $\theta_1$ and $\theta_2$, i.e., $P(\theta_1\mid\text{“computer”})$ and $P(\theta_2\mid\text{“computer”})$, respectively. The values of $P(\theta_1\mid\text{“computer”})$ and $P(\theta_2\mid\text{“computer”})$ are:

Hint: Apply Bayes' rule.

1 point

0.9 and 0.1

0.1 and 0.9

0.8 and 0.2

0.2 and 0.8
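
The mixture probability weights each component by its prior: $P(w) = \sum_i P(\theta_i)\,P(w\mid\theta_i)$. Bayes' rule then divides each weighted term by that total to give $P(\theta_i\mid w)$. This short sketch uses the values from the Question 3 table. The keys `theta1` and `theta2` are illustrative labels only.

```python
priors = {"theta1": 0.5, "theta2": 0.5}
p_w_given = {
    "theta1": {"sports": 0.35, "fast": 0.3, "computer": 0.1, "smartphone": 0.05},
    "theta2": {"sports": 0.05, "fast": 0.3, "computer": 0.4, "smartphone": 0.2},
}

def mixture_prob(word):
    """P(w) = sum_i P(theta_i) * P(w | theta_i)."""
    return sum(priors[m] * p_w_given[m][word] for m in priors)

def posterior(word):
    """Bayes' rule: P(theta_i | w) = P(theta_i) * P(w | theta_i) / P(w)."""
    total = mixture_prob(word)
    return {m: priors[m] * p_w_given[m][word] / total for m in priors}

print(round(mixture_prob("computer"), 4))
print({m: round(p, 4) for m, p in posterior("computer").items()})
```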

=================================================

5.
Question 5
Suppose words are being generated using a mixture of two unigram language models $\theta_1$ and $\theta_2$. Let $P(w)$ denote the probability of generating a word $w$ from this mixture model.

If $P(\theta_1)=1$, then which of the following statements is true?

1 point

$P(w\mid\theta_2)=0$, for any word $w$

$P(w)=P(w\mid\theta_1)$, for any word $w$

$P(w\mid\theta_1)=0$, for any word $w$

=================================================

6.
Question 6
True or false? Let $X_{\text{text}}$, $X_{\text{mining}}$, and $X_{\text{the}}$ be binary random variables associated with the words “text”, “mining”, and “the”, respectively. Assume that the probabilities of the random variables are estimated based on a large corpus. Then we should expect $H(X_{\text{text}}\mid X_{\text{mining}}) > H(X_{\text{text}}\mid X_{\text{the}})$.

1 point

False

True

=================================================

7.
Question 7
True or false? I(X;Y)=0 if and only if X and Y are independent.

1 point

False

True
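
Conditional entropy $H(X\mid Y)$ measures how much uncertainty about $X$ remains once $Y$ is known. Mutual information is $I(X;Y) = H(X) - H(X\mid Y)$. This minimal sketch uses two hypothetical joint distributions, one independent pair and one perfectly associated pair. A strongly associated $Y$ drives $H(X\mid Y)$ down. An independent $Y$ leaves it at $H(X)$, which gives $I(X;Y)=0$.

```python
import math

def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(X|Y) = sum_y P(y) * H(X | Y=y), with joint[x][y] = P(X=x, Y=y)."""
    h = 0.0
    for y in range(len(joint[0])):
        p_y = sum(row[y] for row in joint)
        if p_y > 0:
            h += p_y * entropy([row[y] / p_y for row in joint])
    return h

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y)."""
    p_x = [sum(row) for row in joint]
    return entropy(p_x) - conditional_entropy(joint)

# Hypothetical joint distributions over binary X (rows) and Y (columns).
independent = [[0.7 * 0.6, 0.7 * 0.4], [0.3 * 0.6, 0.3 * 0.4]]  # P(x, y) = P(x) P(y)
associated = [[0.5, 0.0], [0.0, 0.5]]                            # X always equals Y

for name, joint in (("independent", independent), ("associated", associated)):
    print(name, round(conditional_entropy(joint), 4), round(mutual_information(joint), 4))
```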

=================================================

8.
Question 8
Let $w$ be a word and $X_w$ be a binary random variable that indicates whether $w$ appears in a text document in the corpus. Assume that the probability $P(X_w=1)$ is estimated by $\mathrm{Count}(w)/N$, where $\mathrm{Count}(w)$ is the number of documents in which $w$ appears and $N$ is the total number of documents in the corpus.

You are given that “the” is a very frequent word that appears in 99% of the documents and that “photon” is a very rare word that occurs in 1% of the documents. Which word has a higher entropy?

1 point

“photon”

“the”

Both words have the same entropy.

=================================================

9.
Question 9
Let X be a binary random variable. Which of the following is not true? Select all that apply.

1 point

If P(X=0)=1, then H(X) = 1

H(X) ≤ 1

If P(X=1)=1, then H(X) = 1

If P(X=0)=1, then H(X) = 0

=================================================

10.
Question 10
True or false? An unbiased coin has a higher entropy than any biased coin.

1 point

True

False
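
Questions 8 to 10 all rely on the binary entropy $H(X) = -p\log_2 p - (1-p)\log_2(1-p)$, where $p = P(X=1)$ and $0\log 0$ is taken as 0. This minimal sketch evaluates it in three settings: at the document frequencies from Question 8, at the deterministic edge cases, and over a grid of coin biases.

```python
import math

def binary_entropy(p):
    """H(X) in bits for a binary X with P(X=1) = p; 0 * log 0 is treated as 0."""
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)

# Question 8: document frequencies of 99% ("the") and 1% ("photon").
print(round(binary_entropy(0.99), 4), round(binary_entropy(0.01), 4))

# Question 9: a deterministic variable has zero entropy.
print(binary_entropy(1.0), binary_entropy(0.0))

# Question 10: scan coin biases and report the one with the highest entropy.
biases = [i / 100 for i in range(101)]
print(max(biases, key=binary_entropy))
```

The function is symmetric in $p$ and $1-p$. That symmetry is why a 99% word and a 1% word are equally uncertain.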

## Week 3 Quiz

1.
Question 1
You are given two unigram language models $\theta_1$ and $\theta_2$ as defined in the table below:

| $w$ | $P(w\mid\theta_1)$ | $P(w\mid\theta_2)$ |
|---|---|---|
| concert | 0.1 | 0.4 |
| music | 0.1 | 0.4 |
| data | 0.4 | 0.1 |
| software | 0.4 | 0.1 |

Suppose we are using a mixture model for document clustering based on the two given unigram language models, $\theta_1$ and $\theta_2$, such that $P(\theta_1)=0.5$ and $P(\theta_2)=0.5$. To generate a document, first, one of the two language models is chosen according to $P(\theta_i)$, and then all the words in the document are generated based on the chosen language model.

The probability of generating the document d: “music software” using the given mixture model is $P(\text{“music software”}) =$

1 point

0.05

0.04

0.5

0.6

=================================================

2.
Question 2
Assume the same unigram language models, $\theta_1$ and $\theta_2$, defined as in the table of Question 1 with $P(\theta_1)=0.5$ and $P(\theta_2)=0.5$.

We now want to generate documents based on the mixture model used in topic modeling. To generate a document, for each word we first choose one of the two language models, $\theta_1$ or $\theta_2$, and then generate the word according to the chosen model. The probability of generating the document d: “music software” according to this mixture model is $P(\text{“music software”}) =$

1 point

0.125

0.0125

0.0625

0.625
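
Questions 1 and 2 differ only in where the model choice happens. The clustering mixture picks one model per document and then multiplies word probabilities. The topic-model mixture picks a model for every word and then multiplies the per-word mixture probabilities. This minimal sketch contrasts the two using the table from Question 1:

```python
import math

priors = [0.5, 0.5]
models = [
    {"concert": 0.1, "music": 0.1, "data": 0.4, "software": 0.4},  # theta_1
    {"concert": 0.4, "music": 0.4, "data": 0.1, "software": 0.1},  # theta_2
]

def clustering_prob(words):
    """One model per document: sum_i P(theta_i) * prod_w P(w | theta_i)."""
    return sum(p * math.prod(m[w] for w in words) for p, m in zip(priors, models))

def topic_prob(words):
    """One model per word: prod_w sum_i P(theta_i) * P(w | theta_i)."""
    return math.prod(sum(p * m[w] for p, m in zip(priors, models)) for w in words)

doc = ["music", "software"]
print(round(clustering_prob(doc), 4), round(topic_prob(doc), 4))
```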

=================================================

3.
Question 3
Let $X_w$ be a random variable denoting whether word $w$ occurs in a text document in a collection of English news articles. Which random variable do you expect to have a lower entropy?

1 point

$H(X_{\text{learning}})$

$H(X_{\text{the}})$

=================================================

4.
Question 4
We want to run PLSA on a collection of $N$ documents with a fixed number of topics $k$, where the vocabulary size is $M$. What is the number of parameters that PLSA tries to estimate? Consider each $P(w\mid\theta_j)$ or $\pi_{d,j}$ as a separate parameter.

1 point

MNk

Mk+Nk

Nk

Mk
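
PLSA has two groups of parameters. The first is $k$ topic word distributions, with $M$ values $P(w\mid\theta_j)$ each. The second is one coverage vector $\pi_{d,1},\ldots,\pi_{d,k}$ per document. This small counting sketch illustrates the tally; the function name is hypothetical and used only for illustration.

```python
def plsa_param_count(n_docs, vocab_size, n_topics):
    """Count P(w|theta_j) entries (k * M) plus pi_{d,j} entries (N * k)."""
    topic_word = n_topics * vocab_size
    doc_topic = n_docs * n_topics
    return topic_word + doc_topic

# Adding documents (same vocabulary) still grows the count through N * k.
print(plsa_param_count(1000, 5000, 10), plsa_param_count(2000, 5000, 10))
```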

=================================================

5.
Question 5
You are given a document $d$ that contains only two words: “the” and “machine”. Assume that this document was generated from a mixture of two unigram language models: a known background language model $\theta_B$ and an unknown topic language model $\theta_d$. Let $P(\theta_B)=\lambda$ and $P(\theta_d)=1-\lambda$, and assume that $P(\text{“the”}\mid\theta_B)=0.9$ and $P(\text{“machine”}\mid\theta_B)=0.1$. We want to estimate $\theta_d$ using maximum likelihood. Then, as $\lambda$ increases, $P(\text{“machine”}\mid\theta_d)$ will:

Hint: First get the maximum likelihood estimates of the two words in $\theta_d$ (refer to the lecture on “Probabilistic Topic Models: Mixture Model Estimation”). Then, write $P(\text{“machine”}\mid\theta_d)$ as a function of $\lambda$ and study the behavior of the function.

1 point

Increase

Remain the same

Decrease
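
Following the hint, assume $\theta_d$ puts all its mass on the two observed words. The mixture's probabilities for “the” and “machine” then sum to 1, so the likelihood $P(\text{“the”})\,P(\text{“machine”})$ peaks when each equals 0.5. Solving $0.1\lambda + (1-\lambda)\,P(\text{“machine”}\mid\theta_d) = 0.5$ gives $P(\text{“machine”}\mid\theta_d) = \frac{0.5 - 0.1\lambda}{1-\lambda}$, valid for $\lambda \le 5/9$; beyond that the estimate is capped at 1. This minimal sketch checks that closed form against a brute-force grid search, which stands in for EM here purely for transparency:

```python
import math

P_BACKGROUND = {"the": 0.9, "machine": 0.1}  # theta_B from the question

def log_likelihood(p_machine_d, lam):
    """log P(d) for d = "the machine"; theta_d is assumed to cover only these two words."""
    p_the = lam * P_BACKGROUND["the"] + (1 - lam) * (1 - p_machine_d)
    p_machine = lam * P_BACKGROUND["machine"] + (1 - lam) * p_machine_d
    if p_the <= 0 or p_machine <= 0:
        return float("-inf")
    return math.log(p_the) + math.log(p_machine)

def grid_mle(lam, steps=20_000):
    """Brute-force maximizer of the log-likelihood over P("machine"|theta_d) in [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: log_likelihood(p, lam))

def closed_form(lam):
    """(0.5 - 0.1 * lam) / (1 - lam), capped at 1 once lam exceeds 5/9."""
    return min(1.0, (0.5 - 0.1 * lam) / (1 - lam))

for lam in (0.0, 0.2, 0.4, 0.5):
    print(lam, round(grid_mle(lam), 4), round(closed_form(lam), 4))
```

The derivative of the closed form is $0.4/(1-\lambda)^2$. The background model explains more of “the” as $\lambda$ grows, which pushes $\theta_d$ toward “machine”.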

=================================================

6.
Question 6
True or false? In general, PLSA using the EM algorithm does not stop until it achieves the global maximum of the likelihood function.

1 point

True

False

=================================================

7.
Question 7
True or false? Let $\theta_1, \ldots, \theta_k$ be the $k$ unigram language models output by PLSA and $V$ be the vocabulary set. Then, for any $i \in \{1, \ldots, k\}$, the following relation always holds: $\sum_{w \in V} P(w\mid\theta_i) = 1$.

1 point

False

True

=================================================

8.
Question 8
True or false? The EM algorithm cannot decrease the likelihood of the data.

1 point

True

False
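
A toy run lets you observe two things: EM's likelihood guarantee, and the normalization of every $P(w\mid\theta_i)$. This is a minimal PLSA EM sketch in pure Python. It assumes a hypothetical 3×4 word-count matrix, two topics, and a fixed random seed. The E-step computes each word's topic posterior, the M-step renormalizes the expected counts, and the log-likelihood is recorded after every iteration.

```python
import math
import random

# Hypothetical word-count matrix: counts[d][w] for 3 documents and 4 words.
counts = [
    [4, 3, 0, 1],
    [0, 1, 5, 4],
    [2, 2, 2, 2],
]
N, M, K = len(counts), len(counts[0]), 2  # K = 2 topics (assumed)

def normalize(values):
    total = sum(values)
    return [v / total for v in values]

rng = random.Random(0)
pi = [normalize([rng.random() + 0.1 for _ in range(K)]) for _ in range(N)]   # pi[d][j]
phi = [normalize([rng.random() + 0.1 for _ in range(M)]) for _ in range(K)]  # P(w|theta_j)

def log_likelihood():
    """sum_d sum_w c(w, d) * log sum_j pi[d][j] * phi[j][w]"""
    return sum(
        counts[d][w] * math.log(sum(pi[d][j] * phi[j][w] for j in range(K)))
        for d in range(N) for w in range(M) if counts[d][w] > 0
    )

history = [log_likelihood()]
for _ in range(50):
    new_pi = [[0.0] * K for _ in range(N)]
    new_phi = [[0.0] * M for _ in range(K)]
    for d in range(N):
        for w in range(M):
            if counts[d][w] == 0:
                continue
            # E-step: posterior P(z = j | d, w) is proportional to pi[d][j] * phi[j][w].
            weights = [pi[d][j] * phi[j][w] for j in range(K)]
            total = sum(weights)
            for j in range(K):
                expected = counts[d][w] * weights[j] / total
                # M-step accumulators: expected counts per document and per topic.
                new_pi[d][j] += expected
                new_phi[j][w] += expected
    pi = [normalize(row) for row in new_pi]
    phi = [normalize(row) for row in new_phi]
    history.append(log_likelihood())

print(round(history[0], 3), round(history[-1], 3))
```

Different seeds can end at different local maxima. A non-decreasing likelihood therefore does not imply that the global maximum is reached.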

=================================================

9.
Question 9
True or false? Assume that the likelihood function of PLSA has multiple local maxima and one global maximum. There exists an initial set of parameters for which PLSA will converge to the global maximum of the likelihood function.

1 point

True

False

=================================================

10.
Question 10
True or false? When using PLSA to mine topics from a text collection, the number of parameters of the PLSA model stays the same as we keep adding new documents into the text collection assuming that the new documents do not introduce new words that have not occurred in the current text collection.

1 point

True

False

## Week 4 Quiz

1.
Question 1
Which of the following is NOT a motivation for text clustering?

1 point

To quickly get an idea about a large collection of documents

To remove spam documents based on a small collection of human annotated spam documents

To link similar documents and remove duplicated documents

To create structure of text data

=================================================

2.
Question 2
What is TRUE about the mixture model and topic modeling?

1 point

Topic modeling can also be used for document clustering directly.

In topic modeling, the topic of each word is independently sampled, while in the mixture model, only one topic is drawn for each document.

Only topic modeling can learn topics, while the mixture model does not yield such information after learning.

=================================================

3.
Question 3
In the mixture model, if we want to encourage the formation of a large cluster:

1 point

Try different initialization

Add a prior to $P(\theta)$ so that the distribution is skewed

Use a smaller number of clusters for training

=================================================

4.
Question 4
In the EM algorithm, which step improves the model likelihood?

1 point

M-step

E-step

=================================================

5.
Question 5
True or false? In the EM algorithm, the model likelihood monotonically increases.

1 point

False

True

=================================================

6.
Question 6
What is the most difficult part of directly applying maximum likelihood estimation to PLSA?

1 point

The objective function needs to sum over all words for each document.

The objective function needs to sum over all topics for each word.

The objective function needs to sum over all documents in the collection.

=================================================

7.
Question 7
For the agglomerative clustering algorithm, which of the following is not TRUE?

1 point

The depth of the hierarchy is always $\log_2(N)$, where $N$ is the number of items.

It’s a bottom-up algorithm to form a hierarchy.

The user needs to specify a similarity measurement.
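
The depth of an agglomerative hierarchy depends on the data and the linkage, not only on $N$. Below is a minimal single-link sketch on hypothetical 1-D points, with absolute difference as the assumed distance. It returns the depth of the resulting dendrogram.

```python
def single_link_depth(points):
    """Merge the two closest clusters (single link) until one remains; return tree depth."""
    clusters = [([p], 0) for p in points]  # (members, depth of this subtree)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                dist = min(abs(a - b) for a in clusters[i][0] for b in clusters[j][0])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        (members_i, depth_i), (members_j, depth_j) = clusters[i], clusters[j]
        merged = (members_i + members_j, max(depth_i, depth_j) + 1)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][1]

chain = [1, 2, 4, 8, 16, 32, 64, 128]          # each gap doubles
balanced = [0, 1, 10, 11, 100, 101, 110, 111]  # nested pairs
print(single_link_depth(chain), single_link_depth(balanced))  # 7 3
```

Both inputs have 8 items. Only the nested pairs produce a depth of $\log_2 8 = 3$; the doubling chain grows one item at a time.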

=================================================

8.
Question 8
Which evaluation method is best for clustering results of a large collection of documents?

1 point

Use the indirect evaluation method and test performance for an application with or without clustering.

Use the direct evaluation method and create human annotations for each document in the collection.

=================================================

9.
Question 9
Which of the following is NOT sensitive to outliers?

1 point

=================================================

10.
Question 10
Which of the following is a generative classification algorithm?

1 point

Logistic Regression

SVM

K-NN

Naive Bayes
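
Naive Bayes is generative: it models how a document is produced from a class, $p(d, c) = p(c)\prod_{w \in d} p(w\mid c)$, and then classifies with Bayes' rule. Logistic regression models $p(c\mid d)$ directly, and an SVM learns a decision boundary. This minimal Naive Bayes sketch uses add-one smoothing and trains on hypothetical toy data. Unseen words are simply skipped, which is a simplifying assumption.

```python
import math
from collections import Counter

# Hypothetical toy training data: (document, label) pairs.
train = [
    ("great movie loved it", "pos"),
    ("loved the acting", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring and terrible", "neg"),
]

def train_nb(data, alpha=1.0):
    """Estimate p(c) and add-alpha smoothed p(w|c) from labeled documents."""
    class_docs = Counter(label for _, label in data)
    word_counts = {c: Counter() for c in class_docs}
    for text, label in data:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {c: n / len(data) for c, n in class_docs.items()}
    likelihoods = {}
    for c, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        likelihoods[c] = {w: (counts[w] + alpha) / total for w in vocab}
    return priors, likelihoods, vocab

def predict(text, priors, likelihoods, vocab):
    """Pick the class maximizing log p(c) + sum_w log p(w|c); unseen words are skipped."""
    scores = {
        c: math.log(priors[c])
        + sum(math.log(likelihoods[c][w]) for w in text.split() if w in vocab)
        for c in priors
    }
    return max(scores, key=scores.get)

priors, likelihoods, vocab = train_nb(train)
print(predict("loved this movie", priors, likelihoods, vocab))
```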

## Week 5 Quiz

1.
Question 1
Assume that documents are being classified into two categories, c1 and c2, such that a document can belong to more than one category. The table below shows the prediction of a classifier, denoted by “y” or “n”, in addition to the true label (ground truth) represented by a “+” or “-”, where a correct prediction is either y (+) or n (-).

| Document | c1 | c2 |
|---|---|---|
| D1 | y(+) | y(+) |
| D2 | n(-) | y(+) |
| D3 | n(+) | n(-) |
| D4 | y(-) | y(+) |
| D5 | n(+) | n(-) |

Let P(ci) and R(ci) denote the precision and recall associated with category ci, respectively.

The precision and recall of c1 and c2 are:

1 point

P(c1) = 1/2 R(c1) = 1/2 P(c2) = 1 R(c2) = 1

P(c1) = 1/2 R(c1) = 1/2 P(c2) = 1/2 R(c2) = 1/2

P(c1) = 1/2 R(c1) = 1/3 P(c2) = 1 R(c2) = 1

P(c1) = 1/3 R(c1) = 1/2 P(c2) = 1 R(c2) = 1

=================================================

2.
Question 2
Given the same data as in Question 1, the classification accuracy of the classifier is:

1 point

9/10

8/10

3/10

7/10

=================================================

3.
Question 3
Given the same data as in Question 1, what is the recall of the classifier using micro-averaging (i.e., by pooling all decisions together)?

1 point

1

4/5

5/6

2/3
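
Per-category precision and recall are built from three counts: true positives y(+), false positives y(-), and false negatives n(+). Accuracy and micro-averaging instead pool all ten document-category decisions. This short sketch uses `fractions.Fraction` to get exact values, with the decisions transcribed from the table in Question 1:

```python
from fractions import Fraction

# (prediction, truth) for D1..D5 in each category, transcribed from the table.
decisions = {
    "c1": [("y", "+"), ("n", "-"), ("n", "+"), ("y", "-"), ("n", "+")],
    "c2": [("y", "+"), ("y", "+"), ("n", "-"), ("y", "+"), ("n", "-")],
}

def confusion(pairs):
    """Return (TP, FP, FN, TN) counts for a list of (prediction, truth) pairs."""
    tp = sum(p == "y" and t == "+" for p, t in pairs)
    fp = sum(p == "y" and t == "-" for p, t in pairs)
    fn = sum(p == "n" and t == "+" for p, t in pairs)
    tn = sum(p == "n" and t == "-" for p, t in pairs)
    return tp, fp, fn, tn

for c, pairs in decisions.items():
    tp, fp, fn, _ = confusion(pairs)
    print(c, "P =", Fraction(tp, tp + fp), "R =", Fraction(tp, tp + fn))

# Pool every decision for accuracy and micro-averaged recall.
all_pairs = [pair for pairs in decisions.values() for pair in pairs]
tp, fp, fn, tn = confusion(all_pairs)
accuracy = Fraction(tp + tn, len(all_pairs))
micro_recall = Fraction(tp, tp + fn)
print("accuracy =", accuracy, "micro recall =", micro_recall)
```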

=================================================

4.
Question 4
Suppose we are performing document clustering on a collection of N documents using a mixture model as discussed in the lecture “Text Clustering: Generative Probabilistic Models (Part 3)”. Let the number of clusters be K and the vocabulary size be M. What is the number of parameters that the EM algorithm tries to estimate? Consider each P(θi) or P(w|θi) as a separate parameter.

1 point

MNK

KN+MK

K+MK

MK

=================================================

5.
Question 5
Which one of the following statements is not an opinion?

1 point

PLSA is the best method for a topic mining task.

PLSA always performs similarly to LDA.

PLSA is a mixture model.

=================================================

6.
Question 6
True or false? Word unigrams are the best performing features for sentiment classification.

1 point

True

False

=================================================

7.
Question 7
True or false? Suppose we are using logistic regression for binary classification (i.e., k=2) where the number of features is M. Then, the number of parameters to be estimated is M+1.

1 point

False

True

=================================================

8.
Question 8
True or false? Assume we are using word n-grams as features to perform sentiment classification. Then, higher values of n will usually be less prone to overfitting (i.e., for higher values of n, the difference between training and testing accuracies will be smaller).

1 point

True

False

=================================================

9.
Question 9
Why is accuracy sometimes not good for classification evaluation? Check all that apply.

1 point

Computation of accuracy is difficult.

For an imbalanced dataset, high accuracy does not imply good performance.

Some decisions are more serious than others.
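
The imbalance problem is easy to reproduce. This minimal sketch assumes a hypothetical test set of 95 negatives and 5 positives, and a trivial classifier that always predicts "negative":

```python
# Hypothetical imbalanced test set: 95 negatives (0) and 5 positives (1).
truth = [0] * 95 + [1] * 5
predicted = [0] * 100  # a trivial classifier that always says "negative"

accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
true_pos = sum(p == 1 and t == 1 for p, t in zip(predicted, truth))
recall = true_pos / sum(truth)
print(accuracy, recall)  # 0.95 0.0
```

The classifier scores 95% accuracy while finding none of the positives.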

=================================================

10.
Question 10
If you want to put more emphasis on precision than recall, how should you adjust the value of $\beta$ in the $F_\beta$ measure?

1 point

Choose a low value of $\beta$

Choose a high value of $\beta$
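
The $F_\beta$ measure is $F_\beta = \frac{(\beta^2+1)PR}{\beta^2 P + R}$. At $\beta = 0$ it reduces to precision, and as $\beta$ grows it moves toward recall. This minimal sketch evaluates it for a hypothetical classifier with precision 0.9 and recall 0.5:

```python
def f_beta(precision, recall, beta):
    """F_beta = (beta^2 + 1) * P * R / (beta^2 * P + R)."""
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

p, r = 0.9, 0.5  # hypothetical precision and recall
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(p, r, beta), 3))
```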

## Week 6 Quiz

1.
Question 1
Given a set of restaurant reviews along with the overall numeric rating of every restaurant, you are asked to infer the ratings of each of the restaurants on cleanliness, taste, and value. Which of the following methods is the most suitable to solve such an inference problem?

1 point

Sentiment analysis

Topic modeling

Contextual text mining

Latent Aspect Rating Analysis

=================================================

2.
Question 2
Examine the objective function of NetPLSA in the lecture entitled “Contextual Text Mining: Mining Topics with Social Network Context”. Increasing λ will:

1 point

Make neighbor nodes have less similar topic coverage

Not affect the topic coverage of neighbor nodes

Make neighbor nodes have more similar topic coverage

=================================================

3.
Question 3
You are given an undirected citation network composed of papers {p1,…,pn} as nodes, where a link between papers pi and pj means that one of the papers cited the other. Suppose you want to use the given data to discover the topics (research areas) of the papers. Which of the following methods is expected to work best?

Hint: Papers that have a citation relationship are more likely to belong to the same research area.

1 point

CPLSA

Sentiment analysis

NetPLSA

PLSA

=================================================

4.
Question 4
You are given a collection of news articles along with their publishing dates and want to reveal which topics have attracted increasing attention in a certain time period. Which of the following methods is most suitable for this task?

1 point

CPLSA

NetPLSA

Sentiment analysis

=================================================

5.
Question 5
Suppose we are performing Latent Aspect Rating Analysis where the number of aspect segments is K and the number of words in each aspect segment is M. What is the total number of parameters for term sentiment weights, i.e., the β values, that have to be estimated?

1 point

MK

M+K

M

K

=================================================

6.
Question 6
Which of the following is true?

1 point

Ordinal logistic regression trains k−1 independent classifiers, k being the number of classes.

Different types of features, such as POS tags and word n-grams, can be combined when performing sentiment analysis.

The objective function of NetPLSA does not try to make neighbor nodes have similar topic coverage.

=================================================

7.
Question 7
Imagine a company is interested in understanding any factors related to the fluctuating sales of its new product over the past year. It collected companion text data, including consumer reviews of the product from multiple websites with time stamps from the past year, and hopes to gain insights from such text data. Which of the following text mining techniques would you recommend?

1 point

Iterative topic modeling with time series supervision

Text clustering

Contextual PLSA (CPLSA)

=================================================

8.
Question 8
The US government implemented a new health care policy in 2010. Suppose the government is interested in understanding the impact of this policy and how it has affected what people talk about on social media. For this purpose, we can collect social media text data, such as forum posts and tweets, with time stamps before and after 2010. Which of the following text mining techniques is most suitable for this task?

1 point

Iterative Topic Modeling with Time Series Supervision

Contextual PLSA (CPLSA)

Text clustering

=================================================

9.
Question 9
Context can be used to (check all that apply):

1 point

Partition text

Annotate topics

=================================================

10.
Question 10
Which of the following statements about CPLSA is NOT correct?

1 point

It enables contextual text mining.

The EM algorithm can be used for optimization.

CPLSA is an extension of PLSA.

It models the joint probability of text and context.
