You need to download the RapidMiner software to answer these questions.
QUESTION 1
What is the process of dealing with incorrect, corrupted, poorly formatted, duplicate, or incomplete data within a dataset?
data aligning
data granularity
data clustering
data cleaning
5 points
QUESTION 2
Which clustering technique is more appropriate for large datasets with numeric data?
hierarchical
k-means
n-dimensional
supervised
5 points
QUESTION 3
Which one is an example of an unsupervised technique?
association rules
linear regression
analysis of variance
neural network
5 points
QUESTION 4
Which one is an example of a supervised technique?
association rules
linear regression
analysis of variance
neural network
5 points
QUESTION 5
Assume that the correlation between pollution and temperature is 0.61. How can we interpret this result?
The higher the temperature the higher the pollution
The higher the temperature the lower the pollution
The lower the temperature the higher the pollution
Temperature and pollution are not correlated
5 points
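As a reminder of how the sign of a correlation coefficient is read, here is a minimal sketch of the Pearson coefficient in plain Python. The readings are hypothetical toy values, not taken from any exam file, and the helper name `pearson` is an assumption made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy readings, invented for illustration only.
temperature = [10, 15, 20, 25, 30]
rising = [30, 32, 41, 45, 50]    # tends to increase with temperature
falling = [50, 45, 41, 32, 30]   # tends to decrease with temperature

r_rising = pearson(temperature, rising)
r_falling = pearson(temperature, falling)
print(r_rising > 0, r_falling < 0)
```

A positive coefficient means the two variables tend to move in the same direction; a negative one means they tend to move in opposite directions; a value near zero indicates no linear relationship.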
QUESTION 6
Questions 6-10 are based on the file colleges.csv. Upload the file ‘colleges’ to RapidMiner. The file contains data about colleges and SAT scores. Run the model and explore the results. How many attributes have missing values?
2
3
4
5
5 points
QUESTION 7
In ‘colleges’, filter out examples with missing data and run the model. How many examples are now in the dataset?
525
777
956
1022
5 points
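Outside RapidMiner, the same two checks (which attributes have missing values, and how many examples remain after filtering them out) can be sketched with Python's standard library. The sample below is a hypothetical miniature stand-in for a file such as colleges.csv; its column names and values are invented and do not reflect the real file.

```python
import csv
import io

# Hypothetical stand-in data; column names and values are assumptions.
SAMPLE = """name,state,math_sat,verbal_sat
College A,NY,520,500
College B,CA,,480
College C,TX,610,
College D,NY,570,550
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Count empty cells per attribute (column).
missing_per_attribute = {
    column: sum(1 for row in rows if row[column].strip() == "")
    for column in rows[0]
}
attributes_with_missing = [c for c, n in missing_per_attribute.items() if n > 0]

# Keep only examples (rows) with no empty cell in any attribute.
complete_rows = [row for row in rows if all(v.strip() != "" for v in row.values())]

print(attributes_with_missing)
print(len(complete_rows))
```

To apply this to a real file, replace `io.StringIO(SAMPLE)` with an opened file handle; how missing values are encoded there (empty cell, `?`, `NA`) would need to be checked first.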
QUESTION 8
In ‘colleges’, explore the statistics. What is the average score for Math SAT in the dataset? Round the result if necessary.
320
507
665
750
5 points
QUESTION 9
In ‘colleges’, explore the attribute State. Which state has the largest number of universities in the dataset?
CA
TX
IL
NY
5 points
QUESTION 10
In ‘colleges’, convert the attribute ‘Public(1)/Private(2)’ to nominal. How many private colleges are in the dataset?
250
320
447
527
Close the file.
5 points
QUESTION 11
Questions 11-15 are based on the file cities_exam.csv. Upload the file to RapidMiner. The file contains data about air quality in each of the cities. Use the Select Attributes operator to remove ‘State’ and ‘Region’ from the analysis. Then use the Set Role operator to assign the target role ‘id’ to the attribute ‘city’. How many regular attributes are now in the dataset?
10
11
12
13
5 points
QUESTION 12
Add the k-Means operator. In its parameters, set k to 7 and check ‘add as label’. Run the model and check the statistics. What is the size of the largest cluster?
14
17
21
36
5 points
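For intuition about what a k-means run produces, the following is a minimal sketch of the standard assignment/update loop (Lloyd's algorithm) in plain Python. It is not RapidMiner's implementation: the function name `kmeans`, the choice of the first k points as initial centroids, and the toy points are all assumptions made to keep the example deterministic.

```python
import math
from collections import Counter

def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers.

    Initial centroids are simply the first k points, an assumption made
    here so that the sketch gives the same result on every run.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical two-attribute examples, invented for illustration.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (0.9, 1.1), (8.8, 9.2)]
labels, centroids = kmeans(points, k=2)
cluster_sizes = Counter(labels)
print(cluster_sizes.most_common(1))
```

The cluster label attached to each example plays the same role as the cluster attribute that appears in the results, and counting labels gives the cluster sizes. Real tools usually pick initial centroids randomly, so cluster numbering and sometimes sizes can vary between runs.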
QUESTION 13
Replace the k-Means operator with the X-Means operator. What is the optimal number of clusters?
3
4
6
9
5 points
QUESTION 14
Check the size of clusters produced by X-Means. What is the size of the largest cluster?
28
40
52
56
5 points
QUESTION 15
Check the results. What other cities are in the same cluster as New York?
San Francisco and New Orleans
Seattle and Detroit
Chicago and Los Angeles
Atlanta and Boston
Close the file.
5 points
QUESTION 16
Questions 16-20 are based on the file purchases.csv. Upload the file to RapidMiner, then run and explore the results. Set the role of ReceiptID to id and change the data type of all integer attributes to binominal. Run the model. How many binominal attributes are in the dataset?
9
10
11
12
5 points
QUESTION 17
Add the FP-Growth operator and change min_support to 0.3. Run the model. Which product has the highest support?
butter
cookies
milk
produce
5 points
QUESTION 18
In the FP-Growth operator, change the min support value to 0.1. Run the model. What is the size of the largest itemset? (Hint: keep ‘find min number of itemsets’ unchecked)
2
3
4
5
5 points
QUESTION 19
Add the Create Association Rules operator and change min confidence to 0.5. Run the model. Which rule has the highest confidence value?
when cookies, pasta and cheese are purchased, butter is also purchased
when cookies are purchased, cheese is also purchased
when butter and cookies are purchased, cheese is also purchased
when butter and pasta are purchased, cookies are also purchased
5 points
QUESTION 20
In the results, explore the Lift values. Based on the lift value between cheese and butter, is it likely that butter is purchased when cheese is purchased?
Yes
No
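The support, confidence, and lift measures used in Questions 17-20 can be computed by hand. The sketch below does so in plain Python on hypothetical market baskets; the transactions and the helper names `support`, `confidence`, and `lift` are invented for illustration and are unrelated to purchases.csv.

```python
# Hypothetical market baskets, invented for illustration.
transactions = [
    {"milk", "butter", "cheese"},
    {"milk", "butter"},
    {"cheese", "cookies"},
    {"milk", "cheese", "butter"},
    {"cookies"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(premise, conclusion):
    """Of the transactions containing premise, the fraction that also contain conclusion."""
    return support(set(premise) | set(conclusion)) / support(premise)

def lift(premise, conclusion):
    """Confidence of the rule divided by the baseline support of the conclusion."""
    return confidence(premise, conclusion) / support(conclusion)

rule_lift = lift({"cheese"}, {"butter"})
print(round(support({"cheese"}), 2), round(confidence({"cheese"}, {"butter"}), 2), round(rule_lift, 2))
```

A lift above 1 means the conclusion is purchased more often when the premise is present than it is overall; a lift below 1 means less often; a lift of exactly 1 means the two are independent.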
