Data mining and data warehousing
Question One
Define Frequent Pattern Analysis and cite its applications. What are the different methods used?
Question Two
Explain in your own words the difference between “supervised learning” and “unsupervised learning”. Give some examples of where each one is used.
Question Three
Given the testing dataset and the constructed decision tree below, calculate the accuracy, error rate, sensitivity, specificity, precision, and recall. The model predicts whether a person will buy a computer based on their information.
age | income | student | credit_rating | buys_computer |
<=30 | high | no | fair | no |
<=30 | high | no | excellent | no |
31…40 | high | no | fair | yes |
>40 | medium | no | fair | no |
>40 | low | yes | fair | yes |
>40 | low | yes | excellent | no |
31…40 | low | yes | excellent | yes |
<=30 | medium | no | fair | no |
<=30 | low | yes | fair | yes |
>40 | medium | yes | fair | yes |
<=30 | medium | yes | excellent | no |
31…40 | medium | no | excellent | yes |
31…40 | high | yes | fair | yes |
>40 | medium | no | excellent | no |
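As a reminder of how the requested measures relate to the confusion matrix, here is a minimal Python sketch. The function name `classification_metrics` and the counts in the usage lines are illustrative assumptions; they are not derived from the table above, and "yes" is assumed to be the positive class.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute standard measures from confusion-matrix counts.

    tp: positives predicted positive    fn: positives predicted negative
    fp: negatives predicted positive    tn: negatives predicted negative
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,     # fraction classified correctly
        "error_rate": (fp + fn) / total,   # 1 - accuracy
        "sensitivity": tp / (tp + fn),     # true positive rate
        "specificity": tn / (tn + fp),     # true negative rate
        "precision": tp / (tp + fp),       # correct among predicted positives
        "recall": tp / (tp + fn),          # same quantity as sensitivity
    }


# Hypothetical counts, for illustration only (not taken from the dataset).
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

To use it for this question, count TP, FN, FP, and TN by comparing the tree's prediction with the buys_computer column for each row.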
Question Four
Explain the k-means algorithm. Cite a software package or online tool (other than Weka) that provides the k-means algorithm (a screenshot is required).
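For orientation, the usual iterative procedure (assign each point to its nearest centroid, then recompute each centroid as the mean of its cluster) can be sketched in plain Python as below. This is a minimal sketch under assumptions, not a reference implementation: the function name `kmeans`, its parameters, the random initialisation from the data points, and the sample points are all illustrative.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Basic k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn from the data.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups.
data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, groups = kmeans(data, k=2)
print(sorted(centers))
```

The result depends on the initial centroids, which is why practical tools typically run several random restarts.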
Question Five
Consider the database of transaction data shown in the table below. Apply the Apriori algorithm to find the frequent itemsets, where min-sup = 2.
TID | Items Bought |
1 | Diaper, Bread, Juice |
2 | Eggs, Bread, Pasta |
3 | Diaper, Eggs, Bread, Pasta |
4 | Eggs, Pasta |
5 | Rice |
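The level-wise procedure behind Apriori (count the candidates, keep those meeting the minimum support, join the survivors into larger candidates, prune any candidate with an infrequent subset) can be sketched in Python as follows. This is a minimal sketch: the function name `apriori`, the use of an absolute support count for `min_sup`, and the lettered transactions are assumptions made for illustration, and the sample data is deliberately not the table above.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    min_sup is treated as an absolute count of transactions (an assumption).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_sup}

    frequent = {}
    k = 1
    while current:
        for c in current:
            frequent[c] = support(c)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_sup}
    return frequent


# Hypothetical transactions, for illustration only.
sample = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "C"}]
for itemset, count in sorted(apriori(sample, min_sup=2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

To apply the sketch to this question, pass each row of the Items Bought column as one set of item names and keep `min_sup=2`.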