ASSIGNMENT

Provide a brief description and examples of each of the following methods of clustering:

Partitioning methods.

Hierarchical methods.

Density-based methods.

Grid-based methods.
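
As an illustration of the first family only, here is a minimal sketch of k-means, a commonly cited partitioning method. The sample points and starting centroids are made up for illustration, and the function name `kmeans` is hypothetical; it is not how Weka implements its clusterers.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: alternate an assignment step and a centroid update step."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Made-up 2-D data with two visually obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```

The number of clusters is fixed in advance by the number of starting centroids, which is characteristic of partitioning methods.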

Load the soybean diagnosis data set (Weka-3.6/data/soybean.arff) into Weka, then perform the following:

Build a decision tree by selecting J48 as the classifier with 10-fold cross-validation. Then fill out the following table:

Metric | Value
Correctly Classified Instances |
Incorrectly Classified Instances |
Kappa statistic |
Mean absolute error |
Root mean squared error |
Relative absolute error |
Root relative squared error |
Total Number of Instances |
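
To help interpret the first three rows of the table, the sketch below shows how the correct and incorrect counts and the Kappa statistic follow from a confusion matrix. The 2-class matrix is hypothetical and is not a soybean result; the function name `accuracy_and_kappa` is also an assumption for illustration.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j] = number of instances of actual class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    observed = correct / total  # observed agreement (accuracy)
    # Agreement expected by chance: product of the row and column
    # marginals for each class, summed and normalised.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(n_classes)
    ) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return correct, total - correct, observed, kappa

# Hypothetical confusion matrix (rows: actual class, columns: predicted class).
confusion = [[40, 10],
             [5, 45]]
correct, incorrect, accuracy, kappa = accuracy_and_kappa(confusion)
```

Kappa corrects the raw accuracy for the agreement a classifier would reach by chance, so it is more informative than accuracy alone when the classes are unbalanced.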

Build a Naïve Bayes classifier with 10-fold cross-validation. Then fill out the following table:

Metric | Value
Correctly Classified Instances |
Incorrectly Classified Instances |
Kappa statistic |
Mean absolute error |
Root mean squared error |
Relative absolute error |
Root relative squared error |
Total Number of Instances |
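
The four error rows of the table can be read with the following sketch. It assumes the convention that each prediction is a class-probability vector compared against a one-hot vector for the actual class, and that the "relative" measures divide by the error of a baseline that always predicts the class priors. This is an assumption about the convention, so check it against the Weka documentation; the data and the name `error_metrics` are made up for illustration.

```python
import math

def error_metrics(predicted, actual, priors):
    """predicted: one class-probability list per instance;
    actual: index of the true class per instance;
    priors: baseline class probabilities (e.g. class frequencies)."""
    n_classes = len(priors)
    abs_err = sq_err = prior_abs = prior_sq = 0.0
    for probs, label in zip(predicted, actual):
        for c in range(n_classes):
            target = 1.0 if c == label else 0.0  # one-hot encoding of the truth
            abs_err += abs(probs[c] - target)
            sq_err += (probs[c] - target) ** 2
            prior_abs += abs(priors[c] - target)
            prior_sq += (priors[c] - target) ** 2
    n = len(actual) * n_classes
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        # Relative measures: classifier error as a percentage of baseline error.
        "rae_percent": 100 * abs_err / prior_abs,
        "rrse_percent": 100 * math.sqrt(sq_err / prior_sq),
    }

# Made-up predictions for four instances and two classes.
predicted = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]
actual = [0, 1, 1, 1]
priors = [0.25, 0.75]
metrics = error_metrics(predicted, actual, priors)
```

A relative error below 100% means the classifier does better than the prior-only baseline under this convention.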

Compare the results of the previous two parts (a and b). Which algorithm gives the better result, and why?

Constructing a classifier and evaluating its accuracy on a dataset requires partitioning the labeled data into a training set and a test set. Explain three main methods used for such partitioning.
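
For orientation, the sketch below shows three commonly taught partitioning schemes: holdout, k-fold cross-validation, and the bootstrap. Treating these as the three intended methods is an assumption, and all function names and default parameters here are hypothetical. Each function works on instance indices 0..n-1 rather than on real data.

```python
import random

def holdout_split(n, test_fraction=1/3, seed=0):
    """Holdout: shuffle once, reserve a fixed fraction for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * test_fraction))
    return idx[cut:], idx[:cut]  # (train, test)

def k_fold_splits(n, k=10, seed=0):
    """k-fold cross-validation: each fold serves as the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def bootstrap_split(n, seed=0):
    """Bootstrap: sample n indices with replacement for training;
    instances never drawn form the test set."""
    rng = random.Random(seed)
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test

holdout_train, holdout_test = holdout_split(30)
folds = list(k_fold_splits(25, k=5))
boot_train, boot_test = bootstrap_split(30)
```

In holdout and k-fold the training and test sets never share an instance, while the bootstrap training set may contain repeated instances.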

Explain why cross-validation is used in both supervised learning (classification) and unsupervised learning (clustering).