Data set found on Kaggle: Swiss banknote counterfeit detection
Inspiration: Our group finds the subject of detecting counterfeit money very interesting. Moreover, the data is well suited to multivariate analysis.
The data consists of 200 observations on 7 variables:
Counterfeit: indicator random variable: 1 if counterfeit, 0 if genuine
Length: length of the bill from the left edge to the right edge in millimeters
Left: length of the left edge from bottom to top in millimeters
Right: length of the right edge from bottom to top in millimeters
Bottom: bottom margin width in millimeters
Top: top margin width in millimeters
Diagonal: length of the diagonal of the bill in millimeters
Response variable: Counterfeit
Predictor variables: Length, Left, Right, Bottom, Top, and Diagonal
Main Question: Are the variables Length, Left, Right, Bottom, Top, and Diagonal together a good predictor of whether a bill is counterfeit or genuine?
We plan to use inference for multivariate means to test whether the mean vectors of counterfeit and genuine bills differ. We will also use PCA to reduce the number of variables, if possible.
PCA group: This group will use PCA to reduce the dimension of the data and, at the same time, look for potential correlations between the predictors.
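As a rough sketch of what the PCA step could look like, the Python snippet below runs PCA on the sample correlation matrix using NumPy. The function name pca_correlation and the random placeholder data are our own assumptions for illustration; they are not the banknote measurements. Since all six predictors are in millimeters, working with the covariance matrix instead would be an equally reasonable choice.

```python
import numpy as np

def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = bills, columns = measurements)."""
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)           # p x p sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # reorder so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # proportion of variance per component
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized data
    scores = Z @ eigvecs                       # principal component scores
    return R, eigvals, eigvecs, explained, scores

# Synthetic placeholder data (assumption): NOT the Swiss banknote measurements.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))

R, eigvals, eigvecs, explained, scores = pca_correlation(X_demo)
print("Cumulative proportion of variance:", np.cumsum(explained))
```

The off-diagonal entries of R give the pairwise correlations between predictors, and the cumulative proportions give a first indication of how many components might be worth keeping.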
Hypothesis testing group 1: This group will compare the multivariate means for counterfeit and genuine bills using hypothesis testing.
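One common way to compare two mean vectors is the two-sample Hotelling T² test with a pooled covariance matrix. The sketch below assumes equal covariance matrices and approximate multivariate normality in both groups, which we would still need to check. The function name hotelling_two_sample, the group sizes, and the simulated data are placeholders, not results from the banknote data.

```python
import numpy as np
from scipy import stats

def hotelling_two_sample(X1, X2):
    """Two-sample Hotelling T^2 test of H0: mu1 = mu2, using a pooled covariance."""
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)   # difference of sample mean vectors
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    # T^2 = (n1*n2/(n1+n2)) * diff' S_pooled^{-1} diff
    t2 = (n1 * n2 / (n1 + n2)) * diff @ np.linalg.solve(S_pooled, diff)
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = df2 / (p * (n1 + n2 - 2)) * t2    # under H0, follows F(df1, df2)
    p_value = stats.f.sf(f_stat, df1, df2)
    return t2, f_stat, p_value

# Simulated placeholder groups (assumption): NOT the banknote measurements.
rng = np.random.default_rng(1)
X_genuine_demo = rng.normal(loc=0.0, size=(100, 6))
X_fake_demo = rng.normal(loc=1.0, size=(100, 6))

t2, f_stat, p_value = hotelling_two_sample(X_genuine_demo, X_fake_demo)
print(f"T2 = {t2:.2f}, F = {f_stat:.2f}, p-value = {p_value:.3g}")
```

A small p-value would lead us to reject the hypothesis that the two mean vectors are equal.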
Hypothesis testing group 2: This group will use appropriate statistical techniques to infer the population mean of each predictor variable for counterfeit and genuine bills.
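One possible technique for this last task is a set of Bonferroni-adjusted t intervals, computed separately for the counterfeit and the genuine bills. The sketch below is only an illustration under the assumption of approximately normal measurements; the function name bonferroni_mean_intervals and the simulated data are hypothetical and are not taken from the banknote data.

```python
import numpy as np
from scipy import stats

def bonferroni_mean_intervals(X, alpha=0.05):
    """Simultaneous (Bonferroni) confidence intervals for the mean of each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mean = X.mean(axis=0)
    se = X.std(axis=0, ddof=1) / np.sqrt(n)             # standard error of each mean
    t_crit = stats.t.ppf(1 - alpha / (2 * p), df=n - 1)  # alpha is split across p intervals
    return np.column_stack([mean - t_crit * se, mean + t_crit * se])

# Simulated placeholder data for one group (assumption): NOT the banknote measurements.
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(100, 6))

intervals = bonferroni_mean_intervals(X_demo)
for name, (lower, upper) in zip(
    ["Length", "Left", "Right", "Bottom", "Top", "Diagonal"], intervals
):
    print(f"{name}: [{lower:.3f}, {upper:.3f}]")
```

Running the function once per group and comparing the intervals variable by variable would show which measurements appear to differ between counterfeit and genuine bills.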