1) Explore the relationship between the numerical predictors and the response (FARE) by creating correlation tables, scatterplot matrices, etc. Based on these results, which numerical variable appears to be the best single predictor of FARE?
2) Explore the relationship between the categorical predictors (excluding S_CITY and E_CITY) and FARE by creating boxplots and computing the percentage of flights in each category. Create a tabular summary of the average fare in each category. Which categorical predictor seems best for predicting FARE?
3) Find a model for predicting the average fare on a new route (you can again ignore S_CITY and E_CITY). Describe the process that led you to the final model you propose.
4) Using the model you identified as best in question 3), predict the average fare on a route with the following characteristics: COUPON = 1.202, NEW = 3, VACATION = “No”, SW = “No”, HI = 4442.141, S_INCOME = $28,760, E_INCOME = $27,664, S_POP = 4,557,004, E_POP = 3,195,503, SLOT = “Free”, GATE = “Free”, PAX = 12,782, DISTANCE = 1976 miles.
5) Using the model you identified as best in question 3), predict the reduction in average fare on the route described in question 4) if Southwest Airlines decides to serve this route (that is, if SW changes from “No” to “Yes” while all other characteristics stay the same).
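
For questions 4) and 5), one way to organize the calculation is sketched below in Python. The route values are copied from question 4). The `predict` argument is a placeholder for whatever fitted model comes out of question 3) and is an assumption of this sketch, as are the names `ROUTE` and `fare_reduction_with_sw`; no particular model or library is implied.

```python
from typing import Callable, Dict, Union

Value = Union[float, int, str]
Route = Dict[str, Value]

# Route characteristics copied from question 4).
ROUTE: Route = {
    "COUPON": 1.202,
    "NEW": 3,
    "VACATION": "No",
    "SW": "No",
    "HI": 4442.141,
    "S_INCOME": 28760,   # dollars
    "E_INCOME": 27664,   # dollars
    "S_POP": 4557004,
    "E_POP": 3195503,
    "SLOT": "Free",
    "GATE": "Free",
    "PAX": 12782,
    "DISTANCE": 1976,    # miles
}


def fare_reduction_with_sw(predict: Callable[[Route], float], route: Route) -> float:
    """Return predicted fare without Southwest minus predicted fare with it.

    `predict` is a hypothetical stand-in for the model chosen in question 3):
    any function that maps a route dictionary to a predicted average fare.
    The input route is not modified; only the SW field differs between
    the two copies that are scored.
    """
    without_sw = dict(route, SW="No")
    with_sw = dict(route, SW="Yes")
    return predict(without_sw) - predict(with_sw)
```

Calling `predict(ROUTE)` gives the answer to question 4), and `fare_reduction_with_sw(predict, ROUTE)` gives the answer to question 5). A positive result means the model predicts a lower fare once Southwest serves the route.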