Download the dataset Health Care Cost per Employee.csv. In this dataset you will
find data on small to mid-sized local businesses and their health care costs (in
thousands; the figures actually include a bunch of other benefit costs as well, but just pretend they are
health care costs and ignore why the numbers seem so high).
There are two variables. The first is the number of employees that a company has, which ranges from a single employee up to about 100 employees. The second is the average cost of benefits per employee.
You’ll notice in a scatterplot that benefits for companies with a small number of employees are quite
high (imagine paying single payer health insurance for a few people and their families
in addition to insuring them at work, etc.).
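Before plotting, it can help to read the file and compare mean costs across bands of company size. The following is a minimal sketch, assuming the CSV has one header row followed by two numeric columns in the order described above (employee count, then average cost). The helper names `load_pairs` and `band_means`, the column headers, and the inline sample values are all made up for illustration and do not come from the dataset.

```python
import csv
import io
from statistics import mean

def load_pairs(fileobj):
    """Read (employees, avg_cost) pairs, assuming a header row and two numeric columns."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row (assumed to be present)
    return [(float(row[0]), float(row[1])) for row in reader if row]

def band_means(pairs, width=25):
    """Mean avg_cost within employee-count bands of the given width."""
    bands = {}
    for employees, cost in pairs:
        # Lower edge of the band: 1, 26, 51, 76 when width is 25.
        lower = int((employees - 1) // width) * width + 1
        bands.setdefault(lower, []).append(cost)
    return {lower: mean(costs) for lower, costs in sorted(bands.items())}

# Made-up sample standing in for the real file. With the real data, pass
# open("Health Care Cost per Employee.csv", newline="") instead.
sample = io.StringIO("employees,avg_cost\n1,90\n2,60\n30,20\n40,18\n80,15\n")
pairs = load_pairs(sample)
summary = band_means(pairs)
print(summary)
```

If the real data behave as described, the lowest band should show a noticeably higher mean than the others, which is worth keeping in mind when choosing a model form.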
You are tasked with the following:
1. Develop a model for estimating the average, or expected average, cost of benefits based on the number of employees a company has. If you develop a parametric model, provide the model. If you develop a nonparametric model, graphically represent your model overlaid on a scatterplot of the data. In either case, document how you arrived at your final model.
2. Create a 95% confidence interval for E(avg. cost|55 employees). That is, compute a 95% confidence interval for the average cost of benefits per employee for all companies that have 55 employees.
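3. Create a 95% prediction interval for the average cost at an individual company with 55 employees. (A prediction interval targets a single new observation rather than the mean E(avg. cost|55 employees).)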
4. Add your results from parts 2 and 3 to your scatterplot.
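To make the difference between parts 2 and 3 concrete, here is a rough sketch of both intervals for one candidate parametric model, avg_cost = b0 + b1 * (1/employees). This model form is an assumption chosen only because costs are described as high for very small companies; it is not necessarily the right model for the real data, and you still need to justify your own choice. The data below are synthetic, and the function names `fit_reciprocal` and `intervals_at` are hypothetical. Because the Python standard library has no t quantile, the sketch falls back on a normal quantile, which is only a large-sample approximation; pass the exact t critical value with n - 2 degrees of freedom through `crit` if you have it.

```python
from math import sqrt
from statistics import NormalDist, mean

def fit_reciprocal(xs, ys):
    """Least-squares fit of y = b0 + b1 * (1/x) (an assumed model form)."""
    zs = [1.0 / x for x in xs]
    n = len(zs)
    z_bar, y_bar = mean(zs), mean(ys)
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    s_zy = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys))
    b1 = s_zy / s_zz
    b0 = y_bar - b1 * z_bar
    sse = sum((y - (b0 + b1 * z)) ** 2 for z, y in zip(zs, ys))
    s = sqrt(sse / (n - 2))  # residual standard error
    return {"b0": b0, "b1": b1, "s": s, "n": n, "z_bar": z_bar, "s_zz": s_zz}

def intervals_at(fit, x0, level=0.95, crit=None):
    """Return (fitted value, confidence interval, prediction interval) at x0."""
    if crit is None:
        # Normal quantile as a large-sample stand-in for the t quantile.
        crit = NormalDist().inv_cdf(0.5 + level / 2)
    z0 = 1.0 / x0
    y_hat = fit["b0"] + fit["b1"] * z0
    leverage = 1.0 / fit["n"] + (z0 - fit["z_bar"]) ** 2 / fit["s_zz"]
    se_mean = fit["s"] * sqrt(leverage)        # uncertainty in the mean response
    se_pred = fit["s"] * sqrt(1.0 + leverage)  # adds the noise of one new company
    ci = (y_hat - crit * se_mean, y_hat + crit * se_mean)
    pi = (y_hat - crit * se_pred, y_hat + crit * se_pred)
    return y_hat, ci, pi

# Synthetic data (not the real dataset): a reciprocal trend plus a small
# deterministic wiggle so the residual standard error is nonzero.
xs = list(range(1, 101))
ys = [10 + 80 / x + 0.5 * ((7 * x) % 5 - 2) for x in xs]

fit = fit_reciprocal(xs, ys)
y_hat, ci, pi = intervals_at(fit, 55)
print("fitted value at 55 employees:", y_hat)
print("95% confidence interval:", ci)
print("95% prediction interval:", pi)
```

The prediction interval is always wider than the confidence interval at the same point because its standard error carries the extra "1 +" term for the variability of a single new company. For part 4, both pairs of endpoints can be drawn as vertical segments at 55 employees on the scatterplot.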