README for Assignment 1: Model Selection in R

Data Information

Data File: jobs.txt
Columns:
- y: Job proficiency score (first column)
- t1, t2, t3, t4: Scores on four aptitude tests
Assign these column names (y, t1, t2, t3, t4) after importing the data.
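A minimal sketch of the import step follows. It assumes jobs.txt is whitespace-delimited with no header row, so check the file before relying on that. To keep the example self-contained, it first writes a small hypothetical file of simulated scores to a temporary path. For the real data, replace `path` with "jobs.txt".

```r
# Hypothetical stand-in for jobs.txt: 25 rows of simulated scores, no header row.
set.seed(1)
path <- tempfile(fileext = ".txt")
sim <- matrix(round(runif(25 * 5, 50, 150)), ncol = 5)
write.table(sim, path, row.names = FALSE, col.names = FALSE)

# Read the file without a header, then assign the column names described above.
jobs <- read.table(path, header = FALSE)
colnames(jobs) <- c("y", "t1", "t2", "t3", "t4")
str(jobs)
```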

Assignment Tasks 1

Graphical Summaries: Generate scatterplots of job proficiency against each predictor and interpret what the plots suggest.
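All Possible Regressions: Fit all 16 (2^4) possible models built from the four predictors, including the intercept-only model, and record these model selection criteria for each: p (number of parameters, including the intercept), R2p, R2a,p, PRESSp, AICp, BICp, and Mallows Cp.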
Best Model Selection: Identify the best model based on each criterion and determine which variable(s) can be excluded.
Model Fitting & Diagnostics: Fit the best model according to R2a,p and assess if model assumptions are met.
Best Model with Two Predictors: Using BICp and Mallows Cp, find the best model with two or fewer predictors.
Validation: Compare SSEp and PRESSp for Model 11.
Automated Search Procedures: Perform forward selection, backward elimination, and stepwise regression using the step() function and compare selected models.
Model Search with Conditions: Use the regsubsets() function from the leaps package to find the best models based on R2a,p, Mallows Cp, and BICp.
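The all-possible-regressions table can also be built directly in base R. This is useful for checking the regsubsets() output. The sketch below uses simulated data, not the jobs.txt values, and textbook definitions in which p counts the intercept:

- R2a,p = 1 - ((n - 1)/(n - p)) * SSEp/SSTO
- PRESSp = sum of (e_i / (1 - h_ii))^2
- AICp = n ln SSEp - n ln n + 2p
- BICp = n ln SSEp - n ln n + p ln n
- Cp = SSEp/MSE(full) - (n - 2p)

These AIC and BIC values match extractAIC(). They differ from R's AIC() and BIC() by an amount that is the same for every model, so the rankings agree.

```r
# Simulated stand-in for the jobs data (hypothetical; replace with the real data frame).
set.seed(42)
n <- 25
jobs <- data.frame(t1 = rnorm(n, 100, 15), t2 = rnorm(n, 100, 15),
                   t3 = rnorm(n, 100, 15), t4 = rnorm(n, 100, 15))
jobs$y <- 0.3 * jobs$t1 + 0.5 * jobs$t3 + rnorm(n, 0, 5)

preds <- c("t1", "t2", "t3", "t4")
# All 2^4 = 16 subsets, one row of TRUE/FALSE inclusion flags per model.
subsets <- expand.grid(rep(list(c(FALSE, TRUE)), length(preds)))

# Full-model MSE (needed for Mallows Cp) and total sum of squares.
full <- lm(y ~ t1 + t2 + t3 + t4, data = jobs)
mse_full <- sum(resid(full)^2) / df.residual(full)
ssto <- sum((jobs$y - mean(jobs$y))^2)

rows <- lapply(seq_len(nrow(subsets)), function(i) {
  vars <- preds[unlist(subsets[i, ])]
  rhs <- if (length(vars) == 0) "1" else paste(vars, collapse = " + ")
  fit <- lm(as.formula(paste("y ~", rhs)), data = jobs)
  e <- resid(fit)
  h <- hatvalues(fit)           # leverages, used for the PRESS shortcut
  p <- length(coef(fit))        # parameters, including the intercept
  sse <- sum(e^2)
  data.frame(
    model = rhs,
    p     = p,
    R2    = 1 - sse / ssto,
    R2a   = 1 - (n - 1) / (n - p) * sse / ssto,
    PRESS = sum((e / (1 - h))^2),
    AIC   = n * log(sse) - n * log(n) + 2 * p,
    BIC   = n * log(sse) - n * log(n) + log(n) * p,
    Cp    = sse / mse_full - (n - 2 * p)
  )
})
metrics <- do.call(rbind, rows)
print(metrics[order(metrics$p), ], digits = 4)
```

To read off the best model under each criterion, use `metrics[which.max(metrics$R2a), ]`, and `which.min()` for PRESS, AIC, and BIC. For Cp, look for models with small values close to p. As a built-in check, the full model always has Cp = p.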

Package Requirements

DAAG package: for computing PRESSp
leaps package: provides the regsubsets() function
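PRESSp can also be computed in base R from a single fit, using the leave-one-out identity PRESSp = sum of (e_i / (1 - h_ii))^2, where h_ii are the leverages. The sketch below checks that identity against explicit refitting. It uses R's built-in mtcars data as a stand-in for the jobs model, and it is not DAAG's own function.

```r
# Stand-in model on R's built-in mtcars data (not the assignment data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Shortcut: PRESS from the ordinary residuals and leverages of one fit.
e <- resid(fit)
h <- hatvalues(fit)
press_shortcut <- sum((e / (1 - h))^2)

# Brute force: refit without each observation, then predict the held-out point.
press_loo <- sum(sapply(seq_len(nrow(mtcars)), function(i) {
  fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
  (mtcars$mpg[i] - predict(fit_i, newdata = mtcars[i, ]))^2
}))

sse <- sum(e^2)
c(SSE = sse, PRESS = press_shortcut, PRESS_loo = press_loo)
```

Because every leverage lies strictly between 0 and 1, PRESSp is never smaller than SSEp. For the validation step, a PRESSp reasonably close to SSEp is usually read as a sign that the model predicts new observations about as well as it fits the data used to build it.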

Additional Notes

Ensure all plots and results are properly labeled and interpreted.
Check that your models meet statistical assumptions and provide diagnostics when needed.
Maintain clear and concise code structure and comments for readability.

Assignment Tasks 2: Outlier Detection in Grocery Data Using R

This project aims to detect influential outliers in grocery data using R. The dataset contains weekly activity data from a national grocery retailer, and the analysis focuses on identifying anomalies in the response and predictor variables.

Dataset Overview

File: grocery.xlsx
Variables:
- labor: Total labor hours per week (response variable)
- shipped: Number of cases shipped in a week
- cost: Labor cost as a percentage of total costs
- holiday: Binary indicator (1 if the week includes a holiday, 0 otherwise)

Analysis Steps

1. Residual Analysis

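We fit a first-order linear regression model without interaction terms. We then plot three types of residuals against the fitted values: raw residuals, standardized residuals (rstandard()), and studentized deleted residuals (rstudent()).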

library(readxl)

# Load the weekly grocery data and fit the first-order model
data <- read_excel("grocery.xlsx")
result <- lm(labor ~ shipped + cost + holiday, data = data)

# Visualizing residuals
par(mfrow = c(1, 3))

# Plot of residuals vs fitted values
plot(result$fitted.values, result$residuals,
     main = "Residuals vs Fitted",
     xlab = "Fitted Values", ylab = "Residuals")

# Plot of standardized residuals vs fitted values
plot(result$fitted.values, rstandard(result),
     main = "Standardized Residuals vs Fitted",
     xlab = "Fitted Values", ylab = "Standardized Residuals")

# Plot of studentized residuals vs fitted values
plot(result$fitted.values, rstudent(result),
     main = "Studentized Residuals vs Fitted",
     xlab = "Fitted Values", ylab = "Studentized Residuals")

# Reset graphics layout
par(mfrow = c(1, 1))
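rstandard() and rstudent() are easy to confuse, so it can help to check the textbook formulas behind them. The sketch below uses the built-in mtcars data, with the 0/1 variable am standing in for holiday, because grocery.xlsx is not bundled here. Replace `fit` with `result` to apply it to the grocery model.

```r
# Stand-in model on built-in mtcars data (replace with the grocery model `result`).
fit <- lm(mpg ~ wt + hp + am, data = mtcars)   # am is a 0/1 indicator, like holiday

e   <- resid(fit)
h   <- hatvalues(fit)
n   <- nobs(fit)
p   <- length(coef(fit))            # parameters, including the intercept
sse <- sum(e^2)
mse <- sse / (n - p)

# Standardized (internally studentized) residuals: e_i / sqrt(MSE * (1 - h_ii))
r_std <- e / sqrt(mse * (1 - h))

# Studentized deleted (externally studentized) residuals:
# t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_ii) - e_i^2))
t_del <- e * sqrt((n - p - 1) / (sse * (1 - h) - e^2))

all.equal(r_std, rstandard(fit))
all.equal(t_del, rstudent(fit))
```

The n - p - 1 in the deleted-residual formula, with p counting the intercept, is the same degrees of freedom used for the Bonferroni critical value below.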

2. Detecting Outliers in the Response Variable

We use studentized residuals and the Bonferroni correction to identify potential outliers:

# Sample size and number of parameters (p = predictors + intercept),
# so that the studentized deleted residuals have n - p - 1 degrees of freedom
n <- nobs(result)
p <- length(coef(result))

# Compute studentized residuals
student.res <- rstudent(result)

# Calculate critical value using Bonferroni correction
alpha <- 0.05
crit <- qt(1 - alpha / (2 * n), df = n - p - 1)

# Plot studentized residuals with critical thresholds
plot(student.res,
     main = "Studentized Residuals with Critical Values",
     ylab = "Studentized Residuals")
abline(h = c(crit, -crit), col = "red", lty = 2)

# Identify outliers
outliers <- which(abs(student.res) > crit)
print("Indices of potential outliers:")
print(outliers)
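The overview also mentions influential observations and anomalies in the predictors. Studentized residuals alone do not capture those. A common follow-up is to flag high-leverage points with the 2p/n rule of thumb (twice the average leverage p/n) and influential points by comparing Cook's distance with the median of the F(p, n - p) distribution. Both cut-offs are conventional heuristics, not part of the original analysis. The sketch again uses mtcars as a stand-in for the grocery data.

```r
# Stand-in model on built-in mtcars data (replace `fit` with the grocery model `result`).
fit <- lm(mpg ~ wt + hp + am, data = mtcars)

n <- nobs(fit)
p <- length(coef(fit))                 # parameters, including the intercept

# Outlying X values: leverage above twice the average leverage (2p/n).
h <- hatvalues(fit)
high_leverage <- which(h > 2 * p / n)

# Influence: Cook's distance above the median of F(p, n - p).
d <- cooks.distance(fit)
influential <- which(d > qf(0.5, p, n - p))

print(high_leverage)
print(influential)
```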

How to Run the Analysis

Install the readxl package if it is not already available: install.packages("readxl").
Ensure that the grocery.xlsx file is in your working directory.
Open the R environment (e.g., RStudio, VSCode) and load the provided scripts.
Execute the code blocks in the order presented.

Results and Observations

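Visual analysis: The residual plots are used to check for systematic patterns (such as curvature or non-constant variance) and for isolated points that lie far from the rest.
Outlier detection: Observations whose studentized deleted residuals exceed the Bonferroni-corrected critical value in absolute value are flagged as potential outliers.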

Repository: Beckversync/BANA4040