Things you need to do before running the code
- Install the dplyr package.
- Ensure run_analysis.R is in your current working directory, or else set your working directory to the folder where run_analysis.R resides.
- Make sure you are connected to the internet.
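
The first two prerequisites can be checked from an R session before running the script. The snippet below is a minimal sketch: `check_prerequisites` is a hypothetical helper that is not part of run_analysis.R, and the path in the `setwd` comment is a placeholder.

```r
# Hypothetical helper (not part of run_analysis.R): reports which
# prerequisites are already satisfied in the current session.
check_prerequisites <- function(script = "run_analysis.R") {
  c(
    dplyr_installed = requireNamespace("dplyr", quietly = TRUE),
    script_found    = file.exists(script)  # looked up in getwd()
  )
}

status <- check_prerequisites()
print(status)

# If dplyr is missing, install it once with:
#   install.packages("dplyr")
# If the script is not found, switch to its folder first, e.g.:
#   setwd("path/to/folder")   # placeholder path
```

Both checks only report the current state; nothing is installed or changed until you run the commented commands yourself.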

How my run_analysis.R works
- Load the dplyr package.
- Check whether the data folder exists. If it does not, download the archive from https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip and unzip it.
- Load the training data (subject_train, X_train, y_train) and combine them into one data frame.
- Load the test data (subject_test, X_test, y_test) and combine them into one data frame.
- Merge the training and test sets into one complete data set.
- Load the feature descriptions and apply them, along with labels for the subject and activity columns, to the complete data set.
- Extract all variables whose labels contain mean(), std(), "Training.Label.ID" or "Subject.ID".
- Clean up variables that do not belong to this experiment (repeated words in a name are assumed to be errors).
- Load the list of descriptive activity names and use it to replace the activity numbers in the extracted data set.
- Search for abbreviations and substitute the full words to make the variable names meaningful.
- Group the data by person and activity using factor levels.
- Average all variables within the groups generated in the previous step.
- Write the group-based averages to a file.
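
The steps listed here can be sketched on a tiny made-up data set, so the example runs offline. It is an illustration under stated assumptions, not the actual contents of run_analysis.R: the helper `fetch_data`, the zip file name, the folder name "UCI HAR Dataset", the `Activity` column, the two abbreviation substitutions, and the output file name are all hypothetical, and the toy values are invented. The clean-up of variables with repeated words is left out because the description does not say whether they are dropped or renamed. `across()` requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Download and unzip only when the data folder is missing.
# Defined but not called here, so the sketch needs no internet access.
fetch_data <- function(data_dir = "UCI HAR Dataset",   # assumed folder name
                       zip_file = "dataset.zip") {     # hypothetical file name
  url <- "https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip"
  if (!dir.exists(data_dir)) {
    download.file(url, destfile = zip_file, mode = "wb")
    unzip(zip_file)
  }
  invisible(data_dir)
}

# Toy stand-ins for features.txt and activity_labels.txt (made-up subset).
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-meanFreq()-X")
activity_labels <- data.frame(id = 1:2, name = c("WALKING", "SITTING"))

# Toy stand-ins for subject_*, y_* and X_*, combined column-wise per set.
X_train <- matrix(c(1, 3, 5, 2, 4, 6, 0, 0, 0), nrow = 3)
X_test  <- matrix(c(10, 20, 1, 3, 0, 0), nrow = 2)
train <- data.frame(subject = c(1, 1, 2), y = c(1, 1, 2), X_train)
test  <- data.frame(subject = c(3, 3),    y = c(2, 2),    X_test)

# Merge training and test sets, then label the columns.
full <- rbind(train, test)
names(full) <- c("Subject.ID", "Training.Label.ID", features)

# Keep only the ID columns and the mean()/std() measurements.
keep <- grepl("mean\\(\\)|std\\(\\)|^Subject\\.ID$|^Training\\.Label\\.ID$",
              names(full))
extracted <- full[, keep]

# Replace activity numbers with descriptive names.
extracted$Activity <- factor(extracted$Training.Label.ID,
                             levels = activity_labels$id,
                             labels = activity_labels$name)
extracted <- select(extracted, -Training.Label.ID)

# Expand abbreviations (these two substitutions are assumptions).
names(extracted) <- gsub("^t", "Time", names(extracted))
names(extracted) <- gsub("Acc", "Accelerometer", names(extracted))

# Group by person and activity, then average every remaining variable.
tidy <- extracted %>%
  group_by(Subject.ID, Activity) %>%
  summarise(across(everything(), mean), .groups = "drop")

# Write the group-based averages to a file (temporary location here).
out_file <- file.path(tempdir(), "tidy_data.txt")
write.table(tidy, out_file, row.names = FALSE)
print(tidy)
```

With the toy input, `tidy` has one row per subject and activity pair (three rows) and keeps only the two mean()/std() columns alongside `Subject.ID` and `Activity`; the `meanFreq()` column is dropped because its label does not contain the literal text `mean()`.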