This repository investigates occupational trends in New York City from 1850 to 1910. The `scripts` folder contains several R scripts that explore the data set, including analyses of missing data and of general trends. The table below describes each script.

| Script | Description |
|---|---|
| 1_Initial_EDA | The researcher's initial exploratory data analysis of the overall data set |
| 2_Missing_label_analysis | Investigates missing occupation (occ) codes in the data across the three periods |
| 3_Occupation_counts_by_year | Multiple visualisations of the occupations in each time period |
| 4_Occupation_trends_of_top_occupations | Multiple visualisations showing how the top occupations of each time period changed |
| 5_Unifying_occ_across_time_periods | Analysis of various occupation variables to assess their suitability for trend analysis; applies the `occ_modifier` function to clean up `occstr` |