This project presents the work I did for Ratio Clothing, a local menswear company, to improve their existing model for predicting shirt sizes. It was my capstone project for Galvanize's immersive data science program. Ratio’s business model focuses on crafting a perfect-fitting custom dress shirt without the need for a tailor. To do this, they combine the customer’s survey answers with body measurement correlations. Using machine learning techniques, I reduced the error of Ratio’s estimates so that the model makes more accurate predictions.
The model Ratio supplied me with used coefficients from a linear regression and a subset of the customer's survey responses to predict shirt measurements. I made three major changes that improved the model's predictive power. First, I used only surveys from customers who had placed a repeat order, to ensure that the data the model was built on was accurate. Second, within this subset of orders, I included all of the information the customer supplied rather than a subset of it. Third, I used gradient boosting and random forest models in place of the linear regression to better predict shirt measurements. While reworking Ratio's model, I concentrated on the neck, sleeve, chest, and waist measurements.
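The comparison described above can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data, not the code or data used in the project: the feature names (`height`, `weight`, `fit_pref`), the target formula, and the hyperparameters are all assumptions made up for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for survey responses. These columns are hypothetical
# and are not Ratio's actual survey fields.
rng = np.random.default_rng(0)
n = 500
height = rng.normal(70, 3, n)     # inches
weight = rng.normal(180, 25, n)   # pounds
fit_pref = rng.integers(0, 3, n)  # 0 = slim, 1 = standard, 2 = relaxed
X = np.column_stack([height, weight, fit_pref])

# Made-up target loosely resembling a neck measurement in inches.
y = (
    8
    + 0.04 * weight
    + 0.3 * np.sqrt(height)
    + 0.2 * (fit_pref == 2)
    + rng.normal(0, 0.2, n)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Linear regression is the baseline; the two ensembles are the alternatives.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model and score it on the held-out rows.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }
    print(f"{name}: MSE={results[name]['mse']:.4f}, R^2={results[name]['r2']:.4f}")
```

In practice one model of each type would be fit per measurement (neck, sleeve, chest, waist), and the best performer kept for each. Which model wins on this synthetic data says nothing about the real survey data.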
After incorporating all of the provided customer responses and applying machine learning techniques, I saw drastic improvements in the mean squared error (MSE) and the variance explained by my models. For example, for the neck random forest model, the MSE decreased from 0.4775 to 0.0309, a reduction of about 94%, and the variance explained increased from 0.6798 to 0.9785. I saw similar improvements across all four measurements, with either random forest or gradient boosting as the best-performing model.
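For reference, the two metrics can be computed as in the sketch below. It assumes "variance explained" means the coefficient of determination (R²), which is an interpretation on my part for this example; the sample measurements are invented and are not project data.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def variance_explained(actual, predicted):
    """R^2: one minus the ratio of residual to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Invented neck measurements in inches, for illustration only.
actual = [15.0, 15.5, 16.0, 16.5]
predicted = [15.1, 15.4, 16.2, 16.4]

mse = mean_squared_error(actual, predicted)
r2 = variance_explained(actual, predicted)
print(f"MSE={mse:.4f}, variance explained={r2:.4f}")
```

A lower MSE means predictions sit closer to the true measurements, while a variance explained near 1 means the model accounts for almost all of the variation between customers.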
With this more powerful machine learning model, Ratio will be able to increase its profit margin by decreasing the number of shirts that need to be remade.