Welcome to the KSL Car Scraper! This program is designed to gather data from KSL on used cars for sale to find those with the best price/mileage ratio.
KSL Car Scraper begins by making a request to KSL's backend for listings of a specified car make and model. Once the cars are gathered, the JSON is processed and the cars are fed through a simple linear regression model, which fits a line of best fit for mileage against price. The cars are then ranked by how their actual mileage compares with the mileage predicted for their price. Cars that rank higher than average are logged in a file called 'good_deals.txt' and those that rank below average are logged in 'bad_deals.txt'.
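As a rough illustration of the regression step, here is a minimal sketch of an ordinary least squares fit with price as the input and mileage as the output. The function name fit_line and the sample listings are made up for this example; the model in main.py may be built differently (for instance with a library).

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of x/y co-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up listings for illustration: (price in dollars, mileage in miles)
cars = [(4000, 160000), (8000, 120000), (12000, 80000)]
prices = [price for price, _ in cars]
mileages = [mileage for _, mileage in cars]

slope, intercept = fit_line(prices, mileages)

# Predicted mileage for each car, given only its price
predicted = [slope * price + intercept for price in prices]
```

With real listings the points will not sit exactly on the line, and the gap between a car's actual and predicted mileage is what the ranking uses.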
python3 main.py {car_make} {car_model}
KSL Car Scraper's fetch request works by sending a GET request for each 'page' of cars that the backend would normally supply to the website. Each 'page' contains 24 cars. The program defaults to querying 6 'pages' of cars, for a maximum of 144 cars parsed. To override this, pass an optional third argument with the number of pages to parse.
Running the command python3 main.py Toyota Corolla 10 will query KSL for up to 240 Toyota Corollas (10 pages of 24 cars each).
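The page arithmetic can be sketched as follows. This is only an illustration of how the optional third argument might be handled; parse_args, CARS_PER_PAGE and DEFAULT_PAGES are hypothetical names, not necessarily those used in main.py.

```python
CARS_PER_PAGE = 24  # each 'page' holds 24 cars, per the description above
DEFAULT_PAGES = 6   # default number of pages, per the description above


def parse_args(argv):
    """Return (make, model, pages) from a list shaped like sys.argv[1:]."""
    if len(argv) < 2:
        raise SystemExit("usage: python3 main.py {car_make} {car_model} [pages]")
    make, model = argv[0], argv[1]
    # The third argument is optional; fall back to the default page count
    pages = int(argv[2]) if len(argv) > 2 else DEFAULT_PAGES
    return make, model, pages


make, model, pages = parse_args(["Toyota", "Corolla", "10"])
max_cars = pages * CARS_PER_PAGE  # upper bound on the number of cars parsed
```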
Each car is given a predicted mileage (predictive_mileage) for its price, based on the line of best fit created by the model.
The score is calculated as follows: score = 1 - (predictive_mileage / mileage)
This means that cars with less mileage than expected receive a negative score, which is a good thing.
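This formula also means that 1 is the worst score, approached as mileage grows far beyond the prediction, and the best score could technically stretch to negative infinity.
Usually anything smaller than -1 is considered very low mileage or price; a score of exactly -1 means the car has half its predicted mileage.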
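A few worked values make the scale easier to read. The sketch below applies the formula above to made-up mileages; the function name score is chosen for this example only.

```python
def score(mileage, predictive_mileage):
    """Score from the formula above: lower (more negative) is better."""
    return 1 - (predictive_mileage / mileage)


# Made-up numbers: every car here has a predicted mileage of 120000
great = score(40000, 120000)     # a third of the predicted mileage
half = score(60000, 120000)      # half the predicted mileage
on_line = score(120000, 120000)  # exactly as predicted
worse = score(240000, 120000)    # twice the predicted mileage
```

Here great is -2.0, half is -1.0, on_line is 0.0 and worse is 0.5. Note that a listing with a mileage of 0 would cause a division by zero in this sketch, so such listings would need to be handled separately.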
Each file output in this folder contains the following information for each car:
- Score
- Price
- Mileage
- Location (Utah)
- Year
- Link (to KSL)
There is a constant at the top of send_request.py that sets the maximum car price to 15000; this can be altered if desired. The algorithm also filters out cars that aren't in Utah (KSL is based in Utah) and those that have mileage over 200000. To override either of these filters, edit the filter_cars method in main.py.
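For orientation, the two filters described above might look roughly like the sketch below. The dictionary keys ("state", "mileage"), the "UT" encoding and the constant names are assumptions made for this example; check filter_cars in main.py for the actual field names and values.

```python
MAX_MILEAGE = 200000    # mileage limit described above
REQUIRED_STATE = "UT"   # assumption: how Utah is encoded is not stated


def filter_cars_sketch(cars):
    """Keep only cars located in Utah with mileage at or under the limit."""
    return [
        car for car in cars
        if car["state"] == REQUIRED_STATE and car["mileage"] <= MAX_MILEAGE
    ]


# Made-up listings for illustration
listings = [
    {"state": "UT", "mileage": 90000},
    {"state": "ID", "mileage": 90000},   # dropped: not in Utah
    {"state": "UT", "mileage": 250000},  # dropped: mileage over the limit
]
kept = filter_cars_sketch(listings)
```

Raising MAX_MILEAGE or removing the state check in a function like this is the kind of change the override would involve.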