Description
Overview of the Jordanian automotive market: Jordan is a left-hand-drive country with approximately 120 cars per thousand people. It has no domestic car brands or production lines, so car consumption relies on imports, and demand for used cars is strong.
Data and Problem Analysis (Basic Chapter)
Exploratory Data Analysis (EDA)
Price distribution: Is it right-skewed? Is a logarithmic transformation required?
Feature correlation: the relationship of mileage and vehicle age with price (visualized with scatter plots and a correlation heat map).
Analysis of categorical features: the impact of brand and fuel type on price (grouped box plots and per-group statistics).
Reproducibility gap: the original project did not fully present the data distribution, and your analysis can fill this gap.
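A minimal sketch for the price-distribution question follows. It assumes prices sit in a pandas Series and compares the skewness of raw prices with that of log1p-transformed prices. The log-normal sample and the |skew| > 1 rule of thumb are assumptions for illustration only. They are not findings about the Jordanian data.

```python
import numpy as np
import pandas as pd


def price_skew_report(prices: pd.Series) -> dict:
    """Compare skewness of raw prices with log1p-transformed prices."""
    prices = prices.dropna()
    raw_skew = float(prices.skew())
    log_skew = float(np.log1p(prices).skew())
    return {
        "raw_skew": raw_skew,
        "log_skew": log_skew,
        # Rule of thumb (an assumption): |skew| > 1 suggests a log transform
        "log_recommended": abs(raw_skew) > 1.0,
    }


# Hypothetical synthetic prices drawn from a log-normal, NOT the project's data
rng = np.random.default_rng(0)
prices = pd.Series(rng.lognormal(mean=9.5, sigma=0.6, size=2_000), name="price")
report = price_skew_report(prices)
print(report)
```

On real listings, the same function can be applied to the price column before deciding whether to model log price.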
Problem Definition
Emphasize the pricing uncertainty in the used car market (information asymmetry, fluctuations in supply and demand).
The limitations of traditional valuation methods (such as the residual value method) and how machine learning can improve on them.
- Method Improvement and Experimental Comparison (Core Chapter)
(1) Feature engineering optimization
• Comparison of categorical feature encodings:
o The effect of LabelEncoder vs. OneHotEncoder vs. TargetEncoder on model performance.
• Feature Construction:
Interaction features (such as brand × vehicle age) and nonlinear transformations (such as the square root of mileage).
Text feature mining (keyword extraction from vehicle model descriptions).
(2) Model selection and optimization
• Horizontal comparison:
o Linear regression (baseline) vs random forest vs XGBoost vs LightGBM.
Feasibility analysis of introducing deep learning models (such as TabNet).
• Hyperparameter optimization:
Use Bayesian optimization (for example, the BayesianOptimization library) to tune hyperparameters, and compare the results with the original project's default parameters.
(3) Outlier handling and robustness
Propose an improved outlier detection method (such as a clustering-based local outlier factor).
Compare the effects of outlier removal and winsorization on RMSE.
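One possible reading of "local outlier factors based on clustering" is sketched below, under explicit assumptions. Listings are first grouped into vehicle segments with KMeans on their features. LocalOutlierFactor then runs inside each segment on the features plus price, so a price is judged against similar cars. The sketch then compares hold-out RMSE after outlier removal and after winsorization. The synthetic data, the injected label errors, the segment count and the 1%/99% winsorization bounds are all hypothetical. Only the training labels are corrupted, so that the test set measures how well each strategy recovers the clean relationship.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def cluster_lof_outliers(df, cluster_cols, lof_cols, n_clusters=3, n_neighbors=20, random_state=0):
    """Flag outliers (True) by running LOF separately inside each KMeans segment."""
    segments = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state).fit_predict(
        StandardScaler().fit_transform(df[cluster_cols]))
    X_lof = StandardScaler().fit_transform(df[lof_cols])
    mask = np.zeros(len(df), dtype=bool)
    for seg in np.unique(segments):
        idx = np.flatnonzero(segments == seg)
        if len(idx) <= n_neighbors:  # too few points for a stable LOF; keep them
            continue
        lof = LocalOutlierFactor(n_neighbors=n_neighbors)  # default contamination="auto"
        mask[idx] = lof.fit_predict(X_lof[idx]) == -1
    return mask


def winsorize(s: pd.Series, lower=0.01, upper=0.99) -> pd.Series:
    """Cap values at the given quantiles instead of dropping rows."""
    lo, hi = s.quantile([lower, upper])
    return s.clip(lower=lo, upper=hi)


# Hypothetical synthetic data, NOT the project's data
rng = np.random.default_rng(1)
n = 1_000
df = pd.DataFrame({"age_years": rng.integers(0, 20, n),
                   "mileage_km": rng.uniform(5_000, 300_000, n)})
df["log_price"] = 9.5 - 0.05 * df["age_years"] - 1.5e-6 * df["mileage_km"] + rng.normal(0, 0.1, n)
train, holdout = train_test_split(df, test_size=0.3, random_state=0)

# Corrupt 2% of TRAINING labels only (e.g. a price typed with two extra zeros)
train = train.copy()
bad = rng.choice(len(train), size=len(train) // 50, replace=False)
train.iloc[bad, train.columns.get_loc("log_price")] += np.log(100)

FEATURES = ["age_years", "mileage_km"]


def holdout_rmse(train_df: pd.DataFrame) -> float:
    """Fit on train_df and report RMSE on the untouched hold-out set."""
    model = LinearRegression().fit(train_df[FEATURES], train_df["log_price"])
    return float(np.sqrt(mean_squared_error(holdout["log_price"], model.predict(holdout[FEATURES]))))


mask = cluster_lof_outliers(train, FEATURES, FEATURES + ["log_price"])
results = {
    "no_handling": holdout_rmse(train),
    "lof_removal": holdout_rmse(train[~mask]),
    "winsorized": holdout_rmse(train.assign(log_price=winsorize(train["log_price"]))),
}
print(f"flagged {mask.sum()} of {len(train)} training rows")
print(results)
```

The quantile bounds matter for winsorization. If erroneous prices make up more than 1% of the data, a 99th-percentile cap leaves some of them in place. This is one reason to report both strategies rather than assume one dominates.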
(4) Design of evaluation metrics
Given the right-skewed price distribution, propose RMSLE (root mean squared logarithmic error) or quantile loss as supplementary metrics.
- Innovation Points and Theoretical Contributions (Highlight Chapter)
• Propose a hybrid model:
For example, a stacking ensemble of XGBoost and a neural network that combines the feature-importance insight of tree models with the nonlinear modeling capacity of neural networks.
• Interpretability analysis:
Why is the predicted price of a particular vehicle high? Explain individual predictions with SHAP values: is the prediction dominated by brand or by mileage?
• Domain knowledge integration:
Design features in combination with the rules of the used car industry (such as "high-mileage diesel vehicles depreciate faster").
- Application and Business Value (Implementation Chapter)
• System Implementation:
Design a simple valuation Web application (Flask/Django + front-end demonstration).
• Business Model Analysis:
o How the model helps car dealers or consumers avoid pricing risks (case simulation).
• Discussion on Limitations:
Data limitations (such as missing accident records and regional differences) and limits on model generalization.
- Suggestions for the structure of the thesis
- Introduction: Challenges in the Used Car Market, The Value of Machine Learning in Pricing.
- Related work: Existing valuation methods (traditional statistics vs. machine learning).
- Data and Methods: EDA, Feature Engineering, Model Design.
- Experiment: Comparative experiment design and result analysis (table + visualization).
- Application and Discussion: System Implementation, Commercial Potential, Limitations.
- Conclusion and Outlook: Summarize contributions and future directions (such as dynamic price prediction).