This project builds a machine learning model to detect whether a given URL is malicious or legitimate.
It is fully implemented in a Google Colab notebook, making it easy to run without any local setup.
- Colab-Ready: Run the notebook directly in Google Colab.
- Dataset Preprocessing: Cleaning, tokenizing, and extracting lexical & statistical features from URLs.
- Feature Engineering: Attributes include:
  - URL length
  - Presence of suspicious keywords
  - Domain structure
  - Character patterns
- Model Training: Experiments with multiple algorithms:
  - Logistic Regression
  - Decision Tree
  - Random Forest
- Evaluation Metrics:
  - Accuracy
  - Precision
  - Recall
  - F1-Score
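As a rough illustration of the feature engineering listed above, the sketch below derives a few lexical features from one URL using only the Python standard library. The function name `extract_features`, the keyword list, and the exact feature names are assumptions made for this example; the notebook's own feature set may differ.

```python
from urllib.parse import urlparse

# Hypothetical keyword list for illustration; the notebook's actual list may differ.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "update", "account")


def extract_features(url: str) -> dict:
    """Return a few lexical features for a single URL (illustrative only)."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        # URL length
        "url_length": len(url),
        # Presence of suspicious keywords (1 if any keyword appears, else 0)
        "has_suspicious_keyword": int(any(k in lowered for k in SUSPICIOUS_KEYWORDS)),
        # Domain structure: more dots in the host usually means more subdomains
        "num_dots_in_host": host.count("."),
        # Character patterns
        "num_digits": sum(c.isdigit() for c in url),
        "num_special_chars": sum(not c.isalnum() for c in url),
        "has_at_symbol": int("@" in url),
    }


if __name__ == "__main__":
    print(extract_features("http://secure-login.example.com/verify?id=123"))
```

Applying a function like this to every row of the dataset would give a numeric feature table that the classifiers can consume.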
URL_website_detection/
│
├── URL.ipynb # Main Google Colab notebook
├── dataset.csv # Input dataset (if included)
└── README.md # Project documentation
- Open the notebook in Google Colab using the badge above.
- Upload or connect the dataset.
- Run each cell sequentially to:
  - Preprocess the data
  - Extract features
  - Train models
  - Evaluate results
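The training and evaluation steps above might look roughly like the following scikit-learn sketch. It is not the notebook's code: the feature matrix is synthetic (generated with `make_classification` as a stand-in for real URL features), the label convention (1 = malicious), the 80/20 split, and all hyperparameters are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the extracted URL features (assumption: label 1 = malicious).
X, y = make_classification(n_samples=500, n_features=6, random_state=42)

# Hold out 20% of the rows for evaluation, keeping the class balance in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The three algorithms named in this README, with illustrative settings.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Comparing the models on the same held-out split keeps the four metrics directly comparable. With real data, the synthetic `X` and `y` would be replaced by the features and labels derived from the dataset.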
Author: Aviral Saini
Project Type: Machine Learning / URL Classification