Feature/add cleaner #3

devZenta · 2025-12-08T14:32:34Z

This pull request introduces a new data cleaning script and refactors the IMDB scraper to improve code consistency and readability. The main changes include the addition of a cleaner.py script for preprocessing movie data, standardizing string formatting and selector usage across the scraper, and minor improvements to path handling in the main entry point.

New functionality:

Added src/cleaner/cleaner.py to clean and preprocess raw movie metadata, selecting relevant columns and removing rows with missing values. The output is saved to imdb_cleaned.csv and a preview is printed.
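
The cleaning step described above could look roughly like the sketch below. It is not the code from this PR: the column names are hypothetical, and it uses only the standard library `csv` module, whereas the actual `cleaner.py` may rely on a different library and a different column set.

```python
import csv
import io

# Hypothetical column names; the real cleaner.py may select different ones.
RELEVANT_COLUMNS = ["title", "rating", "duration", "genres", "director", "actors"]


def clean_rows(rows, columns=RELEVANT_COLUMNS):
    """Keep only `columns` and drop any row with a missing or blank value."""
    cleaned = []
    for row in rows:
        selected = {col: (row.get(col) or "").strip() for col in columns}
        if all(selected.values()):
            cleaned.append(selected)
    return cleaned


def clean_csv(src, dst, columns=RELEVANT_COLUMNS):
    """Read raw rows from file object `src` and write cleaned rows to `dst`."""
    cleaned = clean_rows(csv.DictReader(src), columns)
    writer = csv.DictWriter(dst, fieldnames=columns)
    writer.writeheader()
    writer.writerows(cleaned)
    return cleaned


if __name__ == "__main__":
    # Made-up sample data standing in for the raw scraper output.
    raw = io.StringIO(
        "title,rating,duration,genres,director,actors,url\n"
        "Movie A,8.1,120,Drama,Jane Doe,Actor One,http://example.com/a\n"
        "Movie B,,95,Comedy,John Roe,Actor Two,http://example.com/b\n"
    )
    out = io.StringIO()
    for row in clean_csv(raw, out)[:5]:  # preview of the first cleaned rows
        print(row)
```

In real use, `src` and `dst` would be opened files (the output being `imdb_cleaned.csv`) instead of in-memory buffers.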

Scraper refactoring and consistency improvements:

Standardized all string quotes to double quotes and reformatted selectors and dictionary keys throughout src/scrapping/scraper.py for consistency and readability. [1] [2] [3] [4] [5] [6] [7]
Improved selector formatting and handling for movie details extraction (title, rating, duration, genres, director, actors) to make the code easier to maintain and less error-prone. [1] [2] [3] [4]
Updated the CSV export logic in the scraper to use consistent fieldnames and quoting.
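
As an illustration of what "consistent fieldnames and quoting" can mean for a CSV export, here is a minimal sketch with `csv.DictWriter`. The field names are taken from the details listed above (title, rating, duration, genres, director, actors), but the function name `export_movies` and the choice of `csv.QUOTE_ALL` are assumptions, not the scraper's actual code.

```python
import csv
import io

# Field names follow the movie details named in the description;
# the actual names in scraper.py may differ.
FIELDNAMES = ["title", "rating", "duration", "genres", "director", "actors"]


def export_movies(movies, fileobj):
    """Write movie dicts as CSV with a fixed column order and every field quoted."""
    writer = csv.DictWriter(
        fileobj,
        fieldnames=FIELDNAMES,
        quoting=csv.QUOTE_ALL,   # quote every field so commas inside values are safe
        extrasaction="ignore",   # skip keys that are not listed in FIELDNAMES
    )
    writer.writeheader()
    for movie in movies:
        writer.writerow(movie)   # missing keys are written as empty strings


if __name__ == "__main__":
    buf = io.StringIO()
    export_movies([{"title": "Movie A", "rating": "8.1", "genres": "Drama, Crime"}], buf)
    print(buf.getvalue())
```

Declaring the field list once and passing it to the writer keeps the header and every row in the same column order, even when a scraped record is missing some keys.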

Minor improvements:

Cleaned up path handling in src/main.py by standardizing quotes and import order.

Degalax added 3 commits December 8, 2025 15:29

style: standardize string quotes in main.py

19a51e6

refactor: improve code readability and consistency in scraper.py

bdd43a9

feat: add cleaner and csv dataset

088c7cf

devZenta requested review from Degalax, GARATONCODE, Lockxii, eather55 and francoisdotdev December 8, 2025 14:32

devZenta assigned Degalax Dec 8, 2025

devZenta added the documentation and enhancement labels Dec 8, 2025

devZenta merged commit 8a22420 into main Dec 8, 2025
4 checks passed

3 participants
