
@devZenta (Owner) commented Dec 8, 2025

This pull request introduces a new asynchronous IMDB movie scraper to the codebase. It adds a Playwright-based scraping module that fetches movie details, a CLI entry point for running the scraper, and supporting documentation and data files. The main goal is to automate collecting movie metadata from IMDB and storing it in a structured CSV file.

New IMDB Scraper Implementation

  • Added a comprehensive asynchronous scraper class, `IMDBScraper`, in `src/scrapping/scraper.py`. It uses Playwright to retrieve movie details from IMDB (title, top 3 actors, director, genres, duration, rating, and URL) and includes error handling, logging, and flexible selectors that adapt to different IMDB page layouts.
  • Implemented a method that saves the scraped movie data to a CSV file, so data exports are reproducible.
  • Added a module docstring to `src/scrapping/__init__.py` for documentation.
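
The scraper code itself is not reproduced in this description. As a minimal sketch, assuming "flexible selectors" means trying several CSS selectors in order until one yields text, the record shape, the selector fallback, and the CSV export could look like the following. The `Movie` dataclass, its field names, `first_text`, and `save_to_csv` are hypothetical and not taken from `src/scrapping/scraper.py`; only `query_selector` and `inner_text` are real Playwright async API calls.

```python
import csv
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Optional, Protocol, Sequence


@dataclass
class Movie:
    """One scraped row (hypothetical field names, mirroring the details listed above)."""
    title: str
    actors: list[str] = field(default_factory=list)  # top 3 billed actors
    director: str = ""
    genres: list[str] = field(default_factory=list)
    duration: str = ""
    rating: str = ""
    url: str = ""


class PageLike(Protocol):
    # Playwright's async Page exposes query_selector(selector) -> ElementHandle | None.
    async def query_selector(self, selector: str) -> Any: ...


async def first_text(page: PageLike, selectors: Sequence[str]) -> Optional[str]:
    """Hypothetical helper: return the first non-empty text among the selectors, tried in order."""
    for selector in selectors:
        element = await page.query_selector(selector)
        if element is None:
            continue  # selector absent in this page layout; try the next one
        text = (await element.inner_text()).strip()
        if text:
            return text
    return None  # no selector matched; the caller decides how to log or default


def save_to_csv(movies: Sequence[Movie], path: str) -> None:
    """Hypothetical helper: write one CSV row per movie, joining list fields with ', '."""
    columns = [f.name for f in fields(Movie)]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        for movie in movies:
            row = asdict(movie)
            row["actors"] = ", ".join(movie.actors[:3])  # keep only the top 3
            row["genres"] = ", ".join(movie.genres)
            writer.writerow(row)
```

With a live Playwright page, `await first_text(page, ["h1", "title"])` would return the text of the first selector that matches, or `None` if none do. Joining `actors` and `genres` into single strings keeps each movie on one CSV row; `csv.DictWriter` quotes the embedded commas automatically.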

Command-Line Interface and Entry Point

  • Added a new CLI entry script, `src/main.py`, that lets users specify the number of movies to scrape and manages the scraping workflow, including output file handling and summary reporting.
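
The description does not list the script's flags. A minimal sketch of such an entry point, assuming `argparse`, a `-n/--count` option for the number of movies, and a `-o/--output` path, might look like this. The flag names, the default path, and the `scrape(count, output)` callback (standing in for running `IMDBScraper` and saving its results) are all hypothetical.

```python
import argparse
import os
from typing import Callable, Optional, Sequence


def positive_int(value: str) -> int:
    """argparse type that rejects zero and negative movie counts."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value!r}")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Scrape IMDB movie details into a CSV file (sketch).")
    # Hypothetical flags and defaults; not taken from src/main.py.
    parser.add_argument("-n", "--count", type=positive_int, default=10, help="number of movies to scrape")
    parser.add_argument("-o", "--output", default="data/internal/imdb_movies.csv", help="CSV file to write")
    return parser


def ensure_output_dir(path: str) -> None:
    """Create the output file's parent directory if it does not exist yet."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)


def summary(requested: int, scraped: int, output: str) -> str:
    """One-line report printed at the end of a run."""
    return f"Scraped {scraped}/{requested} movies ({requested - scraped} failed) -> {output}"


def main(argv: Optional[Sequence[str]], scrape: Callable[[int, str], int]) -> int:
    """Parse arguments, run the injected scrape(count, output) callback, and print a summary."""
    args = build_parser().parse_args(argv)
    ensure_output_dir(args.output)
    scraped = scrape(args.count, args.output)  # returns how many movies were saved
    print(summary(args.count, scraped, args.output))
    return 0 if scraped == args.count else 1  # non-zero exit signals a shortfall
```

Passing the scraper in as a callback keeps argument parsing and summary reporting testable without launching a browser; a real script would wrap the async scraper with `asyncio.run` inside that callback.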

Sample Data Addition

  • Added a sample CSV file, `data/internal/imdb_movies.csv`, containing example movie records in the expected output format, likely for testing or demonstration purposes.

@devZenta added the `documentation` and `enhancement` labels Dec 8, 2025
@devZenta merged commit 1b256d7 into main Dec 8, 2025
4 checks passed

3 participants