feat: implement IMDB scraper with movie details extraction and CSV output #2
This pull request introduces a new asynchronous IMDB movie scraper to the codebase. It adds a robust scraping module using Playwright to fetch movie details, a CLI entry point for running the scraper, and supporting documentation and data files. The main focus is on enabling the automated collection and storage of movie metadata from IMDB into a structured CSV format.
New IMDB Scraper Implementation
- `IMDBScraper` in `src/scrapping/scraper.py`, which retrieves movie details (title, top 3 actors, director, genres, duration, rating, and URL) from IMDB using Playwright. It includes error handling, logging, and flexible selectors to adapt to different IMDB layouts.
- `src/scrapping/__init__.py`, for clarity and documentation.
Command-Line Interface and Entry Point
- A CLI entry point in `src/main.py` that allows users to specify the number of movies to scrape and manages the scraping workflow, including output file handling and summary reporting.
Sample Data Addition
- `data/internal/imdb_movies.csv`, containing example movie records in the expected output format, likely for testing or demonstration purposes.
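For reviewers who want a feel for the CLI-to-CSV flow described above, here is a minimal sketch using only the standard library. It is not the code in this PR: the `Movie` dataclass, the `--num-movies` and `--output` flag names, the column names, and the `write_csv` helper are all hypothetical, chosen to mirror the fields listed in the description. The Playwright scraping step is deliberately left out.

```python
import argparse
import csv
from dataclasses import asdict, dataclass, fields


@dataclass
class Movie:
    # Hypothetical record layout mirroring the fields named in the PR description.
    title: str
    actors: str      # top 3 actors, joined into one string
    director: str
    genres: str
    duration: str
    rating: str
    url: str


def parse_args(argv=None):
    # Flag names and defaults are assumptions, not the PR's actual interface.
    parser = argparse.ArgumentParser(description="Scrape movie details into a CSV file.")
    parser.add_argument("--num-movies", type=int, default=10,
                        help="number of movies to scrape")
    parser.add_argument("--output", default="data/internal/imdb_movies.csv",
                        help="path of the CSV file to write")
    return parser.parse_args(argv)


def write_csv(movies, path):
    # Write one row per movie, with a header derived from the dataclass fields.
    fieldnames = [f.name for f in fields(Movie)]
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        for movie in movies:
            writer.writerow(asdict(movie))
    return len(movies)
```

In a real entry point, the list passed to `write_csv` would come from the scraper, and the returned count could feed the summary report mentioned above.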