Multi-Scraper

Scrapes files (JPG, PNG, GIF, WebM) from websites.


This Python script allows you to scrape images from various websites such as Instagram, Reddit, 4channel, Warosu, and Desuarchive. The script uses the Chrome browser for scraping and requires certain dependencies to be installed.

Demo

demo.mp4

Installation

Before using the image scraper, please make sure you have the following prerequisites:

Python 3.x installed on your system
Chrome browser (when not using the --headless option, make sure Chrome is closed before running the scraper)
Google Chrome Driver compatible with your Chrome browser version
Download the appropriate Chrome Driver from https://sites.google.com/chromium.org/driver/?pli=1
Extract the Chrome Driver executable file from the downloaded archive.
Move the Chrome Driver executable file to the "driver" folder within the project directory.
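
A quick way to confirm the driver ended up where the steps above put it is a small check like the one below. This is a minimal sketch and not part of the project: the function name `find_chrome_driver` is hypothetical, and the executable names `chromedriver` and `chromedriver.exe` are assumed from the standard Chrome Driver downloads.

```python
from pathlib import Path
from typing import Optional

# Assumed file names: the driver is usually shipped as "chromedriver"
# (Linux/macOS) or "chromedriver.exe" (Windows).
DRIVER_NAMES = ("chromedriver", "chromedriver.exe")


def find_chrome_driver(project_dir: str = ".") -> Optional[Path]:
    """Return the driver path inside <project_dir>/driver, or None if absent."""
    driver_dir = Path(project_dir) / "driver"
    for name in DRIVER_NAMES:
        candidate = driver_dir / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_chrome_driver()
    print(found if found else "No Chrome Driver found in ./driver")
```

Run from the project directory, it prints the driver path if one is present, or a short notice otherwise.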

To install the required Python packages, run the following command:

pip install -r requirements.txt

Usage

To use the image scraper, run the app.py script with the following command:

python app.py

Options

The script provides several options that can be specified via command-line arguments or within the script file itself.

--injected: Enable this option to handle websites that inject their content during the initial page load. The script will wait until the page is fully loaded before scraping images. By default, this option is disabled.
--max-images: Specify the maximum number of images to scrape from the website. If not specified, all available images are scraped.
--bulk: Enable this option to scrape images from multiple URLs provided in a text file. The URLs in the text file should be separated by commas. The file path must be provided as an argument.
--headless: Enable this option to run the scraper in the background without opening a visible Chrome browser window. By default, the scraper opens a visible browser window.
--types: Specify which file types to download, as a comma-separated list. Supported file types are JPG, PNG, GIF, and WebM.
--pause: Enable this option to introduce a delay (in seconds) between opening each URL and downloading each file. This can be useful to prevent excessive requests to the website. By default, there is no pause between requests.
--user-agent: Specify your user agent string. Some websites require a specific user agent to access their content. To find your user agent, visit https://www.whatismybrowser.com/detect/what-is-my-user-agent/ and copy the user agent string. Paste the user agent string into the app GUI's input field called "User Agent".
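
The options above map naturally onto a command-line parser. The following is a minimal sketch using `argparse`, not the project's actual code: the flag names come from the list above, while the defaults, the helper names `parse_types`, `read_bulk_urls`, and `build_parser`, and the choice to make `--pause` take a number of seconds are assumptions made for illustration.

```python
import argparse
from pathlib import Path
from typing import List

SUPPORTED_TYPES = {"jpg", "png", "gif", "webm"}


def parse_types(value: str) -> List[str]:
    """Turn 'JPG, webm' into ['jpg', 'webm'], rejecting unsupported types."""
    types = [t.strip().lower().lstrip(".") for t in value.split(",") if t.strip()]
    unknown = [t for t in types if t not in SUPPORTED_TYPES]
    if unknown:
        raise argparse.ArgumentTypeError(
            "unsupported file types: " + ", ".join(unknown)
        )
    return types


def read_bulk_urls(path: str) -> List[str]:
    """Read comma-separated URLs from a text file, skipping empty entries."""
    text = Path(path).read_text(encoding="utf-8")
    return [url.strip() for url in text.split(",") if url.strip()]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the documented options (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Multi-Scraper options (sketch)")
    parser.add_argument("--injected", action="store_true",
                        help="wait for the page to load fully before scraping")
    parser.add_argument("--max-images", type=int, default=None,
                        help="maximum number of images; omit to scrape all")
    parser.add_argument("--bulk", metavar="FILE", default=None,
                        help="text file with comma-separated URLs")
    parser.add_argument("--headless", action="store_true",
                        help="run Chrome without a visible window")
    parser.add_argument("--types", type=parse_types,
                        default=sorted(SUPPORTED_TYPES),
                        help="comma-separated file types, e.g. jpg,png")
    parser.add_argument("--pause", type=float, default=0.0,
                        help="delay in seconds between requests")
    parser.add_argument("--user-agent", default=None,
                        help="user agent string to send")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--headless", "--max-images", "50", "--types", "jpg,webm", "--pause", "1.5"]
)
print(args.headless, args.max_images, args.types, args.pause)
```

Passing `--types` through a converter function keeps validation in one place, so an unsupported type is reported by the parser before any scraping starts.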

Gallery

The script will display the scraped images in a gallery format, allowing you to view and interact with the downloaded images conveniently.

Disclaimer

Please note that scraping images from websites may violate the terms of service of those websites. Make sure to use this script responsibly and respect the rights of the website owners. The developers of this script are not responsible for any misuse or legal issues arising from the use of this tool.

Happy scraping!

Usagi5677/multi-scraper
