Personal Project for web scraping using MongoDB and Python (scrapy module)

bachaquer/Web-Scraper-for-Books-MongoDB-and-Python

Web-Scraper-for-Books-MongoDB-and-Python

Personal Project for Web Scraping using MongoDB and Python (Scrapy module)

Explanation

This project implements a web scraper for books. Specifically, it targets the books listed on the website https://books.toscrape.com/. Below is an example of how books are presented on the site.

image

To scrape the required data (title, URL, and price), we use Python with the Scrapy module to access the site's HTML and CSS and extract these fields. Because a single page lists at most 20 books, the crawler recursively follows the URL of the Next page button until every book has been scraped. Finally, we connect the crawler to MongoDB using pymongo and store every book in the database. To avoid duplicates when the scraper is run more than once, the code hashes each book's URL with the SHA-256 algorithm and checks the hash against those of the books already stored.
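The two ideas above, following the Next page link and skipping books whose URL hash is already stored, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's actual spider: `FAKE_SITE`, `url_hash`, `store_if_new`, `crawl`, the `url_hash` field name, and the book data are all hypothetical, a plain dict stands in for the MongoDB collection, and a loop replaces the recursive callback. In a real Scrapy spider the link would typically be followed with `response.follow(...)`, and with pymongo the lookup could be a `find_one` on the hash field followed by `insert_one`.

```python
import hashlib
from urllib.parse import urljoin

# Hypothetical stand-in for the website:
# page URL -> (books on that page, relative href of the "next" link or None).
START_URL = "https://books.toscrape.com/"
FAKE_SITE = {
    "https://books.toscrape.com/": (
        [{"title": "Book A",
          "url": "https://books.toscrape.com/catalogue/book-a_1/index.html",
          "price": "10.00"}],
        "catalogue/page-2.html",
    ),
    "https://books.toscrape.com/catalogue/page-2.html": (
        [{"title": "Book B",
          "url": "https://books.toscrape.com/catalogue/book-b_2/index.html",
          "price": "12.50"}],
        None,  # last page: no "next" link
    ),
}


def url_hash(url):
    """Return the SHA-256 hex digest of a book URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def store_if_new(store, item):
    """Store the item unless its URL hash is already present.

    `store` is a dict standing in for the MongoDB collection.
    Returns True if the item was inserted, False if it was a duplicate.
    """
    key = url_hash(item["url"])
    if key in store:
        return False
    store[key] = {**item, "url_hash": key}
    return True


def crawl(start_url, site, store):
    """Follow "next" links from start_url, storing unseen books; return the insert count."""
    inserted = 0
    page_url = start_url
    while page_url is not None:
        books, next_href = site[page_url]
        inserted += sum(store_if_new(store, book) for book in books)
        # Resolve the relative href against the current page URL.
        page_url = urljoin(page_url, next_href) if next_href else None
    return inserted


store = {}
first_run = crawl(START_URL, FAKE_SITE, store)   # both books are new
second_run = crawl(START_URL, FAKE_SITE, store)  # both books are duplicates
print(first_run, second_run, len(store))  # 2 0 2
```

Keying the store on the hash keeps the duplicate check to a single lookup per book, which is the same reason a database-side index on the hash field would help in MongoDB.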

Result

Each item is stored in JSON format in the books_collections.json file in this directory. Example:

image
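As a rough illustration of the JSON format, the sketch below serializes a list of scraped items with Python's `json` module. It is only one way such a file might be produced, not necessarily how this project writes it. The `dump_items` helper and the item values are hypothetical placeholders; only the field names title, url, and price come from the description above. An in-memory buffer is used here instead of the real books_collections.json file.

```python
import io
import json


def dump_items(items, fp):
    """Write scraped items to an open text file object as a JSON array."""
    json.dump(items, fp, ensure_ascii=False, indent=2)


# Placeholder item with the three fields this project scrapes.
items = [
    {
        "title": "Example Book",
        "url": "https://books.toscrape.com/catalogue/example-book_1/index.html",
        "price": "10.00",
    }
]

buffer = io.StringIO()  # stands in for open("books_collections.json", "w")
dump_items(items, buffer)
json_text = buffer.getvalue()
print(json_text)
```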

Conclusion

We have successfully implemented a web scraper for book collections using MongoDB and Python.


Batyrkhan M 2025
