The goal of this project is to create a web scraper that accesses the RI General Statutes (webserver.rilin.state.ri.us/Statutes/) and converts the HTML into structured data files for future use and analysis.
In the project folder, run scrapy crawl laws -o <output_file>.json.
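Once the crawl finishes, the output file can be sanity-checked before any further analysis. The following is a minimal sketch, assuming the output is the single JSON array that Scrapy's -o option writes for a .json file. The function name summarize_items and the file name laws.json are hypothetical, and no particular item fields are assumed.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_items(path):
    """Return (item_count, field_counts) for a Scrapy JSON export.

    Assumes the file holds one JSON array of objects, one per scraped item.
    """
    items = json.loads(Path(path).read_text(encoding="utf-8"))
    # Count how many items carry each field, to spot fields that are often missing.
    field_counts = Counter(key for item in items for key in item)
    return len(items), field_counts


# Example usage (hypothetical file name):
#   count, fields = summarize_items("laws.json")
#   print(count, dict(fields))
```

A field that appears in far fewer items than the total count is a likely sign that a selector is missing some pages.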
This scraper is a work in progress. Next steps on the TODO list:
- Implement ItemPipelines to clean up the scraped data.
- Finalize Section fields to align with other standard legal code formats.
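As a rough illustration of the first TODO item, a cleanup pipeline could look something like the sketch below. It is not the project's implementation: the class name CleanTextPipeline is hypothetical, and the only cleanup shown is whitespace normalization on string fields. The process_item(item, spider) method is the hook Scrapy calls on each pipeline component.

```python
class CleanTextPipeline:
    """Hypothetical pipeline: trim and collapse whitespace in string fields."""

    def process_item(self, item, spider):
        # Iterate over a copy of the pairs so values can be reassigned safely.
        for key, value in list(item.items()):
            if isinstance(value, str):
                # split() with no arguments drops leading/trailing whitespace
                # and treats any run of whitespace as a single separator.
                item[key] = " ".join(value.split())
        # Non-string fields (numbers, lists, None) are passed through unchanged.
        return item
```

A pipeline like this only runs once it is listed in the ITEM_PIPELINES setting in the project's settings.py, using the project's own module path.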
scrapy (https://docs.scrapy.org/en/latest/index.html) - Python web crawler package used for the project. This repository was created using the scrapy startproject command.