Add examples for new quotes.toscrape.com endpoints (fixes #15) #16
This PR adds spiders for the 6 endpoints on quotes.toscrape.com that quotesbot does not yet cover. quotesbot currently handles only 2 of the sandbox's 8 endpoints; with this update it covers all 8.
Closes #15
The quotes.toscrape.com sandbox has gained new endpoints designed to teach modern web scraping challenges. quotesbot remains a good starting point with its CSS and XPath examples, but it does not demonstrate techniques for JavaScript rendering, APIs, authentication, and other scenarios that students encounter in real-world scraping projects.
Changes
1. New Spiders for Modern Endpoints

JavaScript & API Handling:
- toscrape-js.py → /js/ endpoint
  - <script> tags as var data = [...]
- toscrape-scroll.py → /api/quotes?page=N endpoint
  - has_next and page fields

Authentication & Forms:
- toscrape-login.py → /login endpoint
  - FormRequest.from_response() for automatic CSRF handling
- toscrape-viewstate.py → /search.aspx endpoint
  - __VIEWSTATE hidden fields

Complex Layouts:
- toscrape-table.py → /tableful/ endpoint
- toscrape-random.py → /random endpoint

2. Updated Existing Spiders
- QuotesbotItem instead of plain dicts for consistency
- .extract_first() and .extract() with .get() and .getall()

3. Enhanced Data Model
- items.py: Added explicit field definitions for text, author, and tags
- pass with commented placeholder

4. Comprehensive Documentation
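As a rough illustration of the /js/ case listed above, where the quotes sit inside a <script> tag as var data = [...], the following sketch pulls that array out of a script body using only the standard library. It is not the code from toscrape-js.py: the function name, the sample script, and the shape of each quote object are assumptions made for the example, and it only works if the array literal is valid JSON.

```python
import json
import re


def extract_embedded_quotes(script_text):
    """Return the list assigned to `var data` in a script body, or [] if absent."""
    # Locate the assignment; the array literal starts right after it.
    match = re.search(r"var\s+data\s*=\s*", script_text)
    if match is None:
        return []
    # raw_decode parses one JSON value and ignores whatever follows it
    # (the trailing semicolon and the rest of the script).
    data, _ = json.JSONDecoder().raw_decode(script_text[match.end():])
    return data


# Hypothetical script body; the real page's fields may differ.
sample = """
var data = [
    {"text": "First quote; with a ] inside", "author": {"name": "A. Author"}, "tags": ["x", "y"]},
    {"text": "Second quote", "author": {"name": "B. Author"}, "tags": []}
];
for (var i in data) { render(data[i]); }
"""

quotes = extract_embedded_quotes(sample)
print(len(quotes), quotes[0]["author"]["name"])
```

Inside a spider, the script body would typically come from a selector such as response.xpath("//script[contains(., 'var data')]/text()").get(). Parsing with raw_decode rather than a regex over the whole array avoids cutting the match short when a quote's text happens to contain a closing bracket.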
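The /api/quotes?page=N entry above mentions the has_next and page fields. One plausible way to drive pagination from them is sketched below. The helper name, the https scheme, and the "quotes" key in the sample payloads are assumptions for illustration; only the path and the two field names come from the PR text.

```python
import json

# Assumed URL template; only the /api/quotes?page=N path comes from the PR text.
API_URL = "https://quotes.toscrape.com/api/quotes?page={page}"


def next_page_url(body):
    """Return the URL of the following page, or None when there is no next page."""
    payload = json.loads(body)
    if payload.get("has_next"):
        # Advance from the page number reported by the API itself.
        return API_URL.format(page=payload["page"] + 1)
    return None


# Hypothetical response bodies for a first and a last page.
first = '{"has_next": true, "page": 1, "quotes": []}'
last = '{"has_next": false, "page": 10, "quotes": []}'

print(next_page_url(first))
print(next_page_url(last))
```

In a Scrapy callback the returned URL would be passed to a new request, and a None result would simply end the crawl.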
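Both form-based entries above (the CSRF handling on /login and the __VIEWSTATE hidden fields on /search.aspx) rely on the same idea: hidden inputs from the served form have to be posted back along with the user's values. The sketch below shows that idea with the standard library's HTML parser instead of Scrapy, so it is a stand-in for what FormRequest.from_response() does when it pre-populates form fields, not the spiders' actual code. The class and function names, the sample form, and the "author" field are hypothetical.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_form_data(html, user_fields):
    """Merge the page's hidden fields with the values the spider wants to submit."""
    collector = HiddenFieldCollector()
    collector.feed(html)
    # User-supplied values win if a name appears in both places.
    return {**collector.fields, **user_fields}


# Hypothetical form markup; the real page's fields may differ.
page = (
    '<form method="post">'
    '<input type="hidden" name="__VIEWSTATE" value="abc123==">'
    '<select name="author"></select>'
    '<input type="submit" name="submit_button" value="Search">'
    "</form>"
)

form_data = build_form_data(page, {"author": "Albert Einstein"})
print(form_data)
```

The submit button is deliberately left out because it is not a hidden input; whether a given form also needs it posted back depends on the server.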