Rewrite most of the scraping logic for the revamped profile design #5
The profile pages on SO/SE have been completely rewritten (see announcement from December 7, 2021), which means much of this library has to be rewritten.
Since the profile pages are now an opaque mess of nested divs (starting to look a lot like Twitter's HTML), the easiest approach I could find was to look for divs with titles like this:
Each tag on the tags page gets one of these divs, and the div alone already gives us the tag score. Inside it there's an element whose text is the tag's name. I didn't want to rely on the random-looking strings in the `class` attribute.

I've also changed a handful of things (some of them stylistic):
returns.

No doubt the company will make arbitrary small changes in a few weeks just to break scrapers like this. Until then, this should work (even if it's slow due to the throttling/pushbacks).
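For illustration, here is a minimal sketch of the div-based lookup described above, using only Python's standard-library `html.parser`. It is not the code from this PR. The `title` wording (`"1,234 score"`), the `SCORE_RE` pattern, the assumption that the tag name sits in an `<a>` element, and the `TagScoreParser` class are all hypothetical; the real profile markup may differ. The sketch keys only on the `title` attribute and div nesting, and it ignores `class` attributes entirely.

```python
import re
from html.parser import HTMLParser

# Hypothetical title format; the real wording on the profile page may differ.
SCORE_RE = re.compile(r"([\d,]+)\s+score", re.IGNORECASE)


class TagScoreParser(HTMLParser):
    """Collect (tag name, score) pairs from divs whose title attribute holds a score.

    Hypothetical helper: assumes the tag name is the text of an <a> element
    nested somewhere inside the titled div.
    """

    def __init__(self):
        super().__init__()
        self.results = []      # list of (name, score) tuples
        self._depth = 0        # current <div> nesting depth
        self._start = None     # depth at which the current titled div opened
        self._score = None
        self._name = None
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._depth += 1
            # Valueless attributes come through as None, hence the `or ""`.
            title = dict(attrs).get("title") or ""
            match = SCORE_RE.search(title)
            if match and self._start is None:
                self._start = self._depth
                self._score = int(match.group(1).replace(",", ""))
                self._name = None
        elif tag == "a" and self._start is not None:
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "div":
            # Closing the titled div itself: emit the record if a name was found.
            if self._start == self._depth:
                if self._name:
                    self.results.append((self._name, self._score))
                self._start = None
            self._depth -= 1

    def handle_data(self, data):
        # Take the first non-blank text inside a link as the tag name.
        if self._in_link and self._name is None and data.strip():
            self._name = data.strip()


if __name__ == "__main__":
    html = """
    <div title="1,234 score">
      <div><a href="/questions/tagged/python">python</a></div>
    </div>
    <div title="56 score"><div><a href="/questions/tagged/regex">regex</a></div></div>
    """
    parser = TagScoreParser()
    parser.feed(html)
    parser.close()
    print(parser.results)  # [('python', 1234), ('regex', 56)]
```

Tracking div depth rather than matching on generated class names means the sketch only depends on the `title` attribute and the nesting, which seem the least likely parts of the markup to be reshuffled.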