This repository was archived by the owner on Aug 9, 2024. It is now read-only.
Thank you for your work on this package. For research purposes, I would like to extract features (and eventually reproduce the classification) on the entire XML dump of the French Wikipedia (20181101, for instance). Of course, this can hardly be done with API queries.
Is there a way to extract features while parsing the XML dump, for instance with mediawiki-utilities? :) I imagine it could be done by changing this line in the example code:
but not being a Python star (more of an R guy!), I'm quite confused. Could you show me a small example of how to parse, say, the first 5 revisions of a small dump file?
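To make my question concrete, here is the kind of streaming loop I have in mind. This is a standard-library-only sketch (no mwxml), and the inline XML is a toy stand-in for a real dump file such as `frwiki-20181101-pages-meta-history.xml`; I assume mediawiki-utilities exposes an equivalent dump/page/revision iteration, but I may have the API wrong.

```python
import io
import xml.etree.ElementTree as ET

# Toy stand-in for a real dump file (a real dump would be opened from disk,
# possibly through bz2.open). Seven revisions of one page.
body = "".join(
    f"<revision><id>{i}</id><text>v{i}</text></revision>" for i in range(1, 8)
)
SAMPLE = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    f"<page><title>Exemple</title>{body}</page></mediawiki>"
)

# Dump XML elements are namespaced; iterparse reports tags with this prefix.
NS = "{http://www.mediawiki.org/xml/export-0.10/}"

def first_revisions(fileobj, limit=5):
    """Stream the dump and yield (rev_id, text) for the first `limit` revisions."""
    seen = 0
    for _event, elem in ET.iterparse(fileobj):  # default: "end" events only
        if elem.tag == NS + "revision":
            yield elem.findtext(NS + "id"), elem.findtext(NS + "text")
            seen += 1
            if seen >= limit:
                return
            elem.clear()  # free the element so memory stays flat on big dumps

revs = list(first_revisions(io.StringIO(SAMPLE)))
for rev_id, text in revs:
    print(rev_id, text)
```

The point of `iterparse` plus `elem.clear()` is that the whole dump is never held in memory, which I assume is also how mediawiki-utilities processes full-history dumps. What I can't figure out is how to feed each revision's text into the feature extraction instead of just printing it.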