Extracting features from XML dump #420

Hi,

Thank you for your work on this package. For research purposes, I would like to extract features (and eventually reproduce the classification) for the entire XML dump of French Wikipedia (20181101, for instance). Of course, this can hardly be done with API queries.

Is there a way to extract features while parsing an XML dump, for instance with mediawiki-utilities? :) I imagine it could be done by changing this line in the example code:

```
extractor = Extractor(mwapi.Session(host="https://en.wikipedia.org",
                                    user_agent="revscoring demo"))
```

but not being a Python star (more of an R guy!), I'm quite confused. Could you show me a small example of how to parse, for instance, the first 5 revisions of a small dump file?
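To make the question concrete, here is a rough sketch of the kind of loop I have in mind. It is only an illustration under several assumptions: it uses the standard library's `xml.etree.ElementTree.iterparse` instead of mediawiki-utilities so that it runs without extra packages, the tiny in-memory dump is made up, and `placeholder_features` is a hypothetical stand-in (character and word counts) rather than anything from revscoring.

```python
import io
import itertools
import xml.etree.ElementTree as ET

# Made-up miniature dump with 7 revisions, so the sketch runs on its own.
SAMPLE_DUMP = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/"><page>'
    "<title>Exemple</title>"
    + "".join(
        f"<revision><id>{i}</id><text>{'mot ' * i}</text></revision>"
        for i in range(1, 8)
    )
    + "</page></mediawiki>"
).encode("utf-8")


def local_name(tag):
    """Strip the XML namespace: '{ns}revision' -> 'revision'."""
    return tag.rsplit("}", 1)[-1]


def iter_revisions(stream):
    """Yield (rev_id, text) per <revision>, streaming instead of loading the file."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "revision":
            continue
        rev_id, text = None, ""
        for child in elem:  # direct children only, so <contributor><id> is ignored
            name = local_name(child.tag)
            if name == "id":
                rev_id = int(child.text)
            elif name == "text":
                text = child.text or ""
        yield rev_id, text
        elem.clear()  # drop the text of the revision that was just processed


def placeholder_features(text):
    """Hypothetical stand-in for real feature extraction."""
    return {"chars": len(text), "words": len(text.split())}


def first_revision_features(stream, n=5):
    """Compute the placeholder features for the first n revisions of a dump."""
    rows = []
    for rev_id, text in itertools.islice(iter_revisions(stream), n):
        row = {"rev_id": rev_id}
        row.update(placeholder_features(text))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in first_revision_features(io.BytesIO(SAMPLE_DUMP), n=5):
        print(row)
```

For a real compressed dump, I assume the `io.BytesIO(...)` stream could be replaced by something like `bz2.open(path, "rb")`. What I cannot figure out is how to plug revscoring's feature extraction in place of `placeholder_features`.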

Thank you again for this work.
