[WIP] Convert to python package with cli #2
Following the discussion in #1, this PR aims to do the following:
I am using Poetry to manage packaging. The package is called `paperscraper` and provides the following commands:

- `process`: command group for the different data-processing steps. Every subcommand accepts one optional flag, `-f/--force`. Without this flag, a process runs only if its corresponding output does not exist yet.
  - `run_all`: run all the processes.
  - `db`: clean the XML file.
  - `venues`: extract unique venues.
  - `data-extraction`: extract the data from the dblp snapshot.
  - `collect-data`: scrape additional information.
  - `postprocess`: clean and extract unique data.
- `search`: takes a `pattern` string and returns all entries with a match in the title or abstract. Uses fuzzy matching by default. Options:
  - `--venue`: filter by venue. Can be given multiple times; each value can be a partial match to either the full name or the short name.
  - `--author`: filter by author. Can be given multiple times; each value can be a partial match.
  - `--re`: a flag; when set, `pattern` is treated as a regular expression.
  - `--fuzzy-max-difference`: the maximum number of differences from `pattern` allowed for a match.
- `list`: summary of the data (lists venues).
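For review purposes, the command tree described above can be sketched with the standard-library `argparse`. This is a minimal sketch under assumptions: the PR does not say which CLI library it uses, `build_parser` is a hypothetical name, and the default of 2 for `--fuzzy-max-difference` is assumed, not taken from the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical argparse version of the paperscraper command tree."""
    parser = argparse.ArgumentParser(prog="paperscraper")
    commands = parser.add_subparsers(dest="command", required=True)

    # `process` is itself a group; every step is a subcommand accepting -f/--force.
    process = commands.add_parser("process", help="run data-processing steps")
    steps = process.add_subparsers(dest="step", required=True)
    for name in ("run_all", "db", "venues", "data-extraction", "collect-data", "postprocess"):
        step = steps.add_parser(name)
        step.add_argument("-f", "--force", action="store_true",
                          help="rerun even if the step's output already exists")

    # `search` takes a positional pattern plus repeatable filters.
    search = commands.add_parser("search", help="search titles and abstracts")
    search.add_argument("pattern")
    search.add_argument("--venue", action="append", default=[],
                        help="partial match on full or short venue name (repeatable)")
    search.add_argument("--author", action="append", default=[],
                        help="partial match on an author name (repeatable)")
    # Stored as `use_regex` so the namespace attribute is self-describing.
    search.add_argument("--re", dest="use_regex", action="store_true",
                        help="treat pattern as a regular expression")
    search.add_argument("--fuzzy-max-difference", type=int, default=2,  # assumed default
                        help="maximum number of differences allowed in a fuzzy match")

    commands.add_parser("list", help="summarise the data (lists venues)")
    return parser
```

With this layout, `paperscraper process db -f` parses to `command='process'`, `step='db'`, `force=True`, and each repeated `--venue` value is appended to a list.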
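The rule that a `process` step runs only when its output is missing, unless `-f/--force` is given, can be expressed as a small guard. This is a sketch: `run_step` and its parameters are hypothetical, and it assumes each step writes a single output file whose existence marks the step as done.

```python
from pathlib import Path
from typing import Callable


def run_step(name: str, output: Path, step: Callable[[Path], None], force: bool = False) -> bool:
    """Run `step` unless `output` already exists; return True if the step ran.

    Hypothetical helper mirroring the -f/--force behaviour described for `process`.
    """
    if output.exists() and not force:
        print(f"[{name}] skipping: {output} already exists (use -f/--force to rerun)")
        return False
    # Make sure the output directory exists before the step writes into it.
    output.parent.mkdir(parents=True, exist_ok=True)
    step(output)
    return True
```

Returning a boolean would let a `run_all`-style driver report which steps were actually executed.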
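The `search` semantics above (fuzzy match by default, `--re` for regex, partial `--venue`/`--author` filters) can be sketched as follows. The PR does not say how fuzzy matching is implemented; this sketch swaps in Sellers' approximate substring matching (Levenshtein distance where a match may start anywhere in the text). The `Entry` fields, case-insensitive matching, and OR-ing of repeated filters are all assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Entry:
    """Hypothetical record shape; the real data model may differ."""
    title: str
    abstract: str
    venue: str        # full venue name
    venue_short: str  # short name, e.g. an acronym
    authors: List[str] = field(default_factory=list)


def fuzzy_contains(pattern: str, text: str, max_diff: int) -> bool:
    """Return True if some substring of `text` is within `max_diff` edits of `pattern`.

    Sellers' approximate substring matching: a Levenshtein DP whose first row is
    all zeros, so a match may start at any position in `text`. Case-insensitive.
    """
    pattern, text = pattern.lower(), text.lower()
    m = len(pattern)
    # col[i] = fewest edits aligning pattern[:i] with a substring ending at the current position
    col = list(range(m + 1))
    if col[m] <= max_diff:  # a pattern no longer than max_diff matches the empty substring
        return True
    for ch in text:
        prev_diag = col[0]
        col[0] = 0  # a match may start anywhere, so the empty prefix is free
        for i in range(1, m + 1):
            cur = min(
                col[i] + 1,                          # extra character in text
                col[i - 1] + 1,                      # pattern character missing from text
                prev_diag + (pattern[i - 1] != ch),  # match or substitution
            )
            prev_diag, col[i] = col[i], cur
        if col[m] <= max_diff:
            return True
    return False


def matches(
    entry: Entry,
    pattern: str,
    venues: Sequence[str] = (),
    authors: Sequence[str] = (),
    use_regex: bool = False,
    max_diff: int = 2,  # assumed default for --fuzzy-max-difference
) -> bool:
    """Apply the venue/author filters, then match `pattern` in the title or abstract."""
    # Repeated venue values are OR-ed (assumption): a partial, case-insensitive
    # match against the full or short venue name keeps the entry.
    if venues and not any(
        v.lower() in entry.venue.lower() or v.lower() in entry.venue_short.lower()
        for v in venues
    ):
        return False
    # Repeated author values are also OR-ed (assumption).
    if authors and not any(
        a.lower() in name.lower() for a in authors for name in entry.authors
    ):
        return False
    fields = (entry.title, entry.abstract)
    if use_regex:  # --re: treat pattern as a regular expression
        return any(re.search(pattern, f, flags=re.IGNORECASE) for f in fields)
    return any(fuzzy_contains(pattern, f, max_diff) for f in fields)
```

Applying the cheap substring filters first keeps the O(len(pattern) × len(text)) fuzzy scan off entries that the filters already exclude.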