Project Gutenberg rdf files parser.
- Node.js 12.16.3
- PostgreSQL 9.5.22 listening on port `5432`
- Create a database called `gutenberg_db` that will be used by the app.
- Create a database called `gutenberg_db_test` that will be used by the tests.
- Add a file called `secrets.js` in the project root with the following structure:
module.exports = {
  DEV_DB_USER: "user name for gutenberg_db",
  DEV_DB_PASSWORD: "user password for gutenberg_db",
  TEST_DB_USER: "user name for gutenberg_db_test",
  TEST_DB_PASSWORD: "user password for gutenberg_db_test"
}
- Get the rdf files:
  - Manually, by downloading and extracting the rdf files from http://www.gutenberg.org/cache/epub/feeds/rdf-files.tar.zip and placing them in a folder called `rdf-files` in the root of the project.
  - Or run `npm run download` to download them. A folder called `rdf-files` containing the files will be created in the root of the project.
- Install dependencies with `npm i`
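
As a rough illustration of how the values in `secrets.js` might be turned into connection settings, here is a minimal sketch. The `buildDbConfig` helper, the shape of the object it returns, and the `localhost` host are assumptions made for this example and are not taken from the project; only the database names, the port and the secret keys come from the steps above.

```javascript
// Sketch only: buildDbConfig and its return shape are hypothetical,
// not the project's actual code.
function buildDbConfig(secrets, env) {
  const isTest = env === 'test';
  return {
    host: 'localhost', // assumption: PostgreSQL runs on the same machine
    port: 5432, // port listed in the requirements
    database: isTest ? 'gutenberg_db_test' : 'gutenberg_db',
    user: isTest ? secrets.TEST_DB_USER : secrets.DEV_DB_USER,
    password: isTest ? secrets.TEST_DB_PASSWORD : secrets.DEV_DB_PASSWORD,
  };
}

// Placeholder credentials; a real caller would pass require('./secrets').
const exampleSecrets = {
  DEV_DB_USER: 'dev_user',
  DEV_DB_PASSWORD: 'dev_password',
  TEST_DB_USER: 'test_user',
  TEST_DB_PASSWORD: 'test_password',
};

const devConfig = buildDbConfig(exampleSecrets, 'development');
const testConfig = buildDbConfig(exampleSecrets, 'test');
console.log(devConfig.database, testConfig.database);
```

Keeping the credentials in a separate file that is passed in as an argument makes it easy to swap the two databases without touching the rest of the code.
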
- Run `npm start` to parse the files and save their contents in the `Books` table. This script automatically runs the migrations that create the table and add its indexes: two B-Tree indexes, one on the `title` column and one on the `publicationDate` column, and one GIN index on the `authors` column, since it is an array of strings.
- Once the process has finished, a message like `Finished parsing 62418 files` will be displayed.
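
The indexes described above could be expressed in plain PostgreSQL roughly as follows. This is only a sketch: the index names and the `createIndexSql` helper are hypothetical, the project's real migrations may be written differently, and the example query assumes the `authors` column is declared as `text[]`.

```javascript
// Hypothetical index definitions; the index names are invented for illustration.
const indexes = [
  { name: 'books_title_idx', method: 'btree', column: 'title' },
  { name: 'books_publication_date_idx', method: 'btree', column: 'publicationDate' },
  { name: 'books_authors_idx', method: 'gin', column: 'authors' },
];

// Build a CREATE INDEX statement. Identifiers are double-quoted because
// "Books" and "publicationDate" contain upper-case letters.
function createIndexSql(table, index) {
  return `CREATE INDEX "${index.name}" ON "${table}" USING ${index.method} ("${index.column}");`;
}

const statements = indexes.map((index) => createIndexSql('Books', index));
statements.forEach((statement) => console.log(statement));

// Array containment (@>) is one of the operators a GIN index on an array
// column can serve. Assumes "authors" is text[]; adjust the cast otherwise.
const authorQuery = 'SELECT * FROM "Books" WHERE "authors" @> ARRAY[$1]::text[];';
console.log(authorQuery);
```

B-Tree indexes suit equality, range and sorting lookups on scalar columns such as `title` and `publicationDate`, while a GIN index lets PostgreSQL look up individual elements inside the `authors` array.
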
- Run `npm test` to run the tests. Some tests read example rdf files located in the `test/rdf-files` folder and save them in the `gutenberg_db_test` DB. Once the tests have finished running, the `Books` table is truncated.
- The module `nyc` was added to provide code coverage, and the results are displayed once the tests finish running. The folder `test-results` is also created in the project root with an HTML page for checking code coverage.
- Add tests for the download service.
- Plug download and processing.