
Conversation

@egabancho (Member)

Currently, the only way to bulk-create, import, or edit records and files in InvenioRDM is to drive the records and files APIs directly through CLI commands.

GUI-based bulk importing and editing of records and files is a widely desired, highly useful feature, which will help to make the platform appealing to a much broader base of institutional users.

The proposed feature is a beta version of a bulk importer for metadata (in CSV format) and associated files.

- ...
To enable easy search and files integration, we believe these two objects should extend from records, as many other modules do (collections, requests, etc.).

The other two objects involved will be the *Resources* and the *Serializers*.
Member Author

We might want to change this name to avoid confusion with resources and services. Any suggestions? ☺️

Contributor

what do you think about record type instead of resource? I think we used this name elsewhere too

- Determine if a group of records get new DOIs (minted during the process)
- Update many records at once.
- Delete many records at once.
- See the status of past and current uploads.
@kpsherva (Contributor) · Feb 28, 2025

how long would the retention period be to keep the previous imports?

Member Author

We are still discussing that internally. Our initial approach would be to keep the import tasks indefinitely and "just" delete the attached files of successfully created/updated records after 3 months (probably configurable).

We also considered an "archive" option to help clean up the interface.
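
If it helps to picture it, the retention would most likely surface as an ordinary instance setting; the variable name and default below are placeholders, nothing is decided yet:

```python
# invenio.cfg -- hypothetical setting, name and default subject to change.
# Import tasks are kept indefinitely; only the files attached to
# successfully created/updated records are cleaned up after this period.
IMPORTER_FILES_RETENTION_DAYS = 90
```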


GUI-based bulk importing and editing of records and files is a widely desired, highly useful feature, which will help to make the platform appealing to a much broader base of institutional users.

The proposed feature is a beta version of a bulk importer for metadata (in CSV format) and associated files.
Contributor

are any other formats also planned?

Member Author

Initially, we are working only with CSV, but we are designing the tool so anyone can add their preferred format.
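
To give an idea of what "add their preferred format" could look like, here is a rough sketch of the extension point we have in mind (all names are hypothetical and will likely change):

```python
import csv
from abc import ABC, abstractmethod
from typing import Iterator, TextIO


class ImportSerializer(ABC):
    """Hypothetical interface: turn an uploaded metadata file into a
    stream of plain record dictionaries."""

    @abstractmethod
    def iter_records(self, stream: TextIO) -> Iterator[dict]:
        ...


class CSVSerializer(ImportSerializer):
    """Initial CSV implementation; another format (e.g. MARCXML) would
    ship its own subclass and be registered via the instance config."""

    def iter_records(self, stream: TextIO) -> Iterator[dict]:
        yield from csv.DictReader(stream)
```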

## Unresolved questions

- Metadata file re-upload to correct errors
- How do we set a file for preview?
Contributor

is there any limit on the import size or maximum number of files? It might be quite troublesome to handle big imports

Member Author

Since I wrote this, we have shown it to potential users, and this has come up on several occasions.
I think we will have the same "limitations" as the current deposit form. To mitigate this, we are considering letting users enter known URIs in the files column: imagine a shared location the service has access to, like an AWS/GCP bucket, or simply a URL that can be fetched directly.
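
Purely as an illustration (this is not in the RFC yet), the "URI in the files column" idea would boil down to the service streaming the referenced object instead of receiving an upload:

```python
import requests


def fetch_remote_file(uri: str, dest_path: str, chunk_size: int = 1024 * 1024):
    """Stream a file referenced in the CSV "files" column to local storage,
    so large files never have to pass through the deposit form upload."""
    with requests.get(uri, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as out:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                out.write(chunk)
```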

Contributor

The problem might be with both the CSV upload and the associated file upload. The CSV/MARCXML itself will be problematic if it contains a lot of records, for two reasons: the size of the file itself, and the time the underlying task will take to process it, depending on how much memory the Celery worker has available.

Member Author

You're definitely right. We did consider the problem with the Celery workers' memory. Unfortunately, I don't have a solution other than making the process as memory-efficient as possible.
For example, we plan to start a task for each record (row) in the input file and let that task do the transformation and validation, so we process one record at a time.
I think adding an "artificial" limit to the number of records at this point might not make a lot of sense. Once we have the process in place and know where it can "break", we can load-test it and set a well-informed limit.
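
To make that concrete, here is a minimal sketch of the per-row fan-out, assuming Celery; the task names and the body of `import_record` are placeholders, not the final API:

```python
import csv

from celery import shared_task


@shared_task
def import_record(row: dict):
    """Transform, validate and create/update a single record (one CSV row).

    Placeholder body: the real task would call the importer service.
    """
    ...


@shared_task
def dispatch_import(csv_path: str):
    """Stream the metadata file and spawn one task per row, so the whole
    file is never held in a single worker's memory."""
    with open(csv_path, newline="") as fp:
        for row in csv.DictReader(fp):
            import_record.delay(row)
```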

As an administrator user I want to ...
- Upload many records (with their files) at once into my instance.
- Select in which communities the records are published.
- Determine if a group of records get new DOIs (minted during the process)
Contributor

is this use case already reflected in the mockup?

Member Author

Yes, on the third screen, there is a checkbox section. For now, it has two options: mint DOIs and publish.
