Skip to content

[FEATURE REQUEST] Add a test that verifies the md5 value of each npz file. #429

@jiaqima

Description

@jiaqima

Is your feature request related to a problem? Please describe.

Since we plan to include the md5 value in the file name of each npz file, we can add a test to verify the file. This will help avoid malicious attack by someone trying to upload a file with the same name of an existing file but with different content.

Describe the solution you'd like

The test can be added in two levels.

  1. The pytest level. Whenever there is a change in a dataset folder in a new PR, we can verify all the npz files included in the metadata.json and task json files.
  2. The dataloader level. We could also add assertion in the dataloading functions (maybe only do this when the files are downloaded). But this will break the current datasets with old file naming. So this can only be done after all the datasets are updated. This will also increase the computation overhead a little bit, which I'm not sure is worth or not.

1 is redundant if 2 is implemented.

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions