Parse zip with top level directory #393

Merged
dtronmans merged 2 commits into main from feat/nested-zip-parse
Feb 23, 2026

dtronmans commented Feb 23, 2026

Purpose

Make it possible to parse a zip file with a top-level directory instead of immediate sub-directories.

Specification

  • Assumes the root of the zip contains exactly one directory and nothing else. Before checking this, I filter out __MACOSX, Thumbs.db and desktop.ini, as well as anything whose name starts with . (e.g. .DS_Store, .cache), in case these were created alongside the top-level dir when the zip file was built.
  • The only possible unintended change in parsing is that ClassificationDirectoryParser could falsely win on wrapped archives, because it accepts generic class-folder layouts. To avoid that, we unwrap a single top-level folder only when it contains explicit dataset-root markers (e.g. train/valid/test, images/labels, annotations.json, labels.json, data.yaml) or when the inner folder is recognized by any non-CLSDIR parser.
  • test_zip_layout_equivalence.py: for several zip datasets, parse the original (flat) zip, parse the same zip with an added top-level directory, and parse the extracted zip folder, then assert that all 3 return the same LDFs.
  • test_zip_layout_equivalence.py reuses the helper methods from test_parse_export_equivalence.py to assert that the resulting annotations are the same. Since they are now used in two tests, I moved them to utils.py.
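
The unwrapping rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code merged in this PR: the constant names, `is_ignored`, `has_root_markers`, `unwrap_single_root` and the `recognized_by_other_parser` callback are all hypothetical, and treating any single marker as sufficient is a guess at how the markers are matched.

```python
from pathlib import Path
from typing import Callable

# Hypothetical constants: the names are illustrative, the values come from the PR text.
IGNORED_NAMES = {"__MACOSX", "Thumbs.db", "desktop.ini"}
ROOT_MARKER_DIRS = {"train", "valid", "test", "images", "labels"}
ROOT_MARKER_FILES = {"annotations.json", "labels.json", "data.yaml"}


def is_ignored(path: Path) -> bool:
    """True for OS/archive clutter such as __MACOSX, Thumbs.db or dotfiles."""
    return path.name in IGNORED_NAMES or path.name.startswith(".")


def has_root_markers(directory: Path) -> bool:
    """True if the directory holds at least one dataset-root marker (assumed: any one suffices)."""
    for child in directory.iterdir():
        if child.is_dir() and child.name in ROOT_MARKER_DIRS:
            return True
        if child.is_file() and child.name in ROOT_MARKER_FILES:
            return True
    return False


def unwrap_single_root(
    extracted: Path,
    recognized_by_other_parser: Callable[[Path], bool] = lambda p: False,
) -> Path:
    """Return the inner folder if the archive is safely unwrappable, else the original root."""
    entries = [p for p in extracted.iterdir() if not is_ignored(p)]
    # Unwrap only when exactly one directory remains after filtering clutter.
    if len(entries) != 1 or not entries[0].is_dir():
        return extracted
    inner = entries[0]
    # Guard against generic class-folder layouts: require explicit markers
    # or recognition by some parser other than the classification-directory one.
    if has_root_markers(inner) or recognized_by_other_parser(inner):
        return inner
    return extracted
```

With this shape, a wrapped classification dataset (for example `wrapper/cats`, `wrapper/dogs`) is unwrapped only if the callback reports that another parser recognizes the inner folder, which is the guard against ClassificationDirectoryParser winning by default.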

Dependencies & Potential Impact

Deployment Plan

Testing & Validation

imagenet-sample zip file with top-level dir, parser detects the correct format:
[screenshot: correctimagenet]

coco-2017 zip file with top-level dir, parser detects the correct format:
[screenshot: correctcoco]
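
To reproduce checks like these locally, a flat archive can be rewrapped under a single top-level directory. The snippet below is a minimal sketch using only the standard-library `zipfile` module; the function name `add_top_level_dir` and the default folder name `dataset` are hypothetical and not taken from this PR or its tests.

```python
import zipfile
from pathlib import Path


def add_top_level_dir(flat_zip: Path, wrapped_zip: Path, top_dir: str = "dataset") -> None:
    """Copy every entry of flat_zip into wrapped_zip under top_dir/."""
    with zipfile.ZipFile(flat_zip) as src, zipfile.ZipFile(
        wrapped_zip, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for info in src.infolist():
            new_name = f"{top_dir}/{info.filename}"
            if info.is_dir():
                # Directory entries end with "/", so an empty payload is enough.
                dst.writestr(new_name, "")
            else:
                dst.writestr(new_name, src.read(info))
```

Parsing the resulting archive should then detect the same format as parsing the original flat zip.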

dtronmans requested a review from a team as a code owner February 23, 2026 12:18
dtronmans requested review from conorsim, klemen1999, kozlov721 and tersekmatija and removed request for a team February 23, 2026 12:18
github-actions bot added the enhancement (New feature or request) label Feb 23, 2026
klemen1999 left a comment

Tested on a few examples and LGTM, ty

dtronmans merged commit 1f06caf into main Feb 23, 2026
13 checks passed
dtronmans deleted the feat/nested-zip-parse branch February 23, 2026 14:24

2 participants