Directories in ingested zips are errantly treated as files #15

@ajnelson-nist

I wrote a test to confirm that files in an aff4 archive created with pyaff4 match what I expect them to be, by using aff4.py --extract-all. Unfortunately, dumping files fails because a directory from my input is treated like a file. The issue appears to affect all directories.

This failure occurs on the processing path where an aff4 archive is created from scratch by ingesting a zip. (Specifically, the zip is a zipped LoC Bag, though I don't think that matters here beyond an internal path name that isn't relevant to this bug.) Reproduction instructions are included below.

Suspected diagnosis

Every member of a zip, whether a file or directory, appears to be assigned the type aff4:FileImage per the --meta dump from the .aff4 file. I'm guessing in-zip directories should instead be aff4:FolderImage, as this query is being used to feed a loop:

for imageUrn in resolver.QueryPredicateObject(volume.urn, lexicon.AFF4_TYPE, lexicon.standard11.FileImage)

And in that loop, every FileImage is created and treated as a regular file. A directory thrown into the mix raises an IsADirectoryError.

Suspected correction

In the function BasicZipFile.parse_cd, somewhere before the info message on line 694, a check needs to be made for the file being a directory. The since-Python-3.6 method of checking for the last character of the name being "/" should do.

However, I don't know the code well enough to suggest where that information should be integrated (aside from a check soon after fn is defined in that function), or how it should be propagated so that the entry becomes an aff4:FolderImage. Perhaps the ZipInfo class in that file?
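The trailing-slash check described above can be illustrated with the standard library alone. This is only a sketch using Python's own zipfile module, not pyaff4's BasicZipFile or its ZipInfo class; the helper names classify_members and build_sample_zip are hypothetical, and the member names simply mirror the deep.zip layout from the reproduction steps.

```python
import io
import zipfile


def classify_members(zip_bytes):
    """Return {member name: "directory" or "file"} for a zip held in memory."""
    kinds = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for info in zf.infolist():
            # zipfile.ZipInfo.is_dir() (Python 3.6+) reports a directory
            # when the member name ends in "/".
            kinds[info.filename] = "directory" if info.is_dir() else "file"
    return kinds


def build_sample_zip():
    """Build a small zip resembling deep.zip: one directory entry, two files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("file3.txt", "file 3\n")
        zf.writestr("input_dir_1/", "")  # explicit directory entry
        zf.writestr("input_dir_1/file4.txt", "file 4\n")
    return buf.getvalue()


if __name__ == "__main__":
    for name, kind in classify_members(build_sample_zip()).items():
        print(kind, name)
```

A result like this is what would then need to be mapped to aff4:FolderImage versus aff4:FileImage during ingest; where that mapping belongs in pyaff4 is the open question above.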

Steps to reproduce

The code segments below work when run as individual shell scripts, confirmed on an Ubuntu 18.04 system.

  1. Create a zip with some directory in it.
#!/bin/bash

# step1.sh

rm -rf deep flat
mkdir -p flat
mkdir -p deep/input_dir_1

echo 'file 1' > flat/file1.txt
echo 'file 2' > flat/file2.txt
pushd flat
  zip -r ../flat.zip .
popd
rm -r flat

echo 'file 3' > deep/file3.txt
echo 'file 4' > deep/input_dir_1/file4.txt
pushd deep
  zip -r ../deep.zip .
popd
rm -r deep
  2. Ingest the zips into their respective aff4 archives.
#!/bin/bash

# step2.sh

# (First loading venv, fixing path to aff4.py ...)

python .../aff4.py \
  --hash \
  --ingest \
  --paranoid \
  --recursive \
  flat.aff4 \
  flat.zip

python .../aff4.py \
  --hash \
  --ingest \
  --paranoid \
  --recursive \
  deep.aff4 \
  deep.zip
  3. Extract everything from the flat aff4 archive. Currently works.

Pull Request 14 fixes an unrelated issue with the way extractAll is called, and updates Pull Request 13 as a matter of convenience; I also found some of @gonmator's fixes while fixing this call.

#!/bin/bash

# step3.sh

# (First loading venv, fixing path to aff4.py ...)

rm -rf extraction_flat
mkdir extraction_flat

# Note that the last argument here will not be necessary if PR 16 is incorporated.
python .../aff4.py \
  --extract-all \
  --folder extraction_flat \
  flat.aff4 \
  extraction_flat
  4. Extract everything from the deep aff4 archive. Currently fails.

PR 14 should be integrated in order to see step4.sh below fail in the illustrative way.

#!/bin/bash

# step4.sh

# (First loading venv, fixing path to aff4.py ...)

rm -rf extraction_deep
mkdir extraction_deep

# Note that the last argument here will not be necessary if PR 16 is incorporated.
python .../aff4.py \
  --extract-all \
  --folder extraction_deep \
  deep.aff4 \
  extraction_deep

Traceback of step4.sh:

Traceback (most recent call last):
  File "../deps/pyaff4/aff4.py", line 421, in <module>
    main(sys.argv)
  File "../deps/pyaff4/aff4.py", line 414, in main
    extractAll(dest, args.folder)
  File "../deps/pyaff4/aff4.py", line 312, in extractAll
    with open(destFile, "wb") as destStream:
IsADirectoryError: [Errno 21] Is a directory: 'extraction_deep/deep.zip/input_dir_1'
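
The traceback shows open() being called on a path that is a directory. As a minimal sketch of the kind of guard that would avoid this, independent of pyaff4: the helper name extract_member is hypothetical, and treating names that end in "/" as directories is an assumption carried over from the zip convention, not something extractAll currently does.

```python
import os
import tempfile


def extract_member(dest_root, name, data):
    """Hypothetical helper: write one archive member under dest_root.

    Names ending in "/" are created as directories; anything else is
    written as a regular file, creating parent directories as needed.
    """
    dest_path = os.path.join(dest_root, name)
    if name.endswith("/"):
        # Directory member: create it instead of opening it for writing.
        os.makedirs(dest_path, exist_ok=True)
        return "dir"
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(dest_path, "wb") as dest_stream:
        dest_stream.write(data)
    return "file"


def demo():
    """Extract a deep.zip-like member list into a temporary directory."""
    members = [
        ("file3.txt", b"file 3\n"),
        ("input_dir_1/", b""),
        ("input_dir_1/file4.txt", b"file 4\n"),
    ]
    with tempfile.TemporaryDirectory() as root:
        kinds = [extract_member(root, name, data) for name, data in members]
        files = sorted(
            os.path.relpath(os.path.join(dirpath, f), root)
            for dirpath, _, filenames in os.walk(root)
            for f in filenames
        )
    return kinds, files


if __name__ == "__main__":
    print(demo())
```

In pyaff4 itself the decision would presumably be driven by the image's AFF4 type rather than by the name, which is why the diagnosis above points at the ingest side.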

Resolution confirmation

When step4.sh above creates this file hierarchy, this Issue's good to close.

$ find extraction_deep
extraction_deep
extraction_deep/deep.zip
extraction_deep/deep.zip/file3.txt
extraction_deep/deep.zip/input_dir_1
extraction_deep/deep.zip/input_dir_1/file4.txt
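
For use in an automated test, the same confirmation could be sketched in Python. The helper name list_tree and the EXPECTED constant are hypothetical; the expected paths are copied from the listing above.

```python
import os

# Expected hierarchy, copied from the listing above.
EXPECTED = [
    "extraction_deep",
    "extraction_deep/deep.zip",
    "extraction_deep/deep.zip/file3.txt",
    "extraction_deep/deep.zip/input_dir_1",
    "extraction_deep/deep.zip/input_dir_1/file4.txt",
]


def list_tree(root):
    """Return root and every path beneath it, sorted, with "/" separators.

    Paths are reported relative to root's parent directory, so the result
    is comparable to the sorted output of `find root`.
    """
    root = os.path.abspath(root)
    base = os.path.dirname(root)
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(os.path.relpath(p, base).replace(os.sep, "/") for p in paths)


if __name__ == "__main__":
    # Run from the directory that contains extraction_deep.
    assert list_tree("extraction_deep") == EXPECTED
```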
