Description
I wrote a test to confirm that files in an aff4 archive created with pyaff4 match what I expect, using aff4.py --extract-all. Unfortunately, dumping the files fails because a directory from my input is treated as a file. The issue appears to affect all directories.
This processing path starts from creating an aff4 archive from scratch out of a zip. (Specifically, the zip is a zipped LoC Bag, though I don't think that matters beyond an internal path name that isn't relevant to this bug.) Reproduction instructions are included below.
Suspected diagnosis
Every member of a zip, whether a file or a directory, appears to be assigned the type aff4:FileImage, according to the --meta dump of the .aff4 file. I'm guessing in-zip directories should instead be aff4:FolderImage, since this query feeds a loop:
for imageUrn in resolver.QueryPredicateObject(volume.urn, lexicon.AFF4_TYPE, lexicon.standard11.FileImage):
In that loop, every FileImage is created and treated as a regular file, so a directory in the mix raises an IsADirectoryError.
Suspected correction
In the function BasicZipFile.parse_cd, somewhere before the info message on line 694, a check needs to be made for whether the member is a directory. The test used by Python's zipfile.ZipInfo.is_dir() (available since Python 3.6), checking whether the last character of the name is "/", should suffice.
However, I don't know the code well enough to suggest where that information should be integrated (aside from a check soon after fn is defined in that function), or how it should be propagated so that an aff4:FolderImage is produced. Perhaps the ZipInfo class in that file?
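A minimal sketch of the proposed check, using only Python's standard zipfile module rather than pyaff4 internals: ZipInfo.is_dir() reports whether a member name ends in "/", and the hypothetical helper classify_zip maps each member of a zip to the AFF4 type it arguably should receive. The type strings are plain labels for illustration, not pyaff4 lexicon objects, and where this belongs inside BasicZipFile.parse_cd is left open, as above.

```python
import sys
import zipfile

# Plain string labels for illustration; these are NOT pyaff4's lexicon objects.
FILE_IMAGE = "aff4:FileImage"
FOLDER_IMAGE = "aff4:FolderImage"


def classify_zip(path):
    """Map every member name in the zip at `path` to its proposed AFF4 type.

    Hypothetical helper: a member counts as a folder when ZipInfo.is_dir()
    is true, i.e. when its name ends in "/".
    """
    with zipfile.ZipFile(path) as zf:
        return {
            info.filename: FOLDER_IMAGE if info.is_dir() else FILE_IMAGE
            for info in zf.infolist()
        }


if __name__ == "__main__":
    # Usage: python classify_zip.py deep.zip [more.zip ...]
    for zip_path in sys.argv[1:]:
        for name, aff4_type in classify_zip(zip_path).items():
            print(aff4_type, name)
```

Run against deep.zip from step1.sh, this should report input_dir_1/ as a folder, since Info-ZIP's zip -r stores directory entries unless -D is given.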
Steps to reproduce
The code segments below work when run as individual shell scripts, confirmed on an Ubuntu 18.04 system.
- Create two zips: flat.zip, containing only files, and deep.zip, containing a directory.
#!/bin/bash
# step1.sh
rm -rf deep flat
mkdir -p flat
mkdir -p deep/input_dir_1
echo 'file 1' > flat/file1.txt
echo 'file 2' > flat/file2.txt
pushd flat
zip -r ../flat.zip .
popd
rm -r flat
echo 'file 3' > deep/file3.txt
echo 'file 4' > deep/input_dir_1/file4.txt
pushd deep
zip -r ../deep.zip .
popd
rm -r deep
- Ingest the zips into their respective aff4 archives.
#!/bin/bash
# step2.sh
# (First loading venv, fixing path to aff4.py ...)
python .../aff4.py \
--hash \
--ingest \
--paranoid \
--recursive \
flat.aff4 \
flat.zip
python .../aff4.py \
--hash \
--ingest \
--paranoid \
--recursive \
deep.aff4 \
deep.zip
- Extract everything from the flat aff4 archive. Currently works.
Pull Request 14 fixes an unrelated issue with the way extractAll is called, and also updates Pull Request 13 for convenience; I found some of @gonmator's fixes while fixing this call.
#!/bin/bash
# step3.sh
# (First loading venv, fixing path to aff4.py ...)
rm -rf extraction_flat
mkdir extraction_flat
# Note that the last argument here will not be necessary if PR 16 is incorporated.
python .../aff4.py \
--extract-all \
--folder extraction_flat \
flat.aff4 \
extraction_flat
- Extract everything from the deep aff4 archive. Currently fails.
PR 14 should be integrated in order to see step4.sh below fail in the illustrative way.
#!/bin/bash
# step4.sh
# (First loading venv, fixing path to aff4.py ...)
rm -rf extraction_deep
mkdir extraction_deep
# Note that the last argument here will not be necessary if PR 16 is incorporated.
python .../aff4.py \
--extract-all \
--folder extraction_deep \
deep.aff4 \
extraction_deep
Traceback of step4.sh:
Traceback (most recent call last):
File "../deps/pyaff4/aff4.py", line 421, in <module>
main(sys.argv)
File "../deps/pyaff4/aff4.py", line 414, in main
extractAll(dest, args.folder)
File "../deps/pyaff4/aff4.py", line 312, in extractAll
with open(destFile, "wb") as destStream:
IsADirectoryError: [Errno 21] Is a directory: 'extraction_deep/deep.zip/input_dir_1'
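The traceback shows open(destFile, "wb") being called on a path that is a directory. Independently of how the type gets fixed at ingest time, an extraction loop can guard against this. A minimal sketch, assuming each member arrives as a hypothetical (name, is_folder, data) triple rather than through pyaff4's actual resolver API:

```python
import os


def extract_members(members, dest_root):
    """Write members under dest_root, creating folders instead of opening them.

    `members` is a hypothetical iterable of (name, is_folder, data) triples
    standing in for the images aff4.py enumerates; `data` is ignored for folders.
    """
    for name, is_folder, data in members:
        # Build the destination path from "/"-separated member name components.
        dest = os.path.join(dest_root, *name.strip("/").split("/"))
        if is_folder:
            # open(dest, "wb") on an existing directory raises IsADirectoryError,
            # so folders are created (idempotently) instead.
            os.makedirs(dest, exist_ok=True)
            continue
        # Files: ensure the parent exists, whatever order members arrive in.
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(dest, "wb") as dest_stream:
            dest_stream.write(data)
```

Creating parent directories before each file write also makes the member order irrelevant, which matters if input_dir_1/file4.txt is visited before input_dir_1 itself.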
Resolution confirmation
When step4.sh above creates this file hierarchy, this issue is good to close.
$ find extraction_deep
extraction_deep
extraction_deep/deep.zip
extraction_deep/deep.zip/file3.txt
extraction_deep/deep.zip/input_dir_1
extraction_deep/deep.zip/input_dir_1/file4.txt
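A minimal sketch of an automated version of this check, using only Python's standard library; find_paths and hierarchy_diff are hypothetical helpers, and the expected set is copied from the listing above:

```python
import os

# Expected output of `find extraction_deep`, copied from the listing above.
EXPECTED = {
    "extraction_deep",
    "extraction_deep/deep.zip",
    "extraction_deep/deep.zip/file3.txt",
    "extraction_deep/deep.zip/input_dir_1",
    "extraction_deep/deep.zip/input_dir_1/file4.txt",
}


def find_paths(top):
    """Return `top` plus every directory and file below it, like `find top`."""
    paths = {top}
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            paths.add(os.path.join(dirpath, name))
    return paths


def hierarchy_diff(top, expected):
    """Return (missing, unexpected) path sets relative to `expected`."""
    found = find_paths(top) if os.path.isdir(top) else set()
    return expected - found, found - expected
```

Run from the directory containing extraction_deep, hierarchy_diff("extraction_deep", EXPECTED) returning two empty sets corresponds to the hierarchy above.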