karimknaebel/itar

itar

itar builds indexes over one or more tar file shards that locate any member in constant time, giving direct random access to members without extracting the archives. It ships a lightweight CLI (itar) and a Python API.
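The general idea behind such an index can be sketched with Python's standard `tarfile` module: scan the archive's headers once, record where each member's data starts and how many bytes it spans, and afterwards answer every lookup with a dictionary access plus one seek and one read. This is a minimal illustration of the technique, not itar's actual index format or code. It assumes an uncompressed tar (byte offsets are meaningless under a compression layer), and `build_index`, `read_member` and `make_demo_tar` are hypothetical names.

```python
import io
import os
import tarfile
import tempfile


def build_index(tar_path: str) -> dict[str, tuple[int, int]]:
    """Scan the tar headers once; map each regular file to (data offset, size)."""
    index: dict[str, tuple[int, int]] = {}
    # "r:" opens the archive without any decompression.
    with tarfile.open(tar_path, "r:") as tf:
        for info in tf:
            if info.isfile():
                index[info.name] = (info.offset_data, info.size)
    return index


def read_member(tar_path: str, index: dict[str, tuple[int, int]], name: str) -> bytes:
    """Dictionary lookup, then one seek and one read: no header scanning."""
    offset, size = index[name]
    with open(tar_path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def make_demo_tar(path: str, members: dict[str, bytes]) -> None:
    """Write an uncompressed tar containing the given in-memory files."""
    with tarfile.open(path, "w") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tar_path = os.path.join(tmp, "hello.tar")
        make_demo_tar(tar_path, {"hello.txt": b"Hello world!\n"})
        index = build_index(tar_path)
        print(index["hello.txt"])  # (data offset, size in bytes)
        print(read_member(tar_path, index, "hello.txt"))
```

Only the initial scan touches every header; after that, the cost of a lookup does not depend on where the member sits in the archive.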

Designed for large datasets and deep‑learning pipelines, it supports single or sharded tar archives with thread‑safe access for concurrent reads.
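Once every member has a fixed byte range, concurrent reads need no shared cursor. Positional reads such as `os.pread` take an explicit offset and leave the descriptor's file position untouched, so many threads can reuse one open descriptor without a lock. The sketch below illustrates that general technique only; whether itar reads this way is not stated here, `os.pread` is available on Unix only, and all function names are hypothetical.

```python
import io
import os
import tarfile
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_tar(path: str, members: dict[str, bytes]) -> None:
    """Write an uncompressed tar holding the given in-memory files (demo helper)."""
    with tarfile.open(path, "w") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))


def build_index(tar_path: str) -> dict[str, tuple[int, int]]:
    """One header scan: member name -> (data offset, size)."""
    with tarfile.open(tar_path, "r:") as tf:
        return {info.name: (info.offset_data, info.size) for info in tf if info.isfile()}


def pread_exact(fd: int, size: int, offset: int) -> bytes:
    """Positional read of exactly `size` bytes; loops because short reads are allowed."""
    chunks = []
    while size > 0:
        chunk = os.pread(fd, size, offset)  # does not move the descriptor's file position
        if not chunk:
            raise EOFError("archive ended before the member's data did")
        chunks.append(chunk)
        offset += len(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def read_concurrently(tar_path: str, index: dict[str, tuple[int, int]],
                      names: list[str], workers: int = 8) -> dict[str, bytes]:
    """Read many members in parallel through one shared descriptor, with no lock."""
    fd = os.open(tar_path, os.O_RDONLY)
    try:
        def read_one(name: str) -> bytes:
            offset, size = index[name]
            return pread_exact(fd, size, offset)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            return dict(zip(names, pool.map(read_one, names)))
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tar_path = os.path.join(tmp, "many.tar")
        payloads = {f"item-{i:03d}.bin": os.urandom(1000 + i) for i in range(100)}
        write_tar(tar_path, payloads)
        index = build_index(tar_path)
        print(read_concurrently(tar_path, index, list(payloads)) == payloads)
```

A seek-then-read design would instead need a lock around each seek/read pair (or one file handle per thread), because the seek and the read act on a position that every thread shares.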

Quickstart

pip install itar

Single tarball

echo "Hello world!" > hello.txt
tar cf hello.tar hello.txt       # regular tarball

itar index create hello.itar     # indexes hello.tar
itar index list hello.itar       # list indexed members

import itar

with itar.open("hello.itar") as archive:
    print(archive["hello.txt"].read())
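`archive["hello.txt"]` behaves like a readable file object: it has `.read()`, and the sharded example passes one straight to Pillow's `Image.open`, which expects `read`, `seek` and `tell`. One way to model such a handle is a seekable window over the member's byte range. The class below is an illustrative sketch (`MemberView` is a hypothetical name, not part of itar's API); for a single archive, the standard library's `TarFile.extractfile` returns a comparable object.

```python
import io
import os
import tarfile
import tempfile


class MemberView(io.RawIOBase):
    """Read-only, seekable view of bytes [start, start + size) of an open binary file.

    Each view keeps its own position but seeks the shared underlying file
    before every read, so views over one file object must not be used from
    several threads at once.
    """

    def __init__(self, f: io.BufferedIOBase, start: int, size: int) -> None:
        super().__init__()
        self._f, self._start, self._size, self._pos = f, start, size, 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, pos: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new = pos
        elif whence == io.SEEK_CUR:
            new = self._pos + pos
        elif whence == io.SEEK_END:
            new = self._size + pos
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new < 0:
            raise ValueError("negative seek position")
        self._pos = new
        return new

    def readinto(self, b) -> int:
        n = min(len(b), self._size - self._pos)  # never read past the member's end
        if n <= 0:
            return 0
        self._f.seek(self._start + self._pos)
        got = self._f.readinto(memoryview(b)[:n])
        self._pos += got
        return got


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tar_path = os.path.join(tmp, "hello.tar")
        data = b"Hello world!\n"
        with tarfile.open(tar_path, "w") as tf:
            info = tarfile.TarInfo("hello.txt")
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
        with tarfile.open(tar_path, "r:") as tf:
            member = tf.getmember("hello.txt")  # an index would store these two numbers
        with open(tar_path, "rb") as f, MemberView(f, member.offset_data, member.size) as view:
            print(view.read())   # the whole member
            view.seek(6)
            print(view.read(5))  # b'world'
```

Closing a view does not close the underlying file, so one open archive can serve many views in turn.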

Sharded tarballs

Give each shard a zero-padded suffix before building the index:

tar cf photos-0.tar wedding/    # shard 0
tar cf photos-1.tar vacation/   # shard 1

itar index create photos.itar   # discovers photos-0.tar, photos-1.tar, ...
itar index list -l photos.itar  # shard index, offsets, byte sizes

from PIL import Image
import itar

with itar.open("photos.itar") as photos:
    assert "wedding/cake.jpg" in photos
    with Image.open(photos["vacation/sunrise.jpg"]) as img:
        print(img.size)
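Shard discovery and the sharded index can be sketched the same way: collect the `photos-<n>.tar` files next to `photos.itar`, order them by their numeric suffix, and record for each member the shard it lives in alongside its offset and size (the same three fields `itar index list -l` reports). The exact naming rule itar enforces is an assumption here (shards numbered 0..N-1 with no gaps), as are the function names and the choice to reject duplicate member names.

```python
from __future__ import annotations

import io
import re
import tarfile
import tempfile
from pathlib import Path


def discover_shards(index_path: str | Path) -> list[Path]:
    """photos.itar -> [photos-0.tar, photos-1.tar, ...], ordered by shard number.

    Assumed rule: '<stem>-<digits>.tar' in the same directory, numbered
    0..N-1 without gaps. Sorting by the parsed integer keeps shard 10 after
    shard 9 even when the suffixes are not zero-padded.
    """
    index_path = Path(index_path)
    pattern = re.compile(rf"{re.escape(index_path.stem)}-([0-9]+)\.tar")
    shards: dict[int, Path] = {}
    for candidate in index_path.parent.iterdir():
        match = pattern.fullmatch(candidate.name)
        if match:
            number = int(match.group(1))
            if number in shards:  # e.g. photos-1.tar and photos-01.tar
                raise ValueError(f"ambiguous shard {number}: {shards[number].name}, {candidate.name}")
            shards[number] = candidate
    if not shards:
        raise FileNotFoundError(f"no {index_path.stem}-<n>.tar shards next to {index_path}")
    if sorted(shards) != list(range(len(shards))):
        raise ValueError(f"expected shards 0..{len(shards) - 1}, found {sorted(shards)}")
    return [shards[i] for i in range(len(shards))]


def build_sharded_index(shards: list[Path]) -> dict[str, tuple[int, int, int]]:
    """Map member name -> (shard number, data offset, size)."""
    index: dict[str, tuple[int, int, int]] = {}
    for shard_no, shard in enumerate(shards):
        with tarfile.open(shard, "r:") as tf:
            for info in tf:
                if not info.isfile():
                    continue
                if info.name in index:  # this sketch rejects duplicates, within or across shards
                    raise ValueError(f"duplicate member name {info.name!r}")
                index[info.name] = (shard_no, info.offset_data, info.size)
    return index


def read_member(shards: list[Path], index: dict[str, tuple[int, int, int]], name: str) -> bytes:
    shard_no, offset, size = index[name]
    with open(shards[shard_no], "rb") as f:
        f.seek(offset)
        return f.read(size)


def write_tar(path: Path, members: dict[str, bytes]) -> None:
    """Write an uncompressed tar holding the given in-memory files (demo helper)."""
    with tarfile.open(path, "w") as tf:
        for name, data in members.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_tar(root / "photos-0.tar", {"wedding/cake.jpg": b"fake jpeg 0"})
        write_tar(root / "photos-1.tar", {"vacation/sunrise.jpg": b"fake jpeg 1"})
        shards = discover_shards(root / "photos.itar")  # the .itar file need not exist for discovery
        index = build_sharded_index(shards)
        print([s.name for s in shards])
        print(index["vacation/sunrise.jpg"])  # (shard number, data offset, size)
        print(read_member(shards, index, "vacation/sunrise.jpg"))
```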

Docs

Full CLI, API, and format details live in the documentation site.

About

tar file index for constant-time member access