Note: This library was vibe coded with Claude. It works, it's tested, but review accordingly.
Python library for parsing UniProt XML files. Handles both single-entry downloads and multi-GB gzip-compressed database dumps with bounded memory usage.

Install with pip:

    pip install uniprotlib

Or with uv:

    uv add uniprotlib

Basic usage:

    from uniprotlib import parse_xml

    # single file
    for entry in parse_xml("Q9Y261.xml"):
        print(entry.primary_accession, entry.protein_name)

    # gzipped bulk download
    for entry in parse_xml("uniprot_sprot.xml.gz"):
        print(entry.gene.primary, entry.organism.scientific_name)

    # multiple files
    for entry in parse_xml("human.xml.gz", "mouse.xml.gz"):
        print(entry.primary_accession)

`parse_xml()` returns an iterator that yields `UniProtEntry` objects. Gzip detection is automatic based on the `.gz` extension. Memory stays bounded regardless of file size.

| Model | Fields |
|---|---|
| `UniProtEntry` | `primary_accession`, `accessions`, `entry_name`, `dataset`, `protein_name`, `gene`, `organism`, `sequence`, `keywords`, `db_references` |
| `Gene` | `primary`, `synonyms`, `ordered_locus_names`, `orf_names` |
| `Organism` | `scientific_name`, `common_name`, `tax_id`, `lineage` |
| `Sequence` | `value`, `length`, `mass`, `checksum` |
| `DbReference` | `type`, `id`, `molecule`, `properties` |

All model classes are dataclasses with full type annotations and py.typed support.
Requires Python >= 3.12 and uv.

To install the dependencies and run the tests:

    uv sync
    uv run pytest tests/ -v

License: MIT