refactor: hiv #42

tristan-f-r · 2025-07-30T19:44:40Z

Adds documentation back to the HIV dataset, and:

- Performs UniProt mapping offline, which removes a significant portion of the code.
- Drops KEGG gold standard generation, since it was not sufficient. Because datasets/README.md now includes a Prior work section, this work will not be lost.
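For readers unfamiliar with offline identifier mapping, here is a minimal sketch of the general idea. It is not the PR's implementation. It assumes a local copy of a file laid out like UniProt's idmapping.dat, with three tab-separated columns (UniProtKB accession, ID type, ID). The function names `build_mapping` and `map_names` are hypothetical and do not come from name_mapping.py.

```python
import csv
import io
from collections import defaultdict


def build_mapping(lines, id_type):
    """Build {identifier: {UniProtKB accessions}} for a single ID type.

    Hypothetical helper. Assumes the three-column, tab-separated layout of
    UniProt's idmapping.dat: UniProtKB-AC, ID_type, ID.
    """
    mapping = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        accession, row_type, identifier = row
        if row_type == id_type:
            mapping[identifier].add(accession)
    return dict(mapping)


def map_names(names, mapping):
    """Hypothetical helper: split names into mapped (name -> sorted accessions) and unmapped."""
    mapped = {name: sorted(mapping[name]) for name in names if name in mapping}
    unmapped = [name for name in names if name not in mapping]
    return mapped, unmapped


if __name__ == "__main__":
    # Illustrative rows only; a real run would read the downloaded mapping file.
    sample = io.StringIO(
        "P04637\tGene_Name\tTP53\n"
        "P04637\tUniProtKB-ID\tP53_HUMAN\n"
        "P38398\tGene_Name\tBRCA1\n"
    )
    gene_to_uniprot = build_mapping(sample, "Gene_Name")
    print(map_names(["TP53", "NOTAGENE"], gene_to_uniprot))
```

With a real download, `build_mapping` could be passed `gzip.open(path, "rt")` instead of the in-memory sample. The PR description does not say which UniProt export the pipeline actually uses.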

agitter

I left some initial comments, mostly pertaining to my differing expectations for what can go in the dataset READMEs. I am finding it hard to review the new script-based pipeline against the original notebooks, since I'm not sure whether it is a pure port or a complete rewrite.

datasets/hiv/README.md

datasets/hiv/scripts/kegg_orthology.py

datasets/hiv/scripts/name_mapping.py

datasets/hiv/raw/README.md

tristan-f-r · 2026-01-23T05:07:30Z

kegg_orthology.py is a port. Every other file has been substantially rewritten, especially name_mapping.py, though for the better: the entire pipeline is now just 90 lines of Python.

tristan-f-r · 2026-01-23T20:11:19Z

I'm going to split the miscellaneous cache changes into a separate PR to make this diff easier to read.

tristan-f-r assigned agitter Jul 30, 2025

agitter reviewed Aug 8, 2025


tristan-f-r mentioned this pull request Aug 11, 2025

dataset: DISEASES #39

Merged

tristan-f-r unassigned agitter Jan 22, 2026

tristan-f-r changed the title ~~docs: hiv~~ refactor: hiv Jan 23, 2026

tristan-f-r added the "dataset" label (Mutating datasets in any way.) Jan 23, 2026

tristan-f-r mentioned this pull request Jan 23, 2026

refactor: drop databases, cache improvements #56

Merged

tristan-f-r force-pushed the hiv-clean branch 3 times, most recently from bd99a25 to e2afb83, January 23, 2026 21:43

tristan-f-r added 2 commits January 23, 2026 21:44

- move files (298415a)
- refactor: hiv (0314c79)

tristan-f-r force-pushed the hiv-clean branch from e2afb83 to 0314c79, January 23, 2026 21:45


4 participants