21 changes: 21 additions & 0 deletions cache/directory.py
@@ -172,6 +172,27 @@ def download(self, output: str | PathLike):
online="https://depmap.org/portal/download/api/download?file_name=downloads-by-canonical-id%2Fpublic-25q2-c5ef.104%2FOmicsCNGeneWGS.csv&dl_name=OmicsCNGeneWGS.csv&bucket=depmap-external-downloads",
),
},
"KEGG": {
# For some reason, KEGG requires a Referer header: opening this URL otherwise fails.
"ko03250.xml": CacheItem(
name="KEGG 03250",
cached="https://drive.google.com/uc?id=16dtWKHCQMp2qrLfFDE7nVhbwBCr2H5a9",
online="https://www.kegg.jp/kegg-bin/download?entry=ko03250&format=kgml",
online_headers=[("Referer", "https://www.kegg.jp/pathway/ko03250")],
)
},
"HIV1": {
"prize_05.tsv": CacheItem(
name="HIV_05 prizes",
cached="https://drive.google.com/uc?id=1jVWNRPfYkbqimO44GdzXYB3-7NXhet1m",
online="https://raw.githubusercontent.com/gitter-lab/hiv1-aurkb/ac9278d447e4188eea3bf4b24c4c4e0c19b0c6d9/Results/base_analysis/prize_05.csv"
),
"prize_060.tsv": CacheItem(
name="HIV_060 prizes",
cached="https://drive.google.com/uc?id=1Aucgp7pcooGr9oT4m2bvYEuYW6186WxQ",
online="https://raw.githubusercontent.com/gitter-lab/hiv1-aurkb/ac9278d447e4188eea3bf4b24c4c4e0c19b0c6d9/Results/base_analysis/prize_060.csv"
)
},
"iRefIndex": {
# This can also be obtained from the SPRAS repo
# (https://github.com/Reed-CompBio/spras/blob/b5d7a2499afa8eab14c60ce0f99fa7e8a23a2c64/input/phosphosite-irefindex13.0-uniprot.txt).
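
The `online_headers` value on the KEGG entry above is a list of `(name, value)` pairs. How `CacheItem` consumes that list is not shown in this diff, so the following is only a minimal sketch, using the standard library, of how such pairs could be attached to a request. `build_request` and `fetch` are hypothetical helper names, not functions from this repository.

```python
from __future__ import annotations

from urllib.request import Request, urlopen


def build_request(url: str, headers: list[tuple[str, str]] | None = None) -> Request:
    """Build a GET request carrying optional extra headers (hypothetical helper)."""
    request = Request(url)
    for name, value in headers or []:
        request.add_header(name, value)
    return request


def fetch(url: str, headers: list[tuple[str, str]] | None = None) -> bytes:
    """Download the body of `url`. This performs a network call (hypothetical helper)."""
    with urlopen(build_request(url, headers)) as response:
        return response.read()


# Same URL and Referer as the KEGG entry in the diff; nothing is downloaded here.
kegg_request = build_request(
    "https://www.kegg.jp/kegg-bin/download?entry=ko03250&format=kgml",
    [("Referer", "https://www.kegg.jp/pathway/ko03250")],
)
```

Keeping request construction separate from the download makes the header handling checkable without network access.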
4 changes: 2 additions & 2 deletions configs/dmmm.yaml
@@ -43,12 +43,12 @@ datasets:
# TODO: use old parameters for datasets
# HIV: https://github.com/Reed-CompBio/spras-benchmarking/blob/0293ae4dc0be59502fac06b42cfd9796a4b4413e/hiv-benchmarking/spras-config/config.yaml
- label: dmmmhiv_060
-    node_files: ["processed_prize_060.txt"]
+    node_files: ["processed_prizes_060.txt"]
edge_files: ["../raw/phosphosite-irefindex13.0-uniprot.txt"]
other_files: []
data_dir: "datasets/hiv/processed"
- label: dmmmhiv_05
-    node_files: ["processed_prize_05.txt"]
+    node_files: ["processed_prizes_05.txt"]
edge_files: ["../raw/phosphosite-irefindex13.0-uniprot.txt"]
other_files: []
data_dir: "datasets/hiv/processed"
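
Because this hunk only renames the node files, a mismatch between the config and the files actually produced would surface late. A minimal sketch of a pre-flight check follows. It assumes `data_dir` is resolved relative to some root directory, which this diff does not confirm. `missing_node_files` is a hypothetical helper, and the dataset entries are copied into a plain list rather than parsed from the YAML.

```python
from pathlib import Path


def missing_node_files(datasets: list[dict], root: Path) -> list[Path]:
    """Return every node file named in `datasets` that does not exist under `root`."""
    missing = []
    for dataset in datasets:
        data_dir = root / dataset["data_dir"]  # assumption: data_dir is relative to root
        for name in dataset["node_files"]:
            path = data_dir / name
            if not path.is_file():
                missing.append(path)
    return missing


# Mirrors the two updated entries from configs/dmmm.yaml.
hiv_datasets = [
    {
        "label": "dmmmhiv_060",
        "node_files": ["processed_prizes_060.txt"],
        "data_dir": "datasets/hiv/processed",
    },
    {
        "label": "dmmmhiv_05",
        "node_files": ["processed_prizes_05.txt"],
        "data_dir": "datasets/hiv/processed",
    },
]
```

Calling `missing_node_files(hiv_datasets, Path("."))` from the chosen root lists any node file that is still missing or still carries the old `processed_prize_*` name.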
5 changes: 4 additions & 1 deletion datasets/hiv/.gitignore
@@ -1 +1,4 @@
-processed
+/processed
+/pickles
+/raw
+/intermediate
30 changes: 30 additions & 0 deletions datasets/hiv/README.md
@@ -0,0 +1,30 @@
# HIV dataset

This folder contains data pertaining to the following paper: **[HIV-1 virological synapse formation enhances infection spread by dysregulating Aurora Kinase B](https://doi.org/10.1371/journal.ppat.1011492)** - Bruce JW, Park E et al. (2023)

The study examines how human immune cells respond to viral infection, as well as the changes that take place inside already-infected cells; the latter is the focus here.
The data come from protein abundance and phosphorylation experiments and serve as the input to pathway reconstruction.

**Overarching goal:** Recreate the published biological case study on the HIV data using SPRAS. This will help identify nodes (i.e., proteins) that are relevant to the disease.

## Raw files

Follow the rules in the `Snakefile` to find the URLs these files are fetched from.

- `prize_05.tsv`: Prize file from HIV-expressing Jurkat cells grown for 5 minutes, from the paper above.
- `prize_060.tsv`: Prize file from the same cells grown for 60 minutes.
- `ko03250.xml`: KEGG Orthology Pathway ID 03250 (currently unused; it was previously used in an attempt at gold standard generation).
- `HUMAN_9606_idmapping.tsv`: File provided by UniProt, used by `name_mapping.py` to map UniProt identifiers.
- `phosphosite-irefindex13.0-uniprot.txt`: The background interactome from the now-defunct iRefIndex.

## File organization

See `Snakefile` for the way that all of the IO files are connected.

1. `prepare.py` - Cleans up the prize files in `raw`: it removes duplicates and prepares the list of UniProt nodes to be mapped by `name_mapping.py`.
1. `name_mapping.py` - Converts from UniProtKB AC/ID to UniProtKB to meet in the middle with `kegg_ortholog.py` and to match the proteins in the iRefIndex interactome. We chose UniProtKB for its generality. We also remove identifiers with an `-N` suffix and remove duplicates, so that isoforms aren't treated as distinct proteins during pathway reconstruction.
1. `spras_formatting.py` - Formats the input files into the universal SPRAS format.
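
The isoform handling described for `name_mapping.py` can be sketched as follows. This is an illustration under one reading of the text (identifiers carrying an `-N` suffix are dropped outright, then duplicates are removed), not the script's actual code. `drop_isoforms_and_duplicates` is a hypothetical name.

```python
import re

# Matches a trailing isoform suffix such as "-2" on an accession like "P04637-2".
ISOFORM_SUFFIX = re.compile(r"-\d+$")


def drop_isoforms_and_duplicates(identifiers):
    """Drop identifiers with an "-N" suffix and remove duplicates, keeping first-seen order."""
    seen = set()
    kept = []
    for identifier in identifiers:
        identifier = identifier.strip()
        if not identifier or ISOFORM_SUFFIX.search(identifier):
            continue  # skip blanks and isoform-specific identifiers
        if identifier not in seen:
            seen.add(identifier)
            kept.append(identifier)
    return kept


print(drop_isoforms_and_duplicates(["P04637", "P04637-2", "Q9Y6K9", "P04637"]))
# → ['P04637', 'Q9Y6K9']
```

If the intent is instead to collapse isoforms onto their canonical accession, replacing the `continue` branch with `ISOFORM_SUFFIX.sub("", identifier)` before the duplicate check would give that behavior.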

> [!NOTE]
> This dataset does not have a gold standard. There was a prior attempt ([see original](../README.md)) to use KEGG as the gold standard,
> but there was very little overlap between the nodes generated from this paper and the current KEGG HIV pathway.
25 changes: 0 additions & 25 deletions datasets/hiv/Scripts/Data_Prep.py

This file was deleted.

61 changes: 0 additions & 61 deletions datasets/hiv/Scripts/Kegg_Orthology.py

This file was deleted.

193 changes: 0 additions & 193 deletions datasets/hiv/Scripts/Name_Mapping.py

This file was deleted.

28 changes: 0 additions & 28 deletions datasets/hiv/Scripts/SPRAS_Formatting.py

This file was deleted.
