Update training workflow to handle CRISPR data with multiple cell types#85
kaybrand left a comment
I like how you pulled the model directories out of the dataset directories. It is also nice that you supply a single merged CRISPRi training dataset to the train_model rule, using the CellType column to combine distinct datasets.
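Here is a minimal sketch of the merge pattern described above. It assumes each per-cell-type CRISPRi dataset is a tab-separated table. The function name `merge_crispr_datasets`, the input layout, and the column names in the demo (`chr`, `start`, `end`, `Regulated`) are hypothetical and not taken from this PR.

```python
import csv
import io
from typing import Dict, List, TextIO


def merge_crispr_datasets(datasets: Dict[str, TextIO], out: TextIO) -> int:
    """Concatenate per-cell-type TSV tables into one TSV with a CellType column.

    `datasets` maps a cell type name to an open TSV handle (hypothetical layout).
    Returns the number of data rows written.
    """
    rows: List[dict] = []
    columns: List[str] = []
    for cell_type, handle in datasets.items():
        reader = csv.DictReader(handle, delimiter="\t")
        # Keep the union of input columns in first-seen order.
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        for row in reader:
            # Label every row with the cell type it came from
            # (overwrites any existing CellType value).
            row["CellType"] = cell_type
            rows.append(row)
    if "CellType" not in columns:
        columns.append("CellType")
    # restval="" fills columns that a given input table did not have.
    writer = csv.DictWriter(out, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return len(rows)


# Demo with two tiny in-memory tables (hypothetical columns and cell types).
k562 = io.StringIO("chr\tstart\tend\tRegulated\nchr1\t100\t200\tTRUE\n")
gm12878 = io.StringIO("chr\tstart\tend\tRegulated\nchr2\t300\t400\tFALSE\n")
merged = io.StringIO()
n_rows = merge_crispr_datasets({"K562": k562, "GM12878": gm12878}, merged)
print(merged.getvalue())
```

A single table keyed by a `CellType` column lets one training rule consume every dataset, while downstream steps can still split or group by cell type.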
These are my suggestions from my first read-through:
In line 58 of the README, please correct 'saples' to 'samples'.
You use 'ct' and 'cd' in several variable names. Renaming them to 'cell_type' and 'crispr_dataset' (or similarly descriptive names) would make your code more readable.
On line 19 of Snakefile_training, you convert config["results_dir"] into an absolute path. If it is already an absolute path, however, the results dir could end up looking something like /oak/stanford/groups/engreitz/Users/kaybrand/ENCODE_rE2G/oak/stanford/groups/engreitz/Users/kaybrand/ENCODE_rE2G/results. I suggest checking whether the path already starts from the root, checking whether the directory exists, creating it if it does not, and then saving the resulting path to config["results_dir"].
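A minimal sketch of the suggested check follows. The helper name `resolve_results_dir` and the choice of base directory are assumptions for illustration; the actual code on line 19 of Snakefile_training is not shown here.

```python
import os


def resolve_results_dir(results_dir: str, base_dir: str) -> str:
    """Return an absolute results directory, creating it if needed.

    Hypothetical helper: relative paths are resolved against base_dir,
    absolute paths are kept as-is so the prefix is never duplicated.
    """
    if not os.path.isabs(results_dir):
        results_dir = os.path.join(base_dir, results_dir)
    # Collapse things like "a/./b" or a trailing slash.
    results_dir = os.path.normpath(results_dir)
    # exist_ok=True covers both "check if it exists" and "create it if not".
    os.makedirs(results_dir, exist_ok=True)
    return results_dir
```

In the Snakefile this could be called as `config["results_dir"] = resolve_results_dir(config["results_dir"], os.getcwd())`, assuming relative paths should be resolved against the working directory. Alternatively, `os.path.abspath` already leaves absolute paths unchanged (apart from normalization) and resolves relative ones against the current working directory, so it avoids the duplicated prefix without a manual `isabs` check.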
fix indexing error in utils
…set by two different environment variables
…ounting capability
…ut requires 70-80% more RAM
improve efficiency of calculating num TSS btw E&G + black formatting adjustments
No description provided.