Tiresias

Computational framework for disease gene prediction through supervised learning on multiplex biological networks.

Version

Tiresias is still under development. The current release is a beta version.

Getting started

This software has been tested on Ubuntu 18.04. It is meant to be run on Linux systems.

Prerequisites

Conda
Docker

Installing

Clone the repository and move into its directory.

git clone https://github.com/RausellLab/Tiresias.git && cd Tiresias

Set up a Conda environment and activate it.

conda env create -f environment.yml && conda activate tiresias

Pull the node2vec Docker image.

make node2vec-image

If you get an error at this point, first check that Docker is configured properly. See "Manage Docker as a non-root user" in the Docker documentation.
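A quick way to narrow down such an error is to check whether the Docker CLI is installed and whether the daemon can be reached by the current user. The sketch below is not part of Tiresias; `check_docker` is a hypothetical helper name, and it relies only on the standard `docker info` command.

```shell
# Hypothetical helper (not shipped with Tiresias): report whether Docker
# is installed and reachable by the current, non-root user.
check_docker() {
    # Is a docker command available on the PATH?
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker: command not found" >&2
        return 1
    fi
    # "docker info" fails when the daemon is stopped or when the user
    # lacks permission to talk to it.
    if ! docker info >/dev/null 2>&1; then
        echo "docker is installed, but the daemon is not reachable by user $(id -un)" >&2
        return 2
    fi
    echo "docker is usable"
}
```

Running `check_docker` before `make node2vec-image` distinguishes a missing installation (exit status 1) from a permission or daemon problem (exit status 2).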

You can now check that the framework works properly by launching the data pipeline with dummy input data:

make pipeline

You can then browse the results with the MLFlow UI. Launch it with:

mlflow ui

Then, go to http://localhost:5000/ in your web browser.

Once you're done, clean up the intermediary and output files created by the pipeline:

make clean-all

Usage

Configuration

Datasets

Input dataset paths must be entered in config.yml.

Parameters

Set up the parameters used for featurization and for running the models by modifying the JSON files located in parameters/.

parameters/features.json: contains the parameters used to run random walks and to create node embeddings.
parameters/models_validation.json: contains model parameters used during the validation stage.
parameters/models_test.json: contains model parameters used during the test stage. Note that these parameters are also used to output the final ranking of the nodes during the predict stage.
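Since these files are edited by hand, it can help to confirm that they are still valid JSON before launching a long run. The following is a minimal sketch, assuming `python3` is on the PATH (for example from the activated Conda environment); `check_json_dir` is a hypothetical helper name, not part of Tiresias.

```shell
# Hypothetical helper: validate every *.json file in a directory
# (default: parameters/) with Python's built-in json.tool module.
check_json_dir() {
    dir="${1:-parameters}"
    failed=0
    for f in "$dir"/*.json; do
        # Skip the unexpanded pattern when the directory holds no JSON file.
        [ -e "$f" ] || continue
        if python3 -m json.tool "$f" >/dev/null 2>&1; then
            echo "ok: $f"
        else
            echo "invalid JSON: $f" >&2
            failed=1
        fi
    done
    # Non-zero exit status if at least one file failed to parse.
    return "$failed"
}
```

Calling `check_json_dir` from the repository root checks parameters/; it only verifies syntax, not whether the parameter names and values are ones the pipeline accepts.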

System and models

System resources are configured, and individual models are enabled or disabled, in config.yml.

Run pipeline

Before running any pipeline step, the Conda environment must be activated with:

conda activate tiresias

The pipeline can be run from start to finish with a single command:

make pipeline

It is also possible to run the pipeline steps individually.

Data preprocessing

make data

Random walks

make random-walks

Node embeddings

make embeddings

Validation

make validation

Test

make test

Predict

make predict
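When running the steps individually, a small wrapper can chain them and stop at the first failure. The sketch below assumes the steps should run in the order they are listed above; whether `make pipeline` does exactly the same is not confirmed here. `run_steps` is a hypothetical helper name, and the `MAKE` variable is an assumption introduced so that the command can be swapped, for instance with `make -n` for a dry run.

```shell
# Hypothetical helper: run the pipeline steps one by one, in the order
# listed in this README, and stop at the first step that fails.
# Override MAKE to change the command, e.g. MAKE="make -n" for a dry run.
run_steps() {
    for step in data random-walks embeddings validation test predict; do
        echo "==> make $step"
        ${MAKE:-make} "$step" || {
            echo "step '$step' failed" >&2
            return 1
        }
    done
}
```

As with any pipeline step, the Conda environment must be activated before calling `run_steps`.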

Visualize results with MLFlow

Model run results are recorded using MLFlow. Results are stored in the mlruns/ directory.

You can browse the results with the MLFlow UI. To do so, launch it:

mlflow ui

Then, go to http://localhost:5000/ in your web browser.
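If port 5000 is already in use on your machine, the UI can be served on another port with the `--port` option of `mlflow ui`. A minimal sketch, where `start_mlflow_ui` is a hypothetical wrapper name:

```shell
# Hypothetical wrapper: start the MLFlow UI on a chosen port
# (defaults to 5000, the port used elsewhere in this README).
start_mlflow_ui() {
    port="${1:-5000}"
    mlflow ui --port "$port"
}
```

For example, after `start_mlflow_ui 5001`, the results are served at http://localhost:5001/ instead.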

Reports

The validation, test and predict steps of the pipeline each generate reports summarizing the results. The report files are available in reports/.

Intermediary files

Intermediary files generated when running pipeline steps and their associated metadata are stored in artifacts/.

Note that the pipeline scripts automatically pick up the files present in artifacts/ as inputs for downstream pipeline steps. This allows for more flexibility when running pipeline steps separately. However, you may want to repeat some experiments from scratch or start with completely different data or parameters. In this case, move or delete the existing artifacts/ directory, then re-run the pipeline from the start with the new inputs.
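Moving the directory aside rather than deleting it keeps earlier intermediary files available for comparison. The sketch below renames artifacts/ with a timestamp suffix; `archive_artifacts` and the naming scheme are assumptions for illustration, not something Tiresias provides.

```shell
# Hypothetical helper: move an existing artifacts/ directory aside under a
# timestamped name so that the next run starts from a clean state.
archive_artifacts() {
    src="${1:-artifacts}"
    if [ ! -d "$src" ]; then
        echo "nothing to archive: $src" >&2
        return 0
    fi
    dest="${src}.$(date +%Y%m%d-%H%M%S)"
    # Refuse to overwrite or merge into an existing backup.
    if [ -e "$dest" ]; then
        echo "backup already exists: $dest" >&2
        return 1
    fi
    # Print the new location on success.
    mv "$src" "$dest" && echo "$dest"
}
```

After `archive_artifacts`, running `make pipeline` regenerates artifacts/ from the inputs currently declared in config.yml.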

Built with

References

Mordelet, F. and Vert, J.-P. (2011). ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples. BMC Bioinformatics, 12(1):389.

Mordelet, F. and Vert, J.-P. (2014). A bagging SVM to learn from positive and unlabeled examples. Pattern Recognition Letters, 37:201–209.

Lovász, L., et al. (1993). Random walks on graphs: A survey. Combinatorics, Paul Erdős is Eighty, 2(1):1–46.

Zhou, D., Bousquet, O., Lal, T. N., Weston, J., and Schölkopf, B. (2004). Learning with local and global consistency. In Advances in Neural Information Processing Systems 16, pages 321–328. MIT Press.

Li, Y. and Li, J. (2012). Disease gene identification by random walk on multigraphs merging heterogeneous genomic and phenotype data. BMC Genomics, 13(7):S27.

Valdeolivas, A., Tichit, L., Navarro, C., Perrin, S., Odelin, G., Levy, N., ... and Baudot, A. (2018). Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics, 35(3):497–505.

Grover, A. and Leskovec, J. (2016). node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16, pages 855–864, San Francisco, California, USA. ACM Press.

Kipf, T. N. and Welling, M. (2016). Semi-Supervised Classification with Graph Convolutional Networks. arXiv:1609.02907 [cs, stat].

Schlichtkrull, M., Kipf, T. N., Bloem, P., van den Berg, R., Titov, I., and Welling, M. (2017). Modeling Relational Data with Graph Convolutional Networks. arXiv:1703.06103 [cs, stat].

Licence

Tiresias is released under the Apache License 2.0.

Contact

Please address comments and questions about Tiresias to thibaud.martinez@gmail.com, stefani.dritsa@institutimagine.org, chloe-agathe.azencott@mines-paristech.fr and antonio.rausell@inserm.fr.
