ButcherPy is the Python (re)implementation of the R package ButchR. This package provides functions to perform non-negative matrix factorization (NMF) using TensorFlow.
You can use ButcherPy to obtain molecular signatures from various types of data, such as gene expression or epigenomics datasets. The software also supports interpretation of the results through extensive visualization options.
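To make the idea of NMF concrete, here is a minimal NumPy sketch that factorizes a non-negative matrix V into two non-negative matrices W (features x signatures) and H (signatures x samples) using the classic multiplicative update rules. It is not ButcherPy's implementation and does not use its API; the function name `nmf_multiplicative` and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def nmf_multiplicative(V, rank, iterations=200, seed=0, eps=1e-10):
    """Illustrative NMF via multiplicative updates (hypothetical helper, not ButcherPy code)."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    # Random non-negative starting points for both factors.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(iterations):
        # Each update rescales the factor element-wise, so entries stay non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic non-negative matrix with an exact rank-3 structure.
rng = np.random.default_rng(42)
V = rng.random((50, 3)) @ rng.random((3, 20))

W, H = nmf_multiplicative(V, rank=3)
# Relative reconstruction error in the Frobenius norm.
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.4f}")
```

In this sketch the columns of W play the role of signatures and the rows of H give each sample's exposure to them. Results depend on the random initialization, which is one reason to run several initializations per rank.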
If you have access to Bioquant's workstation, follow these steps:
- Clone this repository:
git clone https://github.com/hdsu-bioquant/ButcherPy.git
- Start the butcherpy Docker container. To do that, replace device=1 with the ID of a device that has available memory, and substitute username with your Bioquant username in the following command. For information on checking GPU availability, refer to the DGX Workstation Readme.
docker run --gpus device=1 -p 8888:8888 --rm -ti -v /raid/username/projects/:$HOME pytorch:butcherpy
- In the running Docker container, navigate to the directory where you cloned this repository:
cd your_directory/ButcherPy
At this point, you are ready to use ButcherPy as the Docker container comes pre-installed with all required packages.
Note: The Docker container only includes ButcherPy's dependencies. Additional packages like scanpy (commonly needed for AnnData) are not pre-installed. If you need extra packages, extend the butcherpy Docker image with the required packages. Be sure to save the customized image with a unique name to keep the official butcherpy image unchanged. More details on this process are available in the DGX Workstation Readme.
If you do not have access to the Bioquant GPU workstation, do the following:
- Clone this repository:
git clone https://github.com/hdsu-bioquant/ButcherPy.git
- Use a Python environment of your choice (e.g., Docker or virtualenv) and install all necessary packages specified in setup.py. Pay attention to version requirements.
- Navigate to the repository directory:
cd your_directory/ButcherPy
You are now ready to use ButcherPy.
To perform NMF and analyze results, import the necessary modules:
import src.butcherPy.multiplerun_NMF_class as nmf_class
import src.butcherPy.nmf_run as nmf
After preparing your data, run NMF and interpret the results using the NMF class:
nmf_obj = nmf.multiple_rank_NMF(data,
                                ranks=[3, 4, 5],
                                n_initializations=10,
                                iterations=100,
                                seed=123)
nmf_obj.compute_OptKStats_NMF()
nmf_obj.compute_OptK()
nmf_obj.WcomputeFeatureStats()
save_path = "SignatureHeatmap"
nmf_obj.signature_heatmap(save_path)
This is just a basic example. For a real-world dataset, consider increasing the number of ranks, initializations, and iterations. Refer to the nmf_vignette for a comprehensive guide on data preparation, usage of analytical functions, and visualizations.
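NMF is only defined for non-negative input, so it can help to sanity-check the matrix before passing it to multiple_rank_NMF. The sketch below is one possible pre-check, under assumptions not confirmed by this README: that the input can be a NumPy array and that rows are features and columns are samples. The helper `prepare_nmf_input` is hypothetical and not part of ButcherPy; the nmf_vignette remains the reference for supported input types.

```python
import numpy as np


def prepare_nmf_input(matrix):
    """Validate a features x samples matrix for NMF (hypothetical helper, not ButcherPy API)."""
    X = np.asarray(matrix, dtype=float)
    if X.ndim != 2:
        raise ValueError("expected a 2-D matrix (assumed layout: features x samples)")
    if not np.isfinite(X).all():
        raise ValueError("matrix contains NaN or infinite values")
    if (X < 0).any():
        raise ValueError("NMF requires non-negative input")
    # Rows that are zero in every sample carry no signal; drop them and report which were kept.
    keep = X.sum(axis=1) > 0
    return X[keep], keep


# Toy count matrix: the second feature is zero in every sample.
raw = np.array([[5, 0, 2],
                [0, 0, 0],
                [1, 3, 4]])
data, kept_rows = prepare_nmf_input(raw)
print(data.shape, kept_rows)
```

Returning the boolean mask alongside the filtered matrix makes it possible to map signature features back to the original row labels afterwards.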
If you use ButcherPy, please cite the original publication of ButchR: Andres Quintero, Daniel Hübschmann, Nils Kurzawa, Sebastian Steinhauser, Philipp Rentzsch, Stephen Krämer, Carolin Andresen, Jeongbin Park, Roland Eils, Matthias Schlesner, Carl Herrmann, ShinyButchR: interactive NMF-based decomposition workflow of genome-scale datasets, Biology Methods and Protocols, bpaa022.