Explainable Convolutional Neural Network Model Provides an Alternative Genome-Wide Association Perspective on Mutations in SARS-CoV-2
Code repository for the paper: https://arxiv.org/abs/2410.22452
We compared an explainable CNN and traditional GWAS to identify SARS-CoV-2 mutations linked to WHO-labeled Variants of Concern (VOCs). The CNN, combined with SHAP, outperformed GWAS in identifying key nucleotide substitutions, particularly in spike gene regions. Our findings highlight the potential of explainable neural networks as an alternative to traditional genomic analysis methods.
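For orientation, here is a minimal, self-contained sketch of the overall idea (not the repository's actual pipeline): one-hot encode nucleotide sequences, train a small two-layer CNN on variant labels, and use SHAP to attribute predictions to individual genome positions. The sequence length, layer widths, labels, and the choice of `shap.GradientExplainer` are assumptions made only for this illustration.

```python
# Hypothetical toy example, not the repository code: sequence length, labels,
# and model architecture are placeholders chosen for brevity.
import numpy as np
import shap
import tensorflow as tf

BASES = "ACGT"
SEQ_LEN = 60      # placeholder; real SARS-CoV-2 genomes are ~30 kb
N_CLASSES = 2     # placeholder for VOC vs. non-VOC labels

def one_hot(seq):
    """Encode a nucleotide string as a (SEQ_LEN, 4) one-hot matrix."""
    mat = np.zeros((SEQ_LEN, len(BASES)), dtype=np.float32)
    for i, nt in enumerate(seq[:SEQ_LEN]):
        if nt in BASES:
            mat[i, BASES.index(nt)] = 1.0
    return mat

# Toy data standing in for aligned genome fragments and their variant labels.
rng = np.random.default_rng(0)
X = np.stack([one_hot("".join(rng.choice(list(BASES), SEQ_LEN)))
              for _ in range(200)])
y = rng.integers(0, N_CLASSES, size=200)

# Small two-layer CNN classifier (layer widths are illustrative only).
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, 5, activation="relu", input_shape=(SEQ_LEN, 4)),
    tf.keras.layers.Conv1D(8, 5, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=2, verbose=0)

# Attribute predictions to individual nucleotide positions with SHAP.
explainer = shap.GradientExplainer(model, X[:50])
sv = explainer.shap_values(X[:5])
# Depending on the shap version, `sv` is a list (one array per class) or a
# single array with a trailing class dimension.
first_class = sv[0] if isinstance(sv, list) else sv[..., 0]
# Per-position importance for the first sequence (sum over the 4 base channels).
print(np.abs(first_class[0]).sum(axis=-1))
```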
- Create a Virtual Environment
python3 -m venv myenv
- Activate the Virtual Environment
source myenv/bin/activate
- Install Packages from requirements.txt
pip3 install -r requirements.txt
- If you want to run on a cluster, use slurm.sh:
- Update the SLURM directives (e.g., #SBATCH options) to match the specifications of your cluster.
- Activate your virtual environment:
source myenv/bin/activate
- Uncomment the scripts you want to execute. For example, to run the Venn SHAP calculation, leave this line uncommented:
python3 ./gene_shap/venn_shap.py -num 1500 -agg
- Submit the job:
sbatch slurm.sh
- Note on output files: the script will create an output file in the ./outs/ directory.
- If you want to run without a cluster:
- Activate the environment with:
source myenv/bin/activate
- Run the desired Python script from the root of the repository, for example:
python3 ./gene_shap/venn_shap.py -num 1500 -agg
- Train the model: ./training/
- Find the initial sequences in each VOC: ./first_sequences/
- Analyze SHAP values for the two-layer CNN model: ./gene_shap/
- Compare SHAP values with GWAS: ./gwas/
- Train decision tree models and produce their SHAP values: ./decision_tree/ (see the sketch at the end of this README)
- Optional: Find MSA within each variant: ./msa_gene/
Note: All results will be saved in ./results/
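The decision-tree analysis in ./decision_tree/ can be thought of along similar lines. Below is a hedged sketch, again with placeholder data and dimensions, showing how SHAP values for a tree model could be produced with `shap.TreeExplainer` and aggregated back to per-position importances; it illustrates the technique only and is not the repository's script.

```python
# Hypothetical toy example, not ./decision_tree/ itself: feature layout,
# labels, and tree depth are placeholder assumptions.
import numpy as np
import shap
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n_samples, seq_len, n_bases = 300, 60, 4               # toy dimensions
X = rng.integers(0, 2, size=(n_samples, seq_len * n_bases)).astype(float)
y = rng.integers(0, 2, size=n_samples)                 # placeholder VOC labels

tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

# TreeExplainer computes exact SHAP values for tree-based models.
explainer = shap.TreeExplainer(tree)
sv = explainer.shap_values(X)
# Depending on the shap version, `sv` is a list (one array per class) or a
# single array with a trailing class dimension.
per_class = sv[1] if isinstance(sv, list) else sv[..., 1]
# Collapse the 4 one-hot columns per position into a single importance score.
per_position = np.abs(per_class).reshape(n_samples, seq_len, n_bases).sum(axis=-1)
print(per_position.mean(axis=0)[:10])   # mean importance of the first 10 positions
```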