louiske65/CRE-Prediction

Predicting CRE-Gene Interactions from DNA Sequence by Fine-Tuning Enformer with Single-Cell Multiome Data

This repository contains the data preparation and training scripts. The feature_linkages folder, hg38.pk, and gencode.v32.annotation.gtf are omitted because of GitHub space constraints.

These files, along with the processed data, are available on Google Drive: https://drive.google.com/drive/folders/1u6fTEUJmviggkTk2OXMYRfot0fvLfZj8?usp=drive_link

Deep learning models such as Enformer perform well on a wide range of genomic prediction tasks, but they struggle to model variation in gene expression across individuals. This suggests a gap in how well the model captures interactions between cis-regulatory elements (CREs) and genes. We hypothesized that training a model to explicitly predict CRE-gene linkages would improve its regulatory understanding. In this study, we fine-tuned Enformer on the 10x Genomics Human PBMC Single-Cell Multiome dataset, using paired chromatin accessibility and gene expression data to generate high-confidence peak-gene linkages. We used a supervised learning approach with a balanced mean squared error loss and added explicit distance tracks to the model input. We compared a linear probe baseline against a model in which the last transformer layer was fine-tuned. The baseline failed to localize regulatory elements ($r=0.1296$), whereas the fine-tuned model reached a substantially higher Pearson correlation on peak regions ($r=0.6168$). These results indicate that although Enformer encodes rich information for many genomic tasks, targeted fine-tuning is needed to adapt it to modeling gene regulation.
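The summary does not spell out the balanced loss or the peak-region metric, so the following is only a minimal sketch of one plausible reading, not the code in this repository. It assumes "balanced" means averaging the squared error separately over peak and non-peak bins and then averaging the two group means, and that the reported $r$ is a Pearson correlation restricted to peak bins. The names `balanced_mse`, `pearson_on_peaks`, and `is_peak` are hypothetical.

```python
import math


def balanced_mse(pred, target, is_peak):
    """Average of the per-group MSEs for peak and non-peak bins.

    ASSUMPTION: this group-wise definition of "balanced" is illustrative;
    the training scripts may weight the loss differently.
    """
    groups = {True: [], False: []}
    for p, t, m in zip(pred, target, is_peak):
        groups[bool(m)].append((p - t) ** 2)
    # Skip an empty group so a window without peaks still has a defined loss.
    per_group = [sum(v) / len(v) for v in groups.values() if v]
    return sum(per_group) / len(per_group)


def pearson_r(x, y):
    """Plain Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def pearson_on_peaks(pred, target, is_peak):
    """Pearson correlation computed only over bins flagged as peaks."""
    xs = [p for p, m in zip(pred, is_peak) if m]
    ys = [t for t, m in zip(target, is_peak) if m]
    return pearson_r(xs, ys)


# Toy window: four background bins and two peak bins (made-up values).
pred = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0]
target = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0]
is_peak = [False, False, False, False, True, True]

plain = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
print(plain, balanced_mse(pred, target, is_peak))
print(pearson_on_peaks(pred, target, is_peak))
```

In this toy window the many well-predicted background bins pull the plain MSE below the balanced value, which is the motivation for group-wise averaging when peaks are sparse: errors on the few peak bins are not diluted by the background.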
