Dear Author,
I hope this message finds you well. I am reaching out because I am interested in using TACIT to train a single-species model inspired by the approach in your paper, "Relating enhancer genetic variation across mammals to complex phenotypes using machine learning," particularly the MouseMotorCortexModel.
My Data and Goals
Multi-species MAF file: I have a whole-genome alignment (MAF format) for 12 rodent species.
ATAC-seq data: For only one of these species, I have ATAC-seq peaks from three tissues.
Objective: Train a tissue-specific model (similar to MouseMotorCortexModel) to predict regulatory elements and link them to phenotypes.
Questions
After reviewing the TACIT Guidelines and GitHub documentation, I still have a few uncertainties:
ATAC Data Alignment for Open Chromatin Regions
Should I map my ATAC-seq reads only to my target species’ genome, or should I align them to the genomes of all species in the MAF?
Generating Positive/Negative Sets
Once I define open chromatin regions, are there TACIT scripts to help generate positive (enhancers) and negative (non-enhancers) sets?
Model Training with Single-Species Data
The GitHub documentation suggests starting with ocr_phylo(g)lm.r to test OCR-phenotype associations, but I only have ATAC peaks from one species. Can I still train a model without phylogenetic correction?
If so, what workflow would you recommend (e.g., training a baseline model first, then refining it with cross-species data later)?
My ultimate goal is to link predicted enhancers to phenotypes, but I’m unsure how to proceed with only single-species ATAC data.
I would greatly appreciate any guidance or suggestions you might have. Thank you for your time and for developing this fantastic tool!
Best regards,
Na Wan