Get Started
Epcotv2 is a general-purpose foundation model for integrative genomic prediction that infers diverse regulatory modalities from just two inputs: DNA sequence and ATAC-seq. Unlike task-specific models, it unifies transcriptional and chromatin-level outputs within a shared multi-task framework.
Model Inputs
The model takes the following as input:
DNA sequence: One-hot encoded sequence centered on each genomic region of interest
ATAC-seq signal: Chromatin accessibility profiles for the same regions
The DNA sequence is shared across cell types, while the ATAC-seq profile supplies the cell-type-specific signal; together they encode local cis-regulatory information.
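As an illustration, the two inputs for a single region might be prepared as follows. This is a minimal sketch that assumes the one-hot sequence (rows A, C, G, T) and a per-base ATAC-seq track are stacked into a 5 × L matrix. The layout and the helper names `one_hot_encode` and `build_input` are hypothetical, not the repository's API; check epcotv2_basic_tutorial.ipynb for the exact input format.

```python
# Illustrative sketch: prepare one region's input as a 5 x L matrix.
# ASSUMPTION: channel order A, C, G, T, then ATAC-seq; Epcotv2's real layout may differ.
from typing import List, Sequence

BASES = "ACGT"


def one_hot_encode(seq: str) -> List[List[float]]:
    """Return a 4 x L one-hot matrix (rows A, C, G, T); ambiguous bases like N are all zeros."""
    seq = seq.upper()
    return [[1.0 if base == b else 0.0 for base in seq] for b in BASES]


def build_input(seq: str, atac: Sequence[float]) -> List[List[float]]:
    """Stack the one-hot sequence with a per-base ATAC-seq track of equal length."""
    if len(seq) != len(atac):
        raise ValueError("sequence and ATAC-seq track must have the same length")
    return one_hot_encode(seq) + [[float(v) for v in atac]]


x = build_input("ACGTN", [0.0, 0.5, 1.2, 0.3, 0.0])
print(len(x), len(x[0]))  # 5 5
```

In practice the same layout would usually be held in a NumPy array or tensor. Plain lists are used here only to keep the sketch dependency-free.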
Predicted Modalities
The model is trained in a multi-task setting to directly predict a broad range of genomic and epigenomic modalities from DNA sequence and ATAC-seq input. These include:
Nascent transcription signals, such as PRO-seq, GRO-seq, TT-seq, and related strand-specific assays
Total and polyA+ RNA-seq, capturing steady-state expression
Chromatin accessibility and histone modifications, covering a wide panel of epigenomic marks
3D chromatin organization, including Micro-C, Hi-C, and ChIA-PET contact maps
Transcription initiation signals, such as CAGE-seq and GRO-cap
Enhancer activity from reporter assays, such as STARR-seq
A complete list of supported modalities can be found in the repository under data/epi_list and data/extra_tf_list.txt.
All outputs are aligned to a shared genomic coordinate system, enabling consistent evaluation and downstream analysis.
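Because all predictions share one coordinate system, converting between output bins and genomic intervals is simple arithmetic. The following minimal sketch assumes fixed-width bins that start at the region's first base. The parameters `bin_size` and `n_bins` and the helper names are hypothetical, and the actual resolution likely differs between modalities.

```python
# Illustrative sketch: map between output bin indices and genomic coordinates.
# ASSUMPTION: a track covers [region_start, region_start + n_bins * bin_size)
# on one chromosome with fixed-width bins; bin_size is a hypothetical parameter.
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def bin_to_interval(chrom: str, region_start: int, bin_idx: int, bin_size: int) -> Interval:
    """Return the genomic interval covered by output bin `bin_idx`."""
    start = region_start + bin_idx * bin_size
    return Interval(chrom, start, start + bin_size)


def position_to_bin(pos: int, region_start: int, bin_size: int, n_bins: int) -> int:
    """Return the index of the bin containing 0-based position `pos`."""
    idx = (pos - region_start) // bin_size
    if not 0 <= idx < n_bins:
        raise ValueError(f"position {pos} lies outside the predicted region")
    return idx


print(bin_to_interval("chr1", 1_000_000, 3, 1000))  # Interval(chrom='chr1', start=1003000, end=1004000)
print(position_to_bin(1_003_500, 1_000_000, 1000, 500))  # 3
```

Half-open, 0-based intervals follow the BED convention, so adjacent bins never overlap and interval length is simply `end - start`.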
Cross-Context Generalization
The model is trained on a large number of human cell types and tissues. It generalizes to biological contexts not seen during training and supports cross-species transfer.
This model can be used for:
Predicting transcriptional output from accessible chromatin regions
Characterizing regulatory elements in new tissues with only ATAC-seq and sequence
Interpreting the potential functional role of non-coding genetic variants
Reconstructing 3D chromatin architecture in data-limited settings
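For variant interpretation, a common generic approach is to compare predictions for the reference and alternate alleles. This minimal sketch assumes a hypothetical `predict` callable that wraps the model for one region with its ATAC-seq track held fixed. `predict` is not part of the repository's API, and the toy GC-content predictor exists only so the example runs.

```python
# Illustrative sketch of reference-vs-alternate scoring for a single-nucleotide variant.
# ASSUMPTION: `predict` is a hypothetical wrapper mapping a sequence (ATAC-seq held
# fixed) to per-bin predictions; it is not the repository's API.
from typing import Callable, List

Predictor = Callable[[str], List[float]]


def apply_snv(seq: str, offset: int, ref: str, alt: str) -> str:
    """Return `seq` with the base at 0-based `offset` changed from ref to alt."""
    if seq[offset].upper() != ref.upper():
        raise ValueError(f"reference mismatch at offset {offset}: {seq[offset]} != {ref}")
    return seq[:offset] + alt + seq[offset + 1:]


def variant_effect(predict: Predictor, seq: str, offset: int, ref: str, alt: str) -> List[float]:
    """Per-bin difference (alt - ref); large magnitudes flag candidate functional variants."""
    ref_pred = predict(seq)
    alt_pred = predict(apply_snv(seq, offset, ref, alt))
    return [a - r for a, r in zip(alt_pred, ref_pred)]


def toy_predict(seq: str) -> List[float]:
    """Toy stand-in model: GC fraction in each 4-bp bin (illustration only)."""
    return [sum(b in "GC" for b in seq[i:i + 4]) / 4 for i in range(0, len(seq), 4)]


print(variant_effect(toy_predict, "AAAAAAAA", 5, "A", "G"))  # [0.0, 0.25]
```

Holding the ATAC-seq input fixed isolates the sequence-driven effect of the variant. Whether that is appropriate depends on the question being asked.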
Installation
To set up the model locally for inference or fine-tuning, follow the steps below:
```bash
# clone the repository
git clone https://github.com/liu-bioinfo-lab/general_AI_model.git
cd general_AI_model
# set up a Python 3.9 conda environment and install dependencies
conda create -n epcot python=3.9
conda activate epcot
pip install -r requirements.txt
```
See epcotv2_basic_tutorial.ipynb in the repository for a walkthrough of basic usage.
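After installation, a quick sanity check can confirm that the packages listed in requirements.txt are present in the active environment. This is a minimal sketch assuming simple `name==version` style lines. It only checks whether each distribution is installed, not whether its version satisfies the pin, and the helper names are hypothetical.

```python
# Illustrative post-install check: report requirements.txt entries that are not installed.
# Version pins are NOT checked; assumes simple "name==version" style lines.
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, List

# Leading distribution name, e.g. "numpy" in "numpy==1.24.0" or "pkg" in "pkg[extra]>=1".
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def parse_requirement_names(lines: Iterable[str]) -> List[str]:
    """Extract distribution names, skipping blanks, comments, and pip options like -r/-e."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = _NAME.match(line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names whose distributions are not found in the active environment."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


# Usage, from the repository root with the `epcot` environment active:
# with open("requirements.txt") as fh:
#     print(missing_packages(parse_requirement_names(fh)) or "all requirements installed")
```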