Get Started


Epcotv2 is a general-purpose foundation model for integrative genomic prediction, capable of inferring diverse regulatory modalities from minimal input data. Unlike task-specific models, this architecture unifies transcriptional and chromatin-level outputs within a shared multi-task framework.

Overview of the multi-task genomic model

Model Inputs

The model takes the following as input:

  • DNA sequence: One-hot encoded sequence centered on each genomic region of interest

  • ATAC-seq signal: Chromatin accessibility profiles for the same regions

Both inputs are designed to be cell-type specific and jointly encode local cis-regulatory information.

Predicted Modalities

The model is trained in a multi-task setting to directly predict a broad range of genomic and epigenomic modalities from DNA sequence and ATAC-seq input. These include:

  • Nascent transcription signals, such as PRO-seq, GRO-seq, TT-seq, and related strand-specific assays

  • Total and polyA+ RNA-seq, capturing steady-state expression

  • Chromatin accessibility and histone modifications, covering a wide panel of epigenomic marks

  • 3D chromatin organization, including Micro-C, Hi-C, and ChIA-PET contact maps

  • Transcription initiation signals, such as CAGE-seq and GRO-cap

  • Synthetic enhancer activity, including STARR-seq

A complete list of supported modalities can be found in the repository under data/epi_list and data/extra_tf_list.txt.

All outputs are aligned to a shared genomic coordinate system, enabling consistent evaluation and downstream analysis.

Cross-Context Generalization

The model is trained across a large number of human cell types and tissues. It generalizes well to unseen biological contexts, and supports cross-species transfer.

This model can be used for:

  • Predicting transcriptional output from accessible chromatin regions

  • Characterizing regulatory elements in new tissues with only ATAC-seq and sequence

  • Interpreting the potential functional role of non-coding genetic variants

  • Reconstructing 3D chromatin architecture in data-limited settings

Installation

To set up the model locally for inference or fine-tuning, follow the steps below:

git clone https://github.com/liu-bioinfo-lab/general_AI_model.git
cd general_AI_model
# setup environment
conda create -n epcot python==3.9
conda activate epcot
pip install -r requirements.txt

One can refer to epcotv2_basic_tutorial.ipynb for the basic usage of this model.