McEnhancer: predicting gene expression via semi-supervised assignment of enhancers to target genes
D. Hafez, A. Karabacak, S. Krueger, Y.C. Hwang, L.S. Wang, R.P. Zinzen, U. Ohler (2017). McEnhancer: predicting gene expression via semi-supervised assignment of enhancers to target genes. Genome Biology.
Schematic representation of McEnhancer. Starting with known pairs of DNase I hypersensitive sites (DHSs) and genes assigned to a given cluster, McEnhancer builds a third-order interpolated Markov model (IMM) to represent the sequence features in the DHSs of the positive cluster (colored red). It builds a second IMM to represent the null model. In iterative rounds, the model loops over all unlabeled DHSs, calculates the log-likelihood ratio of each under the two models, and assigns each unlabeled DHS a class label accordingly. After it has scored all unlabeled DHSs, it adds the newly assigned positive DHSs to the positive cluster and re-estimates the IMM parameters.
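The self-training loop in the caption can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not the authors' implementation. It replaces the interpolated Markov model with a plain third-order Markov chain using add-one smoothing. It fits the null model once on a hypothetical set of background sequences. It uses a log-likelihood-ratio threshold of 0, keeps negatively scored DHSs in the pool so they can be re-scored later, and stops once a round adds no new positives. The names `MarkovModel`, `self_train`, `threshold`, and `max_rounds` are hypothetical.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"


class MarkovModel:
    """Fixed-order DNA Markov chain with add-one smoothing.

    Simplification (assumption): McEnhancer uses interpolated Markov models;
    this sketch uses a single fixed order instead of interpolating orders.
    """

    def __init__(self, order=3, pseudocount=1.0):
        self.order = order
        self.pseudocount = pseudocount
        self.counts = defaultdict(Counter)  # context k-mer -> next-base counts

    @staticmethod
    def _valid(s):
        return all(c in ALPHABET for c in s)

    def fit(self, seqs):
        self.counts = defaultdict(Counter)
        for seq in seqs:
            seq = seq.upper()
            for i in range(self.order, len(seq)):
                context, base = seq[i - self.order:i], seq[i]
                if self._valid(context + base):
                    self.counts[context][base] += 1
        return self

    def log_prob(self, seq):
        """Sum of log P(base | previous `order` bases) over the sequence."""
        seq = seq.upper()
        total = 0.0
        for i in range(self.order, len(seq)):
            context, base = seq[i - self.order:i], seq[i]
            if not self._valid(context + base):
                continue  # skip N or other ambiguous bases
            row = self.counts.get(context, {})  # .get avoids adding empty rows
            numerator = row.get(base, 0) + self.pseudocount
            denominator = sum(row.values()) + self.pseudocount * len(ALPHABET)
            total += math.log(numerator / denominator)
        return total


def self_train(positives, background, unlabeled, order=3, threshold=0.0, max_rounds=10):
    """Iteratively move unlabeled sequences whose LLR exceeds `threshold` into the positive set."""
    positive_set = list(positives)
    null_model = MarkovModel(order).fit(background)  # assumption: null model fit once
    pool = list(unlabeled)
    for _ in range(max_rounds):
        pos_model = MarkovModel(order).fit(positive_set)  # re-estimate each round
        # Score every unlabeled sequence with the current models before updating.
        llrs = [pos_model.log_prob(s) - null_model.log_prob(s) for s in pool]
        newly_positive = [s for s, llr in zip(pool, llrs) if llr > threshold]
        if not newly_positive:
            break
        positive_set.extend(newly_positive)
        pool = [s for s, llr in zip(pool, llrs) if llr <= threshold]
    return positive_set, pool


if __name__ == "__main__":
    # Toy data (hypothetical): AT-rich "cluster" DHSs vs. GC-rich background.
    pos = ["ATTATAATTATAATTATA", "TAATTATAATTATAAT"]
    bg = ["GCGGCCGCGGCCGCGG", "CGCCGGCGCCGGCGCC"]
    unl = ["ATTATAATTATA", "GCGGCCGCGGCC", "TAATTATAAT"]
    assigned, unassigned = self_train(pos, bg, unl)
    print("positive cluster:", assigned)
    print("still unlabeled:", unassigned)
```

Scoring every sequence before the batch update mirrors the caption's order of operations: labels within a round all come from the same parameter estimates, and re-estimation happens only after the full pass.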
Transcriptional enhancers regulate spatio-temporal gene expression. While genomic assays can identify putative enhancers en masse, assigning target genes is a complex challenge. We devised a machine learning approach, McEnhancer, which links target genes to putative enhancers via a semi-supervised learning algorithm that predicts gene expression patterns based on enriched sequence features. Predicted expression patterns were 73-98% accurate, predicted assignments showed strong Hi-C interaction enrichment, enhancer-associated histone modifications were evident, and known functional motifs were recovered. Our model provides a general framework to link globally identified enhancers to targets and contributes to deciphering the regulatory genome.
Data, code and analysis results for the assignment of enhancers to target genes.
The code can be found at https://github.com/dinahafez/McEnhancer.