Assessing the utility of thermodynamic features for microRNA target prediction under relaxed seed and no conservation requirements
P. Lekprasert, M. Mayhew, U. Ohler (2011). Assessing the utility of thermodynamic features for microRNA target prediction under relaxed seed and no conservation requirements. PLoS ONE.
Many computational microRNA target prediction tools focus on a few key features, including complementarity to miRNA seeds and evolutionary conservation. While these features allow for successful target identification, not all miRNA target sites are conserved, and not all adhere to canonical seed complementarity. Several studies have advocated energy features of mRNA:miRNA duplexes as alternative features. However, independent evaluations have reported conflicting results on the reliability of energy-based predictions. Here, we reassess the usefulness of energy features for mammalian target prediction, aiming to relax or eliminate the need for perfect seed matches and conservation.
We detect significant differences in energy features at experimentally supported human miRNA target sites and at genome-wide sites of AGO protein interaction. This trend is confirmed on datasets that assay the effect of miRNAs on mRNA and protein expression, and a simple linear regression model yields a significant correlation between predicted and observed expression changes. Compared with 6-mer seed matches as a baseline, our energy-based model yields approximately 3- to 5-fold enrichment among highly down-regulated targets and allows prediction of strictly imperfect targets with enrichment above baseline.
In conclusion, our results indicate significant promise for energy-based miRNA target prediction that includes a broader range of targets without having to use conservation or impose stringent seed match rules.
The TargetThermo package uses an energy-based linear regression model to compute the predicted expression fold change of a gene caused by a miRNA. The program does not use conservation, so that non-conserved targets can also be predicted. First, the program searches the 3'UTR for miRNA match sites. Initial match sites are any perfect 4-mer matches to the miRNA seed, so that possible non-canonical target sites are included as well. Next, the miRNA:mRNA duplex energy is calculated for each site. The duplex energy includes the disruption energy (the energy needed to open up the local mRNA structure) and the hybridization energy (the binding energy between the miRNA and the opened mRNA).
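The initial site search can be sketched in Python as follows. This is a minimal illustration, not TargetThermo's Perl code; it assumes that the seed is miRNA nucleotides 2-8, that sequences use A/C/G/U (T is converted to U), and that only Watson-Crick pairs count as a perfect match. The function names are hypothetical.

```python
# Minimal sketch (not TargetThermo's code): find 3'UTR positions that perfectly
# match any 4-mer of the miRNA seed.
# Assumptions: seed = miRNA positions 2-8 (1-based); Watson-Crick pairs only.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_4mer_sites(mirna: str, utr: str, k: int = 4) -> list[tuple[int, int]]:
    """Return sorted (utr_start, seed_offset) pairs for every perfect k-mer seed match.

    utr_start is the 0-based start of the match in the UTR; seed_offset is the
    0-based offset of the k-mer within the seed (miRNA positions 2-8).
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[1:8]  # miRNA positions 2-8
    sites = []
    for offset in range(len(seed) - k + 1):
        # The mRNA must contain the reverse complement of this seed k-mer.
        target = reverse_complement(seed[offset:offset + k])
        start = utr.find(target)
        while start != -1:
            sites.append((start, offset))
            start = utr.find(target, start + 1)  # allow overlapping hits
    return sorted(sites)
```

For example, with let-7a (UGAGGUAGUAGGUUGUAUAGUU) and the toy UTR AAACUACCUCAAA, the four overlapping 4-mers of the 7-nt site CUACCUC are all reported, at UTR positions 3 to 6.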
A seed type filter is then applied to select the sites used to compute the expression fold change values, which reduces noise in the prediction. With the 6-in-8 filter, only match sites with at least 6 base pairs/1A are used; with the 6-mer filter, only sites with 6 consecutive base pairs are used. The linear regression prediction model was trained on data from Selbach et al., as described in the paper.
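A sketch of the two filters follows. It assumes, based only on the filter names and not confirmed by the source, that 6-in-8 means at least 6 paired positions among 8 seed-region positions and that 6-mer means 6 consecutive paired positions. Each site is represented by a hypothetical list of 8 booleans, and how TargetThermo handles position 1 (the "1A" rule) is not reproduced.

```python
# Minimal sketch of the seed type filters (not TargetThermo's code).
# Assumption: each site is a hypothetical list of 8 booleans, True meaning a base
# pair at that seed-region position. The "1A" handling of position 1 is omitted.


def passes_6_in_8(paired: list[bool]) -> bool:
    """6-in-8 filter: at least 6 of the 8 positions are paired."""
    return sum(paired) >= 6


def passes_6mer(paired: list[bool]) -> bool:
    """6-mer filter: at least 6 consecutive positions are paired."""
    run = best = 0
    for p in paired:
        run = run + 1 if p else 0  # length of the current run of pairs
        best = max(best, run)
    return best >= 6


def filter_sites(sites: dict[str, list[bool]], st_filter: int = 1) -> list[str]:
    """Return IDs of sites passing the filter (1 = 6-in-8, 2 = 6-mer)."""
    if st_filter == 1:
        check = passes_6_in_8
    elif st_filter == 2:
        check = passes_6mer
    else:
        raise ValueError("st_filter must be 1 (6-in-8) or 2 (6-mer)")
    return [site_id for site_id, paired in sites.items() if check(paired)]
```

Under these assumptions, any site with 6 consecutive pairs also has at least 6 pairs in total, so the 6-mer filter is the stricter of the two.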
The TargetThermo package includes the program along with sample inputs and results.
TargetThermo requires version 1.7.2 of the Vienna RNA package with an added program (available for download), which must be installed on your machine (see INSTALL in the package). In addition, environment variables pointing to the targetThermo folder and to the Vienna RNA binary folder (bin/) must be set in .bashrc, .cshrc, or another appropriate startup file. Examples in csh syntax are shown below.
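setenv TGTMPATH /codes/targetThermo/
setenv VIENNAPATH /bin/ViennaRNA-1.7.2/install/bin/
In bash (.bashrc), the equivalent is:
export TGTMPATH=/codes/targetThermo/
export VIENNAPATH=/bin/ViennaRNA-1.7.2/install/bin/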
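Before running the Perl script, a small hypothetical helper like the one below can confirm that TGTMPATH and VIENNAPATH are set and point to existing directories. It checks only these two variables; it does not verify that the required Vienna RNA programs are actually present in the binary folder.

```python
# Hypothetical helper (not part of TargetThermo): check that the two environment
# variables used by TargetThermo are set and point to existing directories.
import os


def check_targetthermo_env(env=None) -> list[str]:
    """Return a list of problems; an empty list means both variables look usable."""
    env = os.environ if env is None else env
    problems = []
    for name in ("TGTMPATH", "VIENNAPATH"):
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(path):
            problems.append(f"{name} points to a missing directory: {path}")
    return problems


if __name__ == "__main__":
    for problem in check_targetthermo_env():
        print(problem)
```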
Example run:
perl runtargetprediction.pl -miFile example/testmi.fa -selUTRFile example/testsel.fa \
    -UTRFile example/testUTR.fa -CDSFile example/testCDS.fa
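If the predictions are driven from a Python pipeline, the command line can be assembled from the documented flags. The wrapper below is a hypothetical sketch; it assumes that the relative example/ paths are resolved from the targetThermo folder, which is why run() takes a working directory.

```python
# Hypothetical wrapper (not part of TargetThermo) around runtargetprediction.pl.
# Only the command line flags documented in this README are used.
import subprocess


def build_command(mi_file, utr_file, cds_file, st_filter=None,
                  sel_utr_file=None, out_file=None):
    """Return the argument list for runtargetprediction.pl."""
    cmd = ["perl", "runtargetprediction.pl",
           "-miFile", mi_file, "-UTRFile", utr_file, "-CDSFile", cds_file]
    if st_filter is not None:
        if st_filter not in (1, 2):
            raise ValueError("stFilter must be 1 (6-in-8) or 2 (6-pf)")
        cmd += ["-stFilter", str(st_filter)]
    if sel_utr_file is not None:
        cmd += ["-selUTRFile", sel_utr_file]
    if out_file is not None:
        cmd += ["-outFile", out_file]
    return cmd


def run(cmd, workdir):
    """Run the command in workdir; raises CalledProcessError if Perl exits non-zero."""
    subprocess.run(cmd, cwd=workdir, check=True)
```

For example, run(build_command("example/testmi.fa", "example/testUTR.fa", "example/testCDS.fa", sel_utr_file="example/testsel.fa"), "/codes/targetThermo/") corresponds to the sample run above.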
Required command line arguments:
- UTRFile [name]
- CDSFile [name]
The 3'UTR and CDS portions of the genes are given in two separate input files (FASTA format). The program searches for initial 4-mer matches only in the 3'UTR. The CDS portion of a gene is needed for some energy computations, especially when an initial 4-mer match lies close to the 5' end of the 3'UTR, because the local sequence around such a site extends into the CDS. CDSFile does not need to contain a sequence for every 3'UTR, but the file itself must exist.
- miFile [name]
contains miRNA sequences to be used in the predictions (fasta format).
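The role of the CDS can be illustrated with a sketch that extracts the sequence window around a site. The window sizes (70 nt on each side by default) are hypothetical placeholders, not TargetThermo's actual settings; the point is only that a site near the 5' end of the 3'UTR needs upstream bases that lie in the CDS.

```python
# Minimal sketch (hypothetical window sizes, not TargetThermo's settings): extract
# the sequence around a 3'UTR site, borrowing upstream bases from the CDS when the
# site is close to the 5' end of the UTR.


def site_window(cds: str, utr: str, site_start: int, site_len: int,
                upstream: int = 70, downstream: int = 70) -> str:
    """Return the window around a site given in 0-based 3'UTR coordinates."""
    transcript = cds + utr          # CDS directly followed by the 3'UTR
    offset = len(cds)               # UTR coordinates shift by the CDS length
    start = max(0, offset + site_start - upstream)
    end = min(len(transcript), offset + site_start + site_len + downstream)
    return transcript[start:end]
```

When a gene has no entry in CDSFile (passed here as an empty string), this sketch simply truncates the window at the start of the 3'UTR.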
Optional command line arguments:
- stFilter [1,2]
seed type filter: only sites that have at least this seed type are used to compute the predicted expression fold change. 1 = 6-in-8 site filter (default); 2 = 6-pf (6-mer) site filter.
- selUTRFile [name]
contains, for each miRNA, the list of sequence IDs to be searched for matches to that miRNA. It uses a FASTA-like format in which miRNA names are the identifiers and each following line holds one sequence ID (see below). If no file is specified, all sequences in UTRFile are searched for each miRNA.
- outFile [name]
Output file (default: predFCresult.txt). It lists the miRNA, gene ID, and predicted fold change value in tab-delimited format, sorted by predicted fold change within each miRNA.
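The selUTRFile and output formats described above can be read with a few lines of Python. This sketch assumes that selUTRFile headers start with ">" as in FASTA, that each following non-empty line holds one sequence ID, and that the output file has no header row; the function names are hypothetical.

```python
# Minimal sketch for reading selUTRFile and the tab-delimited output file.
# Assumptions: ">" header lines name miRNAs; one sequence ID per following line;
# output columns are miRNA, gene ID, predicted fold change, with no header row.
from collections import defaultdict


def parse_sel_utr(text: str) -> dict[str, list[str]]:
    """Map each miRNA name to the list of UTR IDs to search for that miRNA."""
    selection = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            selection.setdefault(current, [])
        elif current is not None:
            selection[current].append(line)
    return selection


def parse_predictions(text: str) -> dict[str, list[tuple[str, float]]]:
    """Group (gene ID, predicted fold change) rows by miRNA, keeping file order."""
    grouped = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        mirna, gene_id, fold_change = line.split("\t")[:3]
        grouped[mirna].append((gene_id, float(fold_change)))
    return dict(grouped)
```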