Megraw M, Pereira F, Jensen ST, Ohler U, Hatzigeorgiou AG. (2009). A transcription factor affinity based code for mammalian transcription initiation. Genome Research, 19:644-56.
About this study:
Recent Cap Analysis of Gene Expression (CAGE) studies have fundamentally altered our understanding of RNA pol-II promoters by demonstrating convincingly that initiation events are not limited to a single location or just a few locations. Rather, regions of frequent initiation take on a variety of distinct shapes. However, a large fraction of transcription start sites (TSSs), about 45%, exhibit peaks where CAGE sequence tags cluster around a particular location. The presence of a strong single peak suggests that transcription at these locations may be mediated by position-specific sequence features. We have therefore developed a new model for single-peaked TSSs based solely on known transcription factors and their respective regions of positional enrichment. Computational prediction of precise TSS locations has been regarded as notoriously difficult, but our model demonstrates that TSS prediction with both high accuracy and high spatial resolution is achievable while remaining highly interpretable. When applied to known and putative miRNA promoter regions, the method predicts that about 70% of these use a single-peak pol-II TSS. In light of new evidence that processed RNA transcripts may undergo a CAGE-detectable re-capping process (ENCODE Transcriptome Project et al., Nature, Feb 2009), our model offers a unique starting point for distinguishing between TSSs and re-capped products.
We incorporated the model above (referred to as the Annotation-Supported model in the publication text) into an open-source tool, available below, called "S-Peaker" (Single-Peak TSS finder). The tool uses the model to identify probable locations of single-peak TSSs in an input sequence. Specifically, one can provide a FASTA sequence to the tool and obtain a UCSC custom track that displays the probability of observing a single-peak TSS at each position in the sequence.
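To illustrate what a per-position probability track can look like, the following is a minimal sketch that formats a list of probabilities as a UCSC wiggle custom track in fixedStep format. It is not S-Peaker's code, and the exact track type S-Peaker emits is not stated here; the function name `probabilities_to_wiggle`, the track name, the description text, and the example coordinates are all hypothetical.

```python
def probabilities_to_wiggle(chrom, start, probs, name="single_peak_tss"):
    """Format per-position probabilities as a UCSC fixedStep wiggle track.

    Assumption: one probability per base, with `start` giving the 1-based
    chromosome coordinate of the first base (wiggle coordinates are 1-based).
    This is an illustrative layout, not S-Peaker's actual output.
    """
    if start < 1:
        raise ValueError("wiggle start coordinates are 1-based")
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")

    lines = [
        # Track line: declares a wiggle track and fixes the y-axis to [0, 1].
        f'track type=wiggle_0 name="{name}" '
        'description="Per-position single-peak TSS probability" viewLimits=0:1',
        # Declaration line: consecutive positions, one value per base.
        f"fixedStep chrom={chrom} start={start} step=1",
    ]
    lines.extend(f"{p:.4f}" for p in probs)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical probabilities for five consecutive bases on chr1.
    print(probabilities_to_wiggle("chr1", 1000, [0.01, 0.05, 0.80, 0.10, 0.02]), end="")
```

The resulting text can be pasted into the UCSC Genome Browser's custom track form or saved to a file and uploaded, which displays the values as a signal along the chosen region.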