Additional Supplementary Material
A Transcription Factor Affinity Based Code for Mammalian Transcription Initiation
Data Sets
All Single-Peak locations (TSSs from the set of 2399 CAGE Tag Clusters from which all training and test sets were derived):
FASTA file (gzipped) containing the sequence from 5 kb upstream to 5 kb downstream of each TSS
mouse_ALL_single_TSS_TCs_-5000_5000.fa.gz
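As an illustration only, the gzipped FASTA file above could be read with a few lines of standard-library Python. This is a minimal sketch, not a tool distributed with this material. The helper names (open_text, read_fasta, window) are hypothetical, and the position of the TSS within each 10 kb record is not stated here, so the center index passed to window is an assumption the reader must supply. The half-open slicing is likewise only one possible reading of an interval such as (-250, +50).

```python
import gzip
from typing import Iterable, Iterator, Tuple


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def window(seq: str, center: int, upstream: int = 250, downstream: int = 50) -> str:
    """Return seq[center - upstream : center + downstream].

    ASSUMPTION: `center` is the 0-based index of the TSS in `seq`, and the
    interval is treated as half-open. Neither convention is confirmed here.
    """
    start = center - upstream
    end = center + downstream
    if start < 0 or end > len(seq):
        raise ValueError("window extends beyond the sequence")
    return seq[start:end]
```

A typical use would be to pass the handle returned by open_text("mouse_ALL_single_TSS_TCs_-5000_5000.fa.gz") to read_fasta and iterate over the resulting (header, sequence) pairs.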
Training Sets (CAGE Tag Cluster IDs):
Annotation-Supported_TrainingSet.ids.txt
CAGE-Only-Supported_TrainingSet.ids.txt
CpG-Island_TrainingSet.ids.txt
Non-CpG-Island_TrainingSet.ids.txt
Test Sets (CAGE Tag Cluster IDs):
Annotation-Supported_TestSet.ids.txt
CAGE-Only-Supported_TestSet.ids.txt
Cross-validation Sets:
Each archive (tar-gzip) contains three subdirectories, TSS_set, IGC_set, and CDS_set, holding the positive data, negative intergenic data, and negative CDS data, respectively. Each subdirectory contains 10 FASTA files, one for each of the 10 parts. The sequence for each example spans (-250, +50) relative to the example location. Note that because TSS and corresponding upstream intergenic examples must be extracted from mm5 (the genome build of the original CAGE Tag mappings), sequences occasionally contain "N" characters at nucleotides that had not yet been identified in that build.
Annotation-Supported_Crossval.tgz
CAGE-Only-Supported_Crossval.tgz
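The following sketch shows one way the unpacked cross-validation archives might be traversed. It assumes a conventional leave-one-part-out scheme, in which each of the 10 parts is held out once. The file names inside each subdirectory are not documented here, so the code simply sorts whatever files it finds. The function names are hypothetical, and n_fraction is included only as a convenience for spotting examples affected by unresolved mm5 bases.

```python
import os
from typing import Dict, Iterator, List, Tuple

# Subdirectory names as described for each archive.
SUBDIRS = ("TSS_set", "IGC_set", "CDS_set")


def list_parts(root: str) -> Dict[str, List[str]]:
    """Map each subdirectory name to its sorted list of file paths.

    ASSUMPTION: sorting file names gives a consistent part order across
    the three subdirectories; the actual naming scheme is not documented.
    """
    parts = {}
    for name in SUBDIRS:
        folder = os.path.join(root, name)
        parts[name] = sorted(
            os.path.join(folder, f)
            for f in os.listdir(folder)
            if os.path.isfile(os.path.join(folder, f))
        )
    return parts


def folds(files: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training_files, held_out_file), holding out each part once."""
    for i, held_out in enumerate(files):
        yield files[:i] + files[i + 1:], held_out


def n_fraction(seq: str) -> float:
    """Fraction of unresolved bases (N or n) in a sequence."""
    if not seq:
        return 0.0
    return (seq.count("N") + seq.count("n")) / len(seq)
```

After extracting an archive, list_parts on the resulting directory gives the file lists, and folds applied to any one of them yields the 10 train/held-out splits.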
Notes:
A descriptor file containing the Tag Cluster IDs, genomic locations for the highest TSS in a cluster, and other detailed information is available for download (http://fantom31p.gsc.riken.jp/cage_analysis/export/mm5/tss_summary.tsv.bz2).
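As a hedged example, the descriptor file could be cross-referenced with one of the ID lists above along the following lines. Two assumptions are made that are not confirmed by this page: that each .ids.txt file holds one Tag Cluster ID per line, and that the ID sits in a known column of the tab-separated descriptor file. The id_column parameter defaults to the first column purely as a placeholder, and all function names are hypothetical.

```python
import bz2
import csv
from typing import Iterable, Iterator, List, Set


def load_ids(lines: Iterable[str]) -> Set[str]:
    """Collect non-empty, whitespace-stripped lines as a set of IDs.

    ASSUMPTION: the ID files contain one Tag Cluster ID per line.
    """
    return {line.strip() for line in lines if line.strip()}


def read_tsv_bz2(path: str) -> Iterator[List[str]]:
    """Yield rows (lists of fields) from a bzip2-compressed TSV file."""
    with bz2.open(path, "rt", newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")


def filter_rows(
    rows: Iterable[List[str]], ids: Set[str], id_column: int = 0
) -> Iterator[List[str]]:
    """Yield only the rows whose ID field is in `ids`.

    ASSUMPTION: `id_column` is the index of the Tag Cluster ID field;
    check the downloaded file before relying on the default of 0.
    """
    for row in rows:
        if len(row) > id_column and row[id_column] in ids:
            yield row
```

For instance, load_ids(open("Annotation-Supported_TrainingSet.ids.txt")) followed by filter_rows(read_tsv_bz2("tss_summary.tsv.bz2"), ids) would keep only the descriptor rows for that training set, provided the column assumption holds.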
Genomic Scans
UCSC Custom Tracks for all Annotation-Supported Model Test Set Scans:
Annotation-Supported-Model-TestSet_scans_UCSC_custom_tracks
UCSC Custom Tracks for all pri-miRNA Scans:
Annotation-Supported-Model-miRNA_scans_UCSC_custom_tracks
UCSC Custom Tracks for non-genic miRNA region scans from the Marson Data Set (Marson et al., 2008):
Annotation-Supported-Model-Marson-miRNA_scans_UCSC_custom_tracks
Notes:
Clicking a UCSC Custom Track listed in any category opens that track in a new UCSC Genome Browser window. The genome build and other browser settings are pre-selected for each link (mm5 for Test Set Scans, hg18/mm9 for pri-miRNA Scans, mm8/mm9 for Marson Data Set miRNA Region Scans).