by Jason Ernst, Heather L. Plasterer, Itamar Simon, and Ziv Bar-Joseph
Abstract
Information about the binding preferences of many transcription factors is known and characterized by a sequence binding motif. However, determining the regions of the
genome in which a transcription factor binds based on its motif is a challenging problem, particularly in species with large genomes, since there are often many
sequences that contain matches to the motif but are not bound. Several rules based on sequence conservation or location relative to a transcription start site have been
proposed to help differentiate true binding sites from random ones. Other evidence sources may also be informative for this task. We developed a method for integrating
multiple evidence sources using logistic regression classifiers. Our method works in two steps. First, we infer a score quantifying the general preference for
transcription factor binding at all locations based on a large set of evidence features, without using any motif-specific information. We then combine this general
binding preference score with motif information for specific transcription factors to improve prediction of regions bound by the factor. Using cross-validation and new
experimental data, we show that, surprisingly, the general binding preference can be highly predictive of true locations of transcription factor binding even when no
binding motif is used. When combined with motif information, our method outperforms previous methods for predicting locations of true binding.
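The two-step idea described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the two evidence features (a conservation value and a TSS-proximity value), the motif scores, the labels, and the training settings are all hypothetical toy values chosen only to show how a motif-independent score from a first logistic regression could feed a second one that also sees a motif score.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    # w holds one weight per feature followed by the bias term.
    return sigmoid(sum(wj * v for wj, v in zip(w, x)) + w[-1])

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Step 1 (toy data): evidence features per location, with no motif information.
# Hypothetical columns: [conservation, proximity to TSS].
evidence = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6],
            [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
bound_by_any_tf = [1, 1, 1, 0, 0, 0]
general_w = fit_logistic(evidence, bound_by_any_tf)
general_score = [predict_proba(general_w, e) for e in evidence]

# Step 2 (toy data): combine the general score with a motif match score
# for one specific factor.
motif_score = [0.9, 0.1, 0.8, 0.9, 0.2, 0.1]
bound_by_this_tf = [1, 0, 1, 0, 0, 0]
combined_X = [[g, m] for g, m in zip(general_score, motif_score)]
combined_w = fit_logistic(combined_X, bound_by_this_tf)
combined_score = [predict_proba(combined_w, x) for x in combined_X]
```

In this toy setup, a location with a strong motif match but weak general evidence (the fourth entry) ends up ranked below a location where both are strong (the first entry), which is the behavior the combination is meant to capture.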
Grid of the top 1000 TSS predictions for each JASPAR and TRANSFAC PWM,
i.e. predictions that the corresponding TF(s) will bind within ±10 kb of the TSS, is here.
The gene symbol(s) corresponding to each TSS are in the left-most column.
A prediction in the top 1000 is marked '1'; all others are marked '0'. A file that also includes the TSS hg18 coordinates is
here.
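As a rough illustration of how such a grid might be queried, the sketch below lists the gene symbols marked '1' for one PWM. The tab delimiter, the header row, and the names PWM_A, PWM_B, and GENE1 to GENE3 are assumptions made for this example; the actual file layout may differ.

```python
import csv
import io

def predicted_targets(grid_text, pwm_name, delimiter="\t"):
    """Return gene symbols whose TSS has a '1' in the column for pwm_name.

    Assumes the first row is a header and the left-most column holds
    the gene symbol(s); both are assumptions about the file layout.
    """
    reader = csv.reader(io.StringIO(grid_text), delimiter=delimiter)
    header = next(reader)
    col = header.index(pwm_name)
    return [row[0] for row in reader if row and row[col] == "1"]

# Hypothetical miniature grid: one row per TSS, one column per PWM.
sample_grid = (
    "gene\tPWM_A\tPWM_B\n"
    "GENE1\t1\t0\n"
    "GENE2\t0\t1\n"
    "GENE3\t1\t1\n"
)

targets_a = predicted_targets(sample_grid, "PWM_A")
targets_b = predicted_targets(sample_grid, "PWM_B")
```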
Grid with the relative ranking of each TSS for each JASPAR and TRANSFAC PWM,
based on the prediction that the corresponding TF(s) will bind within ±10 kb of the TSS, is here.
A file that also includes the TSS hg18 coordinates is here.
The TF(s) corresponding to each of the PWMs in the above file can be found
here.