Threshold for Positional Weight Matrix

  1. (PDF, 482 KB)
  2. Get@NRC: Threshold for Positional Weight Matrix (Opens in a new window)
AuthorSearch for: ; Search for:
Journal titleEngineering Letters
Pages498504; # of pages: 7
SubjectPositional Weight Matrix; Threshold; Statistical expectation; Goodness-of-fit; Sequence motif
AbstractIn biological sequence research, the positional weight matrix (PWM) is often used to search for putative transcription factor binding sites. A set of experimentally verified oligonucleotides known to be functional motifs are collected and aligned. The frequency of each nucleotide A, C, G, or T at each column of the alignment is calculated in the matrix. Once a PWM is constructed, it can be used to search from a nucleotide sequence for subsequences that can possibly perform the same function. The match between a subsequence and a PWM is usually described by a score function, which measures the closeness of the subsequence to the PWM as compared with the given background. Nevertheless, the score function is usually motif-length-dependent and thus there is no universally applicable threshold. In this paper, we propose an alternative scoring index (G) varying from zero, where the subsequence is not much different from the background, to one, where the subsequence fits best to the PWM. We also propose a measure evaluating the statistical expectation at each G index. We investigated the PWMs from the TRANSFAC and found that the statistical expectation is significantly (p<0.0001) correlated with both the length of the PWMs and the threshold G value. We applied this method to two PWMs (GCN4_C and ROX1_Q6) of yeast transcription factor binding sites and two PWMs (HIC1-02, HIC1_03) of the human tumor suppressor (HIC-1) binding sites from the TRANSFAC database. Finally, our method compares favorably with the broadly used Match method. The results indicate that our method is more flexible and can provide better confidence.
Publication date
AffiliationNational Research Council Canada (NRC-CNRC); NRC Institute for Information Technology
Peer reviewedYes
NRC number52543
NPARC number16474617
Export citationExport as RIS
Report a correctionReport a correction
Record identifier7cfef343-d3a2-4b5c-a50f-8f6a02151c33
Record created2010-12-02
Record modified2016-05-09
Bookmark and share
  • Share this page with Facebook (Opens in a new window)
  • Share this page with Twitter (Opens in a new window)
  • Share this page with Google+ (Opens in a new window)
  • Share this page with Delicious (Opens in a new window)