In our TP53 example, 400 nucleotides are amplified to generate more than 77,000 k-mers ("subsequences"), a total of 10,735,712 nucleotides.
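The arithmetic behind these figures can be checked with a short sketch. It assumes that "k-mers" here means every contiguous window of the 400-nucleotide sequence, from a minimum length up to the full length. The minimum length of 8 is an assumption, not something stated above; it is used because it reproduces the quoted nucleotide total exactly. The function names are illustrative only.

```python
from typing import Iterator, Tuple


def kmer_totals(n: int, k_min: int = 1) -> Tuple[int, int]:
    """Return (number of k-mers, total nucleotides across all k-mers)
    for every contiguous window of length k_min..n in a sequence of length n."""
    count = 0
    total_nt = 0
    for k in range(k_min, n + 1):
        windows = n - k + 1      # a length-n sequence has n - k + 1 windows of length k
        count += windows
        total_nt += windows * k  # each of those windows contributes k nucleotides
    return count, total_nt


def enumerate_kmers(seq: str, k_min: int = 1) -> Iterator[str]:
    """Yield every contiguous k-mer of seq with length k_min..len(seq)."""
    for k in range(k_min, len(seq) + 1):
        for start in range(len(seq) - k + 1):
            yield seq[start:start + k]


# ASSUMPTION: minimum k-mer length of 8 (inferred, not given in the text).
count, total_nt = kmer_totals(400, k_min=8)
print(count, total_nt)  # 77421 10735712
```

Under that assumption the count is 77,421 k-mers, consistent with "more than 77,000", and the total matches 10,735,712 nucleotides.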
Example: each of 4 mRNA or protein sequences [P1-4] of a cell/gene
From this we derive Codondex i-Score and two varieties of Protein Vector, which expose fine distinctions between k-mers of same-gene transcripts using intron-protein/mRNA pairs.
We also discovered and ranked thousands of statistically dominant k-mers in multiple gene transcripts. These k-mers are unrelated by their sequence text, yet they are of equal length and recur with equal frequency. This symmetry is unrelated to reverse complements and inverted repeats, which we can predict precisely using k-mer recurrence data alone.
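As a rough illustration of what "equal length, equal frequency, not reverse complements" can mean in practice, the sketch below counts fixed-length k-mers across a set of sequences, groups those that recur equally often, and drops any k-mer whose reverse complement is already in the group. This is not the Codondex ranking method, which is not described here; the function names, the `min_count` threshold and the toy sequences are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_counts(sequences: Iterable[str], k: int) -> Counter:
    """Count how often each k-mer of length k occurs across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    return counts


def equal_frequency_groups(counts: Counter, min_count: int = 2) -> Dict[int, List[str]]:
    """Group k-mers by recurrence count, keeping only groups with more than
    one member after removing reverse-complement partners."""
    by_count: Dict[int, List[str]] = defaultdict(list)
    for kmer, c in counts.items():
        if c >= min_count:
            by_count[c].append(kmer)

    groups: Dict[int, List[str]] = {}
    for c, kmers in by_count.items():
        kept: List[str] = []
        for kmer in sorted(kmers):
            # Skip a k-mer whose reverse complement is already in the group,
            # so remaining members are not related by strand symmetry.
            if reverse_complement(kmer) in kept:
                continue
            kept.append(kmer)
        if len(kept) > 1:
            groups[c] = kept
    return groups


# Toy input, invented for illustration only.
toy = ["ACGTTTACGT", "GGACGTTT"]
print(equal_frequency_groups(kmer_counts(toy, k=4)))  # {2: ['CGTT', 'GTTT']}
```

In the toy input, CGTT and GTTT each occur twice and are not reverse complements of each other, so they form one equal-frequency group.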
Sign up and upload, and we will make several reports available to help you identify target subsequences of interest. See the processed examples on our site.