Python - Print two strings horizontally, with | and -
I'm having a small formatting problem that I can't solve. I have some long strings, in the form of DNA sequences. I have added each one to a separate list, with each letter as a separate item in the list. They are of unequal length, so I appended "N"s to the shorter of the two.
Ex:

    seq1 = ['A', 'T', 'G', 'G', 'A', 'C', 'G', 'C', 'A']
    seq2 = ['A', 'T', 'G', 'G', 'C', 'T', 'G']

seq2 became:

    ['A', 'T', 'G', 'G', 'C', 'T', 'G', 'N', 'N']

Currently, after comparing the letters in each list, I get:

    ATGG--G--

A '-' character marks each mismatch (including the "N"s).
Ideally, what I would like to print is:

    seq1 ATGGACGCA
         |||||||||
    seq2 ATGG--G--
I've been playing around with newline characters at the end of the print statement, but I can't get it to work. I would also like to print an identifier for each sequence on the same line as the sequence.
Here is the function I use to compare the two seqs:

    def align_seqs(orf, query):
        orf_base = list(orf)
        query_base = list(query)
        # Pad the shorter sequence with "N" so both lists have the same length
        if len(query_base) > len(orf_base):
            n = len(query_base) - len(orf_base)
            for i in range(n):
                orf_base.append("N")
        elif len(query_base) < len(orf_base):
            n = len(orf_base) - len(query_base)
            for i in range(n):
                query_base.append("N")
        # Keep matching letters, mark mismatches with "-"
        align = []
        for i in range(0, len(orf_base)):
            if orf_base[i] == query_base[i]:
                align.append(orf_base[i])
            else:
                align.append("-")
        print(''.join(align))
At the moment, I'm only printing the "bottom" part of the output I want.
All help is appreciated.
Here is a solution that works for long strings:
    s1 = 'ATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGGATAAGG'
    s2 = 'A-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-AAGG'
    # both sequences are of equal length (post-alignment)

    def print_align(seq1, seq2, length):
        # "seq1: " takes 6 characters, so each line shows length - 6 bases
        while len(seq1) > 0:
            print("seq1: " + seq1[:length - 6])
            print("      " + '|' * len(seq1[:length - 6]))
            print("seq2: " + seq2[:length - 6] + "\n")
            seq1 = seq1[length - 6:]
            seq2 = seq2[length - 6:]

    print_align(s1, s2, 30)
produces:
    seq1: ATAAGGATAAGGATAAGGATAAGG
          ||||||||||||||||||||||||
    seq2: A-AAGGA-AAGGA-AAGGA-AAGG

    seq1: ATAAGGATAAGGATAAGGATAAGG
          ||||||||||||||||||||||||
    seq2: A-AAGGA-AAGGA-AAGGA-AAGG

    seq1: ATAAGGATAAGG
          ||||||||||||
    seq2: A-AAGGA-AAGG
Which I believe is what you want. You can play with the length parameter to get the lines to display properly: each printed line, including the 6-character "seq1: " label, is cut once it reaches the width given by that parameter. For example, if I call print_align(s1, s2, 39)
I get:
    seq1: ATAAGGATAAGGATAAGGATAAGGATAAGGATA
          |||||||||||||||||||||||||||||||||
    seq2: A-AAGGA-AAGGA-AAGGA-AAGGA-AAGGA-A

    seq1: AGGATAAGGATAAGGATAAGGATAAGG
          |||||||||||||||||||||||||||
    seq2: AGGA-AAGGA-AAGGA-AAGGA-AAGG
This gives a much more readable result; you should try it with huge (> 1000 bp) sequences.
Note: the function takes two sequences of the same length as input, so it only prints them nicely after you have done all the alignment work.
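If the two sequences are not yet the same length, they can be padded first, as the question does with "N". The following is only a minimal sketch of that step: the helper name pad_to_equal_length and its fill parameter are made up for illustration and are not part of the answer above.

```python
def pad_to_equal_length(seq1, seq2, fill="N"):
    """Pad the shorter string with `fill` so both have the same length.

    Hypothetical helper: the name and the `fill` parameter are assumptions.
    """
    width = max(len(seq1), len(seq2))
    # str.ljust appends the fill character on the right up to `width`
    return seq1.ljust(width, fill), seq2.ljust(width, fill)


padded1, padded2 = pad_to_equal_length("ATGGACGCA", "ATGGCTG")
print(padded1)  # ATGGACGCA
print(padded2)  # ATGGCTGNN
```

The padded strings can then be passed to a printing function that expects equal-length input.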
PS: Normally, in a sequence alignment, the bar | is printed only where the nucleotides match.
The solution is very easy to adapt for that and you should be able to figure it out (tell me if you have trouble).
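One possible way to print bars only at matching positions is sketched below in Python 3. The names match_line and format_alignment are hypothetical, and here width counts bases per line, unlike the length parameter above, which also includes the label. It assumes both sequences already have the same length.

```python
def match_line(seq1, seq2):
    """Return '|' where the two sequences agree and ' ' elsewhere."""
    return "".join("|" if a == b else " " for a, b in zip(seq1, seq2))


def format_alignment(seq1, seq2, width=60):
    """Format two equal-length sequences in blocks of `width` bases.

    Hypothetical helper: `width` is bases per line, excluding the label.
    """
    blocks = []
    for start in range(0, len(seq1), width):
        top = seq1[start:start + width]
        bottom = seq2[start:start + width]
        blocks.append("seq1: " + top + "\n"
                      "      " + match_line(top, bottom) + "\n"
                      "seq2: " + bottom)
    # Separate blocks with a blank line, as in the output shown earlier
    return "\n\n".join(blocks)


print(format_alignment("ATGGACGCA", "ATGGCTGNN"))
```

For the sequences from the question, the middle line has bars under the first four letters and under the seventh, and spaces everywhere else. Returning a string instead of printing directly makes it easy to write the result to a file.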