š§¬ Algorithms for DNA Sequencing by Johns Hopkins University
def QtoPhred33(Q):
"""Turn Q into Phred+33 ASCII-Āāencoded quality"""
return chr(Q + 33) # converts character to integer according to ASCII table
def phred33ToQ(qual):
"""Turn Phred+33 ASCII-encoded quality into Q"""
return ord(qual) - 33 # converts integer to character according to ASCII table
In lecture and in a practical, we saw an implementation of the naive exact matching algorithm:
def naive(p, t):
occurrences = []
for i in range(len(t) - len(p) + 1): # loop over alignments
match = True
for j in range(len(p)): # loop over characters
if t[i+j] != p[j]: # compare characters
match = False
break
if match:
occurrences.append(i) # all chars matched; record
return occurrences
ā¦and we saw a function that takes a DNA string and returns its reverse complement:
def reverseComplement(s):
complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'N': 'N'}
t = ''
for base in s:
t = complement[base] + t
return t
First, implement a version of the naive exact matching algorithm that is strand-aware. That is, instead of looking only for occurrences of P in T, additionally look for occurrences of the reverse complement of P in T. If P is ACT, your function should find occurrences of both ACTand its reverse complement AGT in T.
If P and its reverse complement are identical (e.g. AACGTT), then a given match offset should be reported only once. So if your new function is called naive_with_rc, then the old naive function and your new naive_with_rc function should return the same results when P equals its reverse complement.
Next, download and parse the lambda virus genome.