🧬 Algorithms for DNA Sequencing by Johns Hopkins University
https://en.wikipedia.org/wiki/DNA_sequencing
DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.
def read_FAST_A(filename):
genome = ''
with open(filename, 'r') as f:
for line in f:
# ignore header line with genome information
if not line[0] == '>':
genome += line.rstrip()
return genome
def readFAST_Q(filename):
sequences = []
qualities = []
with open(filename) as fh:
while True:
fh.readline() # skip name line
seq = fh.readline().rstrip() # read base sequence
fh.readline() # skip placeholder line
qual = fh.readline().rstrip() # base quality line
if len(seq) == 0:
break
sequences.append(seq)
qualities.append(qual)
return sequences, qualities
$ jupyter notebook 1_notebook.ipynb
[I 11:45:05.991 NotebookApp] The Jupyter Notebook is running at:
[I 11:45:05.991 NotebookApp] http://localhost:8889/?token=070644d6de70204df12235b2356476b577d0744b5df41422
[I 11:45:05.991 NotebookApp] or http://127.0.0.1:8889/?token=070644d6de70204df12235b2356476b577d0744b5df41422
[I 11:45:05.991 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 11:45:05.998 NotebookApp]
To access the notebook, open this file in a browser:
file:///Users/.../Library/Jupyter/runtime/nbserver-38748-open.html
Or copy and paste one of these URLs:
http://localhost:8889/?token=070644d6de70204df12235b2356476b577d0744b5df41422
or http://127.0.0.1:8889/?token=070644d6de70204df12235b2356476b577d0744b5df41422
ipython
. If python
is used
to execute then the following error will occur:NameError: name 'get_ipython' is not defined
K-mers are a fundamental concept for creating “words” from a DNA sequencing read. These “words” are abstracted to computer science string algorithms (ie. simply finding pattern in text).
For example, a DNA substring consisting of two neucleotides is a 2-mer (regardless of Mr. Schwarzenegger’s beliefs):