Biology undergraduates often spend an afternoon in a lab sequencing DNA. Molecular biologists have reconstructed the polio virus from materials commonly available from a biological supply house, while others are striving to find good measures of genetic distance between genomes and to trace the evolution of specific biomolecules. Still others, with the support of the pharmaceutical industry, are using ligand field theory and molecular geometry to find small molecules that will “dock” on specific biosites, in the hope of finding treatments for a variety of diseases, many of which may be caused by irregularities in gene expression. There are even studies aimed at unlocking the feedback mechanism between our nuclear DNA (inherited in equal parts from our parents) and our mitochondrial DNA (inherited only from our mother), and at regulating energy production in our cells. Every day, the list of discoveries and the list of new questions grow longer. All of these topics, from DNA sequencing to genome interactions, owe their development to the discovery of the double-helix geometry of DNA in the 1950s, which led, in turn, to the decoding of small genomes and finally of the human genome around 2000. The storage, retrieval, and analysis of all this data is the concern of bioinformatics (also known as computational molecular biology), which serves as a driving force for conjectures and future discoveries in genetics, biomedicine, bionanotechnology, and a host of other disciplines, some as yet unborn. The very words used to describe the content and questions of bioinformatics suggest a close connection between this discipline and biology, physics, computer science, and mathematics. In fact, the central paradigm of the discipline, inherited from biology, is that “form implies function,” a very geometric notion.
More specifically, bioinformatics uses algorithms, geometry, statistics, data mining, and related tools to make connections between data sets gathered in specific biological, chemical, or medical studies. This mathematical structure makes it a great place for students of science, technology, engineering, and mathematics (STEM) to look for exciting interdisciplinary careers.
The brief outline below divides bioinformatics mainly in terms of its mathematical components, both to help orient a mathematical audience and to give students from the biological sciences an overview of how mathematicians, computer scientists, and physical scientists may view the subject. Each mathematical section begins with an introduction to the relevant mathematics, with extensive references, often to online courses freely available at MIT or MERLOT.
Chapter 1 is an overview of the relevant biology. [Principal sources: Deonier, Tavaré, and Waterman; Lesk; Mount; Campbell and Heyer; Baxevanis and Ouellette.] This chapter also introduces some of the software used in bioinformatics, so that students can gain ready familiarity with sequencing and visualization, which are so important for a physical understanding of the key ideas and problems of the subject.
Chapter 2 introduces the algorithms of bioinformatics, with the main emphasis on algorithms and complexity, along with historical applications and principal problems. [Principal sources: Jones and Pevzner; Lesk; Baxevanis and Ouellette.]
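As a hedged illustration of the kind of algorithm-and-complexity question this chapter is concerned with, the sketch below computes the edit distance between two sequences by dynamic programming. The choice of edit distance as the example, and the function name `edit_distance`, are assumptions made here for illustration; they are not taken from the chapter itself. The point is the cost: for sequences of lengths m and n, the table has (m + 1)(n + 1) cells, so the running time grows as O(mn).

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn s into t (Levenshtein distance).

    Illustrative sketch only: unit costs are an assumption, and real
    sequence comparison usually uses biologically motivated scores.
    """
    # previous[j] holds the distance between the first i-1 characters
    # of s and the first j characters of t.
    previous = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        current = [i]  # turning s[:i] into the empty string takes i deletions
        for j, b in enumerate(t, start=1):
            substitution_cost = 0 if a == b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete a from s
                    current[j - 1] + 1,                   # insert b into s
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("ACGT", "AGT"))
```

Keeping only two rows of the table reduces memory use to O(n) while leaving the O(mn) running time unchanged, which is the sort of trade-off a discussion of complexity makes explicit.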
Chapter 3 discusses probability and statistics in bioinformatics, and introduces R as a statistical package for dealing with biodata. There is also an extended discussion of methods of measuring genetic distance, with applications to disease transmission, and of applications of statistics to microarray analysis and to the problem of the time evolution of biomolecules. [Principal sources: Jones and Pevzner; Deonier, Tavaré, and Waterman; Gentleman, Carey, Huber, Irizarry, and Dudoit; Higgs and Attwood; Nielsen; Ewens and Grant; Stekel; Baxevanis and Ouellette.]
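To make the idea of a genetic distance concrete, here is a minimal sketch of two standard measures for a pair of aligned DNA sequences of equal length: the p-distance (the proportion of sites that differ) and the Jukes-Cantor distance, which corrects the p-distance for repeated substitutions at the same site. Which measures the chapter actually develops is not stated in this outline, so the choice of these two, the function names, and the use of Python rather than R are all assumptions made for illustration.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    mismatches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return mismatches / len(seq_a)


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor estimate of substitutions per site.

    Uses d = -(3/4) * ln(1 - (4/3) * p), where p is the p-distance.
    The formula is undefined for p >= 3/4; this sketch returns infinity
    in that case, which is a convention chosen here, not a standard.
    """
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


if __name__ == "__main__":
    a, b = "ACGTACGT", "ACGTACGA"
    print(p_distance(a, b), jukes_cantor_distance(a, b))
```

For closely related sequences the two values nearly coincide; they diverge as p grows, because the correction assumes that some observed sites have changed more than once.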
Chapter 4 introduces the problem of docking small molecules to sites on large biomolecules, a problem whose geometry is made more difficult because electrical interactions (e.g., van der Waals forces) are part of the geometry. Students gain important insights into the problem by using small-molecule databases in combination with docking and visualization software such as DOCK 6, AutoDock, and MGLTools. [Principal sources: online resources at Scripps, Larson; Lengauer, Mannhold, Kubinyi, and Timmerman.]
Chapter 5 gives a brief overview of some research on the feedback mechanisms between nuclear and mitochondrial DNA, and introduces a series of student projects. [Principal sources: Nelson; Alterovitz and Ramoni; Cristianini and Hahn.]