Molecular phylogenetics

Molecular phylogenetics (/məˈlɛkjᵿlər ˌfaɪloʊdʒəˈnɛtɪks, mɒ-, moʊ-/^[1]^[2]) is the branch of phylogeny that analyses hereditary molecular differences, mainly in DNA sequences, to gain information on an organism's evolutionary relationships. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.

History of molecular phylogenetics

Further information: History of molecular evolution

The theoretical framework for molecular systematics was laid out in the 1960s in the works of Emile Zuckerkandl, Emanuel Margoliash, Linus Pauling, and Walter M. Fitch.^[3] Applications of molecular systematics were pioneered by Charles G. Sibley (birds), Herbert C. Dessauer (herpetology), and Morris Goodman (primates), followed by Allan C. Wilson, Robert K. Selander, and John C. Avise (who studied various groups). Work with protein electrophoresis began around 1956. Although the results were not quantitative and did not initially improve on morphological classification, they provided tantalizing hints that long-held notions of the classification of birds, for example, needed substantial revision. From 1974 to 1986, DNA-DNA hybridization was the dominant technique.^[4]

Techniques and applications

Every living organism contains DNA, RNA, and proteins. In general, closely related organisms have a high degree of agreement in the molecular structure of these substances, while the molecules of distantly related organisms usually show a pattern of dissimilarity. Conserved sequences, such as mitochondrial DNA, are expected to accumulate mutations over time and, assuming a constant rate of mutation, provide a molecular clock for dating divergence. Molecular phylogeny uses such data to build a "relationship tree" that shows the probable evolution of various organisms. With the invention of Sanger sequencing in 1977, it became possible to isolate and identify these molecular structures.^[5]^[6]
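The molecular clock idea above can be illustrated with a short calculation. This is only a sketch under a strict-clock assumption: the function name, the rate, and the distance are hypothetical values chosen for illustration, not figures from any study.

```python
def divergence_time(distance, rate_per_lineage):
    """Estimate the time since two lineages split under a strict molecular clock.

    distance: substitutions per site separating the two sequences
    rate_per_lineage: substitutions per site per year along ONE lineage
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate_per_lineage must be positive")
    # After the split both lineages accumulate changes independently,
    # so the pairwise distance grows at twice the per-lineage rate.
    return distance / (2.0 * rate_per_lineage)


# Hypothetical inputs: 2% sequence difference, 1e-8 substitutions/site/year.
example_years = divergence_time(distance=0.02, rate_per_lineage=1e-8)
print(f"Estimated divergence: about {example_years:,.0f} years ago")
```

With these illustrative inputs the estimate is on the order of one million years. The result depends entirely on the assumed rate, which is why clock calibration matters in practice.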

The most common approach is the comparison of homologous sequences for genes, using sequence alignment techniques to identify similarity. Another application of molecular phylogeny is DNA barcoding, in which the species of an individual organism is identified using small sections of mitochondrial DNA or chloroplast DNA. The same techniques are also applied, in a more limited way, in human genetics: for example, in the increasingly popular use of genetic testing to determine a child's paternity, and in genetic fingerprinting, a branch of criminal forensics.
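The DNA barcoding idea mentioned above can be sketched as a nearest-match lookup. This is a toy illustration, not a real barcoding pipeline: it assumes the sequences are already aligned and of equal length, and the reference library, species names, and sequences are invented.

```python
def mismatches(seq_a, seq_b):
    """Count positions at which two equal-length, pre-aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def identify(query, reference_barcodes):
    """Return the name of the reference barcode with the fewest mismatches."""
    return min(
        reference_barcodes,
        key=lambda name: mismatches(query, reference_barcodes[name]),
    )


# Hypothetical reference library (made-up 10-base "barcodes").
references = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTTCGAAC",
    "species_C": "TCGAACGTTC",
}

best_match = identify("ACGTACGTAT", references)
print(best_match)
```

Real barcoding relies on much longer marker regions and curated reference databases. The sketch only shows the principle of assigning a query to its most similar reference.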

A comprehensive step-by-step protocol for constructing a phylogenetic tree, including DNA/amino acid contiguous sequence assembly, multiple sequence alignment, model testing (selecting the best-fitting substitution model), and phylogeny reconstruction using maximum likelihood and Bayesian inference, is available at Nature Protocol Exchange.^[7]

Theoretical background

Early attempts at molecular systematics were also termed chemotaxonomy and made use of proteins, enzymes, carbohydrates, and other molecules that were separated and characterized using techniques such as chromatography. These have largely been replaced in recent times by DNA sequencing, which produces the exact sequences of nucleotides or bases in either DNA or RNA segments extracted using different techniques. In general, sequence data are considered superior for evolutionary studies, since the actions of evolution are ultimately reflected in the genetic sequences. At present, it is still a long and expensive process to sequence the entire DNA of an organism (its genome). However, it is quite feasible to determine the sequence of a defined area of a particular chromosome.

Typical molecular systematic analyses require the sequencing of around 1000 base pairs. At any location within such a sequence, the bases found in a given position may vary between organisms. The particular sequence found in a given organism is referred to as its haplotype. In principle, since there are four base types, 1000 base pairs could yield 4¹⁰⁰⁰ distinct haplotypes. However, for organisms within a particular species or in a group of related species, it has been found empirically that only a minority of sites show any variation at all, and most of the variations that are found are correlated, so the number of distinct haplotypes actually found is relatively small.
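To make the contrast between the theoretical and observed number of haplotypes concrete, the following sketch computes the size of 4¹⁰⁰⁰ and then counts the variable sites and distinct haplotypes in a small sample. The sample sequences are invented and much shorter than a real data set, and the helper names are illustrative.

```python
# Theoretical upper bound: four possible bases at each of 1000 positions.
n_possible = 4 ** 1000
print(f"4**1000 has {len(str(n_possible))} decimal digits")


def variable_sites(sequences):
    """Return the indices of alignment columns that contain more than one base."""
    return [
        index
        for index, column in enumerate(zip(*sequences))
        if len(set(column)) > 1
    ]


def distinct_haplotypes(sequences):
    """Return the set of different sequences observed in the sample."""
    return set(sequences)


# Hypothetical sample of five individuals, 8 aligned bases each.
sample = ["ACGTACGT", "ACGTACGT", "ACGAACGT", "ACGAACTT", "ACGTACGT"]

print("variable sites:", variable_sites(sample))
print("distinct haplotypes:", len(distinct_haplotypes(sample)))
```

In this toy sample only a minority of positions vary and several individuals share a haplotype, which mirrors the empirical pattern described above.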

In a molecular systematic analysis, the haplotypes are determined for a defined area of genetic material. A substantial sample of individuals of the target species or other taxon is used, although many current studies are based on single individuals. Haplotypes of individuals of closely related, but different, taxa are also determined. Finally, haplotypes from a smaller number of individuals from a definitely different taxon are determined; these are referred to as an outgroup. The base sequences for the haplotypes are then compared. In the simplest case, the difference between two haplotypes is assessed by counting the number of locations where they have different bases; this is referred to as the number of substitutions (other kinds of differences between haplotypes can also occur, for example the insertion of a section of nucleic acid in one haplotype that is not present in another). The difference between organisms is usually re-expressed as a percentage divergence, by dividing the number of substitutions by the number of base pairs analysed and expressing the result as a percentage. The hope is that this measure will be independent of the location and length of the section of DNA that is sequenced.
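The percentage divergence described above can be computed in a few lines. This is a minimal sketch that assumes the two haplotypes are already aligned and of equal length, and that ignores insertions and deletions. The function name and the example sequences are illustrative.

```python
def percent_divergence(haplotype_a, haplotype_b):
    """Percentage of aligned positions at which two haplotypes differ."""
    if len(haplotype_a) != len(haplotype_b):
        raise ValueError("haplotypes must be aligned to the same length")
    if not haplotype_a:
        raise ValueError("haplotypes must not be empty")
    # Number of substitutions: positions where the bases differ.
    substitutions = sum(1 for x, y in zip(haplotype_a, haplotype_b) if x != y)
    # Divide by the number of base pairs analysed and express as a percentage.
    return 100.0 * substitutions / len(haplotype_a)


# Two made-up 10-base haplotypes differing at two positions.
print(percent_divergence("ACGTACGTAC", "ACGTTCGTAA"))
```

Because the count is normalised by sequence length, regions of different lengths can be compared on the same scale, which is the motivation given in the text.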

An older and superseded approach was to determine the divergences between the genotypes of individuals by DNA-DNA hybridisation. The advantage claimed for using hybridisation rather than gene sequencing was that it was based on the entire genotype, rather than on particular sections of DNA. Modern sequence comparison techniques overcome this objection by the use of multiple sequences.

Once the divergences between all pairs of samples have been determined, the resulting triangular matrix of differences is submitted to some form of statistical cluster analysis, and the resulting dendrogram is examined in order to see whether the samples cluster in the way that would be expected from current ideas about the taxonomy of the group, or not. Any group of haplotypes that are all more similar to one another than any of them is to any other haplotype may be said to constitute a clade. Statistical techniques such as bootstrapping and jackknifing help in providing reliability estimates for the positions of haplotypes within the evolutionary trees.
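The clustering step can be sketched with UPGMA, one simple distance-based method. The text above says only "some form of statistical cluster analysis", so the choice of UPGMA here is an assumption made for illustration. The sample names and distances are also invented.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster samples by UPGMA and return the tree as nested tuples.

    names: list of sample labels
    distances: dict mapping frozenset({a, b}) to the divergence between a and b
    """
    sizes = {name: 1 for name in names}  # active cluster -> number of leaves
    dist = dict(distances)               # working copy, extended as clusters merge
    while len(sizes) > 1:
        # Find the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance from the new cluster to each remaining cluster is the
        # size-weighted average of the distances from its two parts.
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical percentage divergences between four samples.
pairwise = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}

tree = upgma(["A", "B", "C", "D"], pairwise)
print(tree)
```

In this example A and B, the closest pair, are joined first, C joins them next, and D attaches last. Reliability estimates such as bootstrapping would repeat this procedure on resampled alignment columns and record how often each grouping reappears.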

Limitations of molecular systematics

Molecular systematics is an essentially cladistic approach: it assumes that classification must correspond to phylogenetic descent, and that all valid taxa must be monophyletic.

The recent discovery of extensive horizontal gene transfer among organisms provides a significant complication to molecular systematics, indicating that different genes within the same organism can have different phylogenies.

In addition, molecular phylogenies are sensitive to the assumptions and models that go into making them. They face problems such as long-branch attraction, saturation, and taxon sampling, which means that strikingly different results can be obtained by applying different models to the same dataset.^[8]
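One way to see how model choice changes the result is to compare the raw proportion of differing sites with a distance corrected for multiple substitutions at the same site. The sketch below uses the Jukes-Cantor (JC69) correction as an example of such a model. That choice is an assumption for illustration, since the text does not name a specific model.

```python
import math


def jukes_cantor_distance(p):
    """Convert an observed proportion of differing sites into a JC69 distance.

    p: observed fraction of sites that differ (0 <= p < 0.75)
    Returns the estimated number of substitutions per site.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Compare the uncorrected and corrected distances for a few observed values.
for observed in (0.05, 0.10, 0.30, 0.50):
    corrected = jukes_cantor_distance(observed)
    print(f"observed {observed:.2f} -> corrected {corrected:.3f}")
```

For small differences the two measures nearly agree, but as sequences approach saturation the corrected distance grows much faster than the observed one. Trees built from the two sets of distances can therefore differ, particularly for long branches.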

Notes and references

1. Jones, Daniel (2003) [1917], Peter Roach, James Hartmann and Jane Setter, eds., English Pronouncing Dictionary, Cambridge: Cambridge University Press, ISBN 3-12-539683-2
2. "Phylogenetic". Merriam-Webster Dictionary.
3. Suárez-Díaz, Edna & Anaya-Muñoz, Victor H. (2008). "History, objectivity, and the construction of molecular phylogenies". Stud. Hist. Phil. Biol. & Biomed. Sci. 39 (4): 451–468. doi:10.1016/j.shpsc.2008.09.002. PMID 19026976.
4. Ahlquist, Jon E. (1999). "Charles G. Sibley: A commentary on 30 years of collaboration". The Auk. 116 (3): 856–860. doi:10.2307/4089352.
5. Sanger F, Coulson AR (May 1975). "A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase". J. Mol. Biol. 94 (3): 441–8. doi:10.1016/0022-2836(75)90213-2. PMID 1100841.
6. Sanger F, Nicklen S, Coulson AR (December 1977). "DNA sequencing with chain-terminating inhibitors". Proc. Natl. Acad. Sci. U.S.A. 74 (12): 5463–7. Bibcode:1977PNAS...74.5463S. doi:10.1073/pnas.74.12.5463. PMC 431765. PMID 271968.
7. Bast, F. (2013). "Sequence Similarity Search, Multiple Sequence Alignment, Model Selection, Distance Matrix and Phylogeny Reconstruction". Nature Protocol Exchange. doi:10.1038/protex.2013.065.
8. Philippe, H.; Brinkmann, H.; Lavrov, D. V.; Littlewood, D. T. J.; Manuel, M.; Wörheide, G.; Baurain, D. (2011). Penny, David, ed. "Resolving Difficult Phylogenetic Questions: Why More Sequences Are Not Enough". PLoS Biology. 9 (3): e1000602. doi:10.1371/journal.pbio.1000602. PMC 3057953. PMID 21423652.

External links

NCBI – Systematics and Molecular Phylogenetics
The promise of a DNA taxonomy (Mark L. Blaxter)
Molecular phylogenetics from Encyclopedia Britannica.

This article is issued from Wikipedia - version of the 10/16/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.