
Cladistics: Computational cladistics



Cladogram of stomiid fish  - diagram from Wikimedia

From Wikipedia. An example of an algorithm-derived cladogram. Cladogram of stomiid fishes, according to Fink WL. 1985. Phylogenetic interrelationships of the stomiid fishes (Teleostei: Stomiiformes). Miscellaneous Publications of the Museum of Zoology, University of Michigan 171:1-127. Note the absence of easily recognisable synapomorphies. The resolved cladograms of this topology have a length of 496 and a consistency index of .494 when the seventy-eight generic apomorphies are excluded. With the generic apomorphies included, the length is 574 and the consistency index is .563. A-D show alternative resolved cladograms for the Malacosteus-Pachystomias-Aristostomias-Photostomias group. For character conventions, see Fink (1985). Diagram and text by Filip em. The letters A to Y represent nodes; the names at the right of the diagram are genera.
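The two consistency index values in the caption can be checked against each other. The sketch below uses the standard definition CI = M / S (minimum conceivable number of steps divided by observed tree length) and assumes, as an illustration only, that each of the 78 generic apomorphies changes exactly once and so adds one step to both M and S. The value of M is inferred here from the caption's rounded figures; it is not stated by Fink (1985).

```python
# Sketch: consistency index CI = M / S, where M is the minimum conceivable
# number of steps and S is the observed tree length.

def consistency_index(min_steps: float, tree_length: float) -> float:
    """Return the ensemble consistency index for a tree."""
    return min_steps / tree_length

length_without = 496   # tree length without generic apomorphies (from caption)
ci_without = 0.494     # consistency index without them (from caption)

# Minimum steps implied by the caption's rounded figures (an inference).
min_steps = round(ci_without * length_without)

# Assumption: every generic apomorphy adds one step to both M and S.
autapomorphies = 78
length_with = length_without + autapomorphies
ci_with = consistency_index(min_steps + autapomorphies, length_with)

print(length_with, round(ci_with, 3))
```

Under that assumption the recomputed values match the caption (length 574, consistency index .563). This also shows why adding characters that are unique to single terminal taxa raises the index without saying anything new about relationships.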

After the incorporation of paleontology, the next big revolution in cladistics was statistical computation, using methods also employed in molecular phylogeny. With the rise of cheap and easily available high-powered computing, computational phylogeny and algorithm-based cladistics have replaced the original one-shot cladogram of earlier Phylogenetic Systematics. This meant a change in emphasis from identifying small numbers of easily recognisable and studied synapomorphies (without cheap and powerful computing it was not practical to do otherwise) to statistical analyses of huge data matrices and supermatrices, featuring hundreds of character states and millions of possible trees. Whereas Phylogenetic Systematics would result in only a single, parsimonious cladogram, statistical cladistics calculates millions, so the focus shifts to evaluating and selecting the most likely or plausible phylogenetic hypotheses. The big problem here is missing data in fossil forms, which are often incomplete and fragmentary and therefore act as wildcards. Some taxa can be particularly unstable, jumping around to different positions in different trees. Nevertheless, fossil taxa still provide useful information and an additional phylogenetic signal that would not be present if only extant taxa were used.
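To give a sense of how quickly "millions of possible trees" is reached, the following sketch counts fully bifurcating tree topologies using the standard double-factorial formulas: (2n-5)!! unrooted and (2n-3)!! rooted trees for n taxa. The function names are illustrative and do not come from any particular phylogenetics package.

```python
def double_factorial(k: int) -> int:
    """Return k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def unrooted_tree_count(n_taxa: int) -> int:
    """Number of unrooted, fully bifurcating topologies for n_taxa >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)

def rooted_tree_count(n_taxa: int) -> int:
    """Number of rooted, fully bifurcating topologies for n_taxa >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)

for n in (4, 10, 20):
    print(n, unrooted_tree_count(n), rooted_tree_count(n))
```

With only ten taxa there are already 2,027,025 unrooted and 34,459,425 rooted topologies, which is why exhaustive evaluation by hand is impractical and heuristic searches are generally used for larger data sets.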

There are several algorithms available to identify the "best" cladogram. Most algorithms use a metric (a mathematical function which defines a distance between elements of a set) to measure how consistent a candidate cladogram is with the data. Most cladogram algorithms use the mathematical techniques of optimization (choosing the best element from some set of available alternatives) and minimization. In general, cladogram generation algorithms must be implemented as computer programs, although some algorithms can be performed manually when the data sets are trivial (for example, just a few species and a couple of characteristics). Algorithms include least squares (minimising the sum of the squares of the errors made in solving every single equation), neighbor-joining, parsimony, maximum likelihood, and Bayesian inference. (Wikipedia)
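As an illustration of how one such criterion scores a candidate cladogram, here is a minimal sketch of Fitch's small-parsimony count for unordered characters. The four taxa (A to D), the three-character matrix, and the nested-tuple tree encoding are all hypothetical and chosen only to keep the example small; real programs add tree search, weighting, and handling of missing data.

```python
def fitch(tree, states):
    """Return (possible ancestral states, steps) for one character.

    `tree` is either a taxon name (leaf) or a pair (left, right);
    `states` maps each taxon name to its state for this character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_steps + right_steps
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_steps + right_steps + 1

def tree_length(tree, matrix):
    """Sum Fitch steps over every character (column) in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {taxon: row[i] for taxon, row in matrix.items()})[1]
        for i in range(n_chars)
    )

# Hypothetical data: rows are taxa, columns are binary characters.
matrix = {"A": "000", "B": "001", "C": "111", "D": "110"}

# The three possible groupings of four taxa (rooting is arbitrary here;
# the Fitch length does not depend on where the root is placed).
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
lengths = [tree_length(t, matrix) for t in candidates]
best = candidates[lengths.index(min(lengths))]
print(lengths, best)
```

For this toy matrix the three candidates have lengths 4, 6 and 5, so the grouping of A with B and C with D is the most parsimonious. The third character conflicts with that tree and has to be explained as homoplasy.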

Although computational cladistics uses all of the same basic principles as phylogenetic systematics, it often produces very different results. Stratigraphically early monophyletic groups, such as Gauthier's Ceratosauria, which in terms of obvious synapomorphies appear to be simply a large clade of primitive theropod dinosaurs, often become paraphyletic step-wise evolutionary grades, thus eliminating excessive ghost lineages and generating cladograms closer to the actual stratigraphic record. In other cases, previously monophyletic clades become diphyletic, as with protostegid sea turtles, which in terms of easily recognised synapomorphies are very close to leatherback sea turtles, but are shown by statistical analysis to be a totally distinct and much more primitive group that is simply convergent with modern sea turtles. These sorts of results tend to be more compatible with both molecular phylogeny and stratigraphy in showing that what were previously considered to be homologies (shared characteristics inherited from a common ancestor) are actually astonishing instances of homoplasy (convergent evolution). Contrary to the insights of the early cladists, who emphasised parsimony-based approaches, it seems that homoplasy is rampant throughout nature, making attempts at reconstructing phylogeny difficult at best.

Perhaps because of its far larger data sets and its more empirical and quantitative approach, computational or algorithm-based cladistics is considered more reliable than hand-coded Phylogenetic Systematics. In all other respects the two are still very similar, in that both emphasise distinguishing synapomorphies or homologies from plesiomorphies and from homoplasies (convergences) in order to identify monophyletic clades.

In the late 2000s and early 2010s, algorithm-based cladistics became assimilated into molecular phylogeny, as the two use exactly the same statistical algorithms to find the optimal cladograms. When integrated, statistical cladistics and molecular phylogeny become the new science of phylogenetics. A problem here is that these two methodologies often give strongly incongruent trees. Nevertheless, this is a victory for the molecules: in the great majority of published papers, wherever there is a clash between molecules and morphology, morphology-based cladistics plays second fiddle to molecular sequencing. The challenge of phylogenetics is to avoid this bias and balance the two methodologies (and others as well). MAK210324









original page MAK111014, edited RFVS111203, this page MAK130320. Material by MAK is Creative Commons License.