Research

The CIBIV aims to understand the processes that have shaped the genomes of contemporary species. To this end we apply methods from statistics, computational statistics, computer science, and mathematics to develop models that mimic the process of evolution. These methods are further investigated in close collaboration with "wet" biologists to address real biological questions.

Currently we are working (in collaboration with various colleagues) on the following aspects of molecular evolution:

Alignments
Statistics of sequence alignment (e.g. mcmcalgn). Recently we have extended this approach to reconstruct an alignment and a phylogenetic tree simultaneously.

Sequence evolution
To understand sequence evolution it is necessary to model the substitution process. We are working on models of sequence evolution that allow dependencies among sequence sites (Markov fields seem to be an appropriate tool). We are developing test statistics to select the "best" model and to detect, for example, groups of sequences that evolve differently from the rest of a gene family. We have developed a test to detect change points (branches where the substitution model changes) in a phylogenetic tree. Currently we are working on methods to detect the dependency structure among sequence positions in an alignment.
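As a point of reference for what "modeling the substitution process" means, the following is a minimal sketch of the simplest standard substitution model, Jukes-Cantor (JC69), in which sites evolve independently and all substitutions are equally likely. It is an illustration only, not one of the site-dependent models described above; the function names are hypothetical.

```python
import math


def jc69_transition_prob(t, same):
    """JC69 probability of observing the same (or one specific different)
    nucleotide after t expected substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def jc69_distance(seq_a, seq_b):
    """Estimate the JC69 distance (expected substitutions per site)
    between two aligned sequences, skipping columns with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    # Observed proportion of differing sites.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        # Saturation: the correction formula is undefined here.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The distance correction inverts the transition probabilities: the expected proportion of differing sites after t substitutions per site is 3/4 (1 - exp(-4t/3)), so solving for t gives the estimate returned by jc69_distance.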

Gene trees
We develop efficient heuristic algorithms to reconstruct trees based on sequence data (e.g. TREE-PUZZLE). To this end we have developed a parallel version of the TREE-PUZZLE program. Moreover, we are currently developing a variant of TREE-PUZZLE that computes (maximum) likelihood trees for up to 1,000 sequences in reasonable time. We are also working on super tree methods to merge different gene trees into one species tree. Quartet-based tree reconstruction methods appear to be a versatile tool for studying super trees from a new perspective.
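To illustrate the idea behind quartet-based reconstruction, the sketch below resolves each set of four taxa into one of its three possible unrooted topologies. It uses pairwise distances and the four-point condition as a simple stand-in; TREE-PUZZLE itself evaluates quartets by likelihood, so this is not its method. The function names and the nested-dictionary distance format are assumptions made for the example.

```python
from itertools import combinations


def quartet_topology(dist, a, b, c, d):
    """Pick the split of {a, b, c, d} into two pairs whose summed
    within-pair distance is smallest (four-point condition)."""
    candidates = {
        ((a, b), (c, d)): dist[a][b] + dist[c][d],
        ((a, c), (b, d)): dist[a][c] + dist[b][d],
        ((a, d), (b, c)): dist[a][d] + dist[b][c],
    }
    return min(candidates, key=candidates.get)


def all_quartets(dist):
    """Resolve every four-taxon subset of the taxa in dist."""
    taxa = sorted(dist)
    return {q: quartet_topology(dist, *q) for q in combinations(taxa, 4)}


# Hypothetical distances from the tree ((A,B),(C,D)) with all branches of length 1.
example = {
    "A": {"A": 0, "B": 2, "C": 3, "D": 3},
    "B": {"A": 2, "B": 0, "C": 3, "D": 3},
    "C": {"A": 3, "B": 3, "C": 0, "D": 2},
    "D": {"A": 3, "B": 3, "C": 2, "D": 0},
}
print(quartet_topology(example, "A", "B", "C", "D"))
```

For tree-like distances the two larger sums are equal and the smallest sum identifies the pairs separated by the internal branch. A puzzling or super tree step would then combine the resolved quartets into a larger tree.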

Population genetics
Gene trees also arise naturally within populations. Here, however, the gene tree is a random variable, because it depends on the sample of sequences drawn from the population. We are interested in the development and application of coalescence-based methods to infer the demographic history of populations. In the future we plan to work on coalescence processes with complex interaction patterns. In this context we have constructed the so-called hvrbase, in which most of the hypervariable regions from the mitochondrial genomes of primates are currently collected in a multiple sequence alignment. This user-friendly database is currently being extended to store other genomic regions.
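The randomness of a gene tree in a population can be made concrete with the standard (Kingman) coalescent for a constant-size population, sketched below. This is a textbook baseline, not one of the group's demographic inference methods. Time is measured in coalescent units, in which each pair of lineages coalesces at rate 1; the function names are hypothetical.

```python
import random


def simulate_tmrca(n, rng):
    """Simulate the time to the most recent common ancestor of n sampled
    sequences under the standard coalescent."""
    t = 0.0
    for k in range(n, 1, -1):
        # While k lineages remain, the next coalescence occurs at
        # rate k choose 2.
        rate = k * (k - 1) / 2.0
        t += rng.expovariate(rate)
    return t


def expected_tmrca(n):
    """Closed-form expectation: sum over k of 2 / (k (k - 1)) = 2 (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)


rng = random.Random(1)  # fixed seed so the run is repeatable
times = [simulate_tmrca(10, rng) for _ in range(20000)]
print(sum(times) / len(times), expected_tmrca(10))
```

Each call to simulate_tmrca yields a different tree height, and the average over many replicates should approach the closed-form expectation.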

Complex patterns of evolution
To reconstruct evolutionary history it is necessary to take more complex events such as lateral gene transfer (between species), gene duplication, and gene loss into account. A combination of these events may disturb the relationship between species trees and gene trees. Recently, we have developed a maximum-likelihood-based method to estimate the amount of gene flow among prokaryotes by analyzing the COG database. This full-genome analysis poses a collection of new computational and modeling problems. Our "Jukes Cantor" type of model of gene transfer needs refinement. Moreover, we have to take duplications and losses of genes into account. This will be done in the near future.

Species tree
The methods outlined above will eventually be employed to reconstruct one gigantic species tree utilizing all the sequence data available for the different species. Models of sequence evolution are necessary to detect differently evolving regions in complete genomes. Tree reconstruction methods for a large number of sequences allow the reconstruction of gene trees with several hundred sequences. Finally, the patchiness of the available sequence data for different species makes it necessary to apply super tree methods. A better understanding of complex evolutionary patterns will also reveal instances where the gene trees differ from the species tree. Once this is well understood, it seems reasonable to construct a sequence-based tree of life.