ML and Topology Testing session (Advanced)

ML and Topology Testing practical session for the 13th International Bioinformatics Workshop on Virus Evolution and Molecular Epidemiology, Lisbon, 2007

http://www.cibiv.at/~hschmidt/veme/

Software used

treepuzzle.exe / treepuzzle-clicky.exe: a prerelease of the TREE-PUZZLE software prepared for the workshop.
IQPNNI 3.2.
Consel.

The exercise:

We will use two datasets in our exercise, the alignments pol-ali.phy and env-ali.phy.

Do the two alignments have phylogenetic contents?
Hint: Start tree-puzzle in the same directory as the dataset and enter the dataset name when prompted. To measure the phylogenetic content of a dataset, switch the type of analysis to likelihood mapping ('b') and do not group the sequences. Start the mapping analysis by typing 'y'.
Examine the results in dataset-name.puzzle with a text editor, and view the likelihood mapping diagram dataset-name.eps with a PostScript viewer such as gsview (should be installed).
Adding the percentages in the three corners gives the proportion of completely resolved quartets (these sums can also be found in the .puzzle file). Summing the percentages in the rectangles gives the proportion of partially resolved quartets. The percentage in the center is the proportion of quartets that cannot be resolved. This last number should not be high.
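The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration: the function name summarise_mapping and all percentages are made up for the example, and the real values have to be read off the diagram or the .puzzle file by hand.

```python
def summarise_mapping(corners, rectangles, centre):
    """Combine the seven area percentages of a likelihood mapping diagram.

    corners    -- the three corner percentages (completely resolved quartets)
    rectangles -- the three rectangle percentages (partially resolved quartets)
    centre     -- the central percentage (unresolved quartets)
    """
    fully = sum(corners)
    partly = sum(rectangles)
    return {
        "fully_resolved": fully,
        "partly_resolved": partly,
        "unresolved": centre,
        "total": fully + partly + centre,  # should be close to 100
    }


# Hypothetical percentages, NOT taken from the workshop datasets.
example = summarise_mapping(
    corners=[30.5, 31.0, 29.5],
    rectangles=[1.5, 2.0, 1.5],
    centre=4.0,
)

for label, value in example.items():
    print(f"{label}: {value:.1f}%")
```

If the total is far from 100, a percentage was probably mistyped.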
Reconstruct the Maximum Likelihood tree of each dataset.
Hint: Start iqpnni in the same directory as the dataset and enter the dataset name when prompted. Switch the model to incorporate rate heterogeneity ('r'). Start the analysis by typing 'y'.
The results will be written to dataset-name.iqpnni and the ML tree to dataset-name.iqpnni.treefile.
View the two estimated trees with a tree viewer like FigTree (should be installed).
Do the two trees show the same branching pattern? If not, do you have an idea which one is the better one?
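Besides comparing the trees by eye, the branching patterns can be compared by listing the splits (bipartitions of the sequences) that each tree contains. The following Python sketch is not part of the workshop software. It assumes simple Newick strings without quoted names or comments, the function names are invented for this example, and the three small trees are hypothetical rather than results for the workshop datasets.

```python
import re


def leaf_sets(newick):
    """Return (all leaf names, list of leaf sets below each pair of parentheses)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    leaves, clades, stack = set(), [], []
    previous = None
    for token in tokens:
        if token == "(":
            stack.append(set())
        elif token == ")":
            finished = stack.pop()
            clades.append(frozenset(finished))
            if stack:
                stack[-1] |= finished
        elif token not in (",", ";") and previous != ")":
            # A leaf entry such as "A" or "A:0.12"; drop the branch length.
            name = token.split(":")[0].strip()
            if name:
                leaves.add(name)
                if stack:
                    stack[-1].add(name)
        # Text directly after ")" is an internal label or branch length: ignored.
        previous = token
    return leaves, clades


def splits(newick):
    """Return the non-trivial splits of the tree, read as an unrooted tree."""
    leaves, clades = leaf_sets(newick)
    reference = min(leaves)  # each split is stored as the side without this leaf
    result = set()
    for clade in clades:
        side = clade if reference not in clade else frozenset(leaves - clade)
        if 1 < len(side) < len(leaves) - 1:
            result.add(side)
    return result


# Hypothetical example trees on five sequences A to E.
tree_1 = "((A,B),(C,D),E);"
tree_2 = "(E:0.1,(D:0.2,C:0.1)90:0.3,(B,A));"  # same branching, written differently
tree_3 = "((A,C),(B,D),E);"                    # different branching

print("tree_1 vs tree_2 identical:", splits(tree_1) == splits(tree_2))
print("splits differing between tree_1 and tree_3:",
      len(splits(tree_1) ^ splits(tree_3)))
```

The number of differing splits is the Robinson-Foulds distance between the two trees; a value of zero means the branching patterns are identical. It says nothing about which tree is better, which is what the tests in the next step are for.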
Evaluate and test the trees with TREE-PUZZLE
Hint: Open a shell by clicking on the consel icon. Start tree-puzzle -wsl dataset-name dataset-treefile for each tree file together with its corresponding data file. So you should do:
- treepuzzle.exe -wsl pol-ali.phy pol-ali.ohy.10trees
- treepuzzle.exe -wsl pol-ali.phy pol-ali.ohy.100trees
- treepuzzle.exe -wsl env-ali.phy env-ali.ohy.10trees
- treepuzzle.exe -wsl env-ali.phy env-ali.ohy.100trees
To test trees on a dataset, switch the tree search procedure to evaluate user defined trees ('k'). Change option 'x' to use the neighbor-joining tree for parameter estimation. Use option 'w' to switch rate heterogeneity on, and set the number of Gamma rate categories to 4 with option 'c'. Start the analysis by typing 'y'.
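The four invocations listed above follow one pattern, so they can also be generated rather than typed. The Python sketch below is an optional convenience and not part of the workshop material. It assumes the file names exactly as written in the commands above and that treepuzzle.exe can be found from the current directory. The program still shows its interactive menu on each run, so the options described above have to be set by hand every time.

```python
import shutil
import subprocess

PROGRAM = "treepuzzle.exe"
DATASETS = ["pol-ali", "env-ali"]      # alignment names as used in the commands above
TREE_SETS = ["10trees", "100trees"]    # suffixes of the tree files


def build_commands():
    """Return one argument list per alignment/tree-file pair."""
    commands = []
    for name in DATASETS:
        alignment = f"{name}.phy"
        for tree_set in TREE_SETS:
            # Tree file names are copied from the commands listed above.
            treefile = f"{name}.ohy.{tree_set}"
            commands.append([PROGRAM, "-wsl", alignment, treefile])
    return commands


if __name__ == "__main__":
    for command in build_commands():
        print(" ".join(command))
        # Only run the program if it can actually be found.
        if shutil.which(PROGRAM):
            subprocess.run(command, check=False)
```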
Examine the results in treefile-name.puzzle with a text editor. At the end of the file is a table listing all trees with the results of three different tests (Kishino-Hasegawa test, Shimodaira-Hasegawa test, Expected Likelihood Weights). Trees marked with '-' are significantly worse than the best tree, while those marked with '+' are not.