TREE-PUZZLE session (Advanced)

Phylogenetic information practical session for the 14th International Workshop on Virus Evolution and Molecular Epidemiology, Cape Town, 2008

http://www.cibiv.at/~hschmidt/veme/signal

Software and datasets used

apples.phy
bananas.phy
pears.phy
treepuzzle.exe / treepuzzle-clicky.exe (workshop prerelease of the TREE-PUZZLE software)

Create a new directory on your desktop and save the treepuzzle executables and datasets there.

The following software should already be installed on your computer

DAMBE
Ghostscript 8.53
GSview 4.8 from the Ghostscript/Ghostview site
R (not needed if DAMBE is available)

The exercise:

Which of these alignments is most useful for phylogenetic analysis (i.e., contains phylogenetic information)?
Hint: Start treepuzzle-clicky.exe by double-clicking it in the directory that contains the dataset. Enter the dataset name when prompted. To measure the phylogenetic content of a dataset, switch the type of analysis to likelihood mapping ('b') and do not group the sequences.
If you know the preferred model of evolution, adjust the model accordingly (e.g., rate heterogeneity with 'w').
Start the mapping analysis by typing 'y'.
TREE-PUZZLE writes detailed information and results for the dataset to dataset-name.puzzle (open it with a text editor) and the likelihood mapping diagram to dataset-name.eps (open it with a PostScript viewer such as GSview).
EPS file: Adding the percentages in the three corners gives the percentage of completely resolved quartets (this should be high); the sum of the rectangles gives the percentage of partially resolved quartets (this should be low); the percentage in the center gives the share of quartets that cannot be resolved. The last number is the critical one and should not be high.
PUZZLE file: The above values are also printed at the end of the PUZZLE report file. There you also get the numbers for each taxon, which can help to find outliers and problematic sequences that should be checked and possibly discarded.
How do the results differ between the three datasets? Are they all suitable for phylogenetic analyses?
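The sums described for the EPS file can also be done in a few lines of Python. This is only a minimal sketch: the function name summarise_likelihood_mapping is hypothetical and not part of TREE-PUZZLE, and the seven percentages have to be read by hand from your own dataset-name.eps or dataset-name.puzzle.

```python
# Hypothetical helper (not part of TREE-PUZZLE): combine the seven region
# percentages of a likelihood-mapping diagram into the three summary values.

def summarise_likelihood_mapping(corners, rectangles, center):
    """Return resolved, partially resolved, and unresolved totals (in %)."""
    if len(corners) != 3 or len(rectangles) != 3:
        raise ValueError("expected three corner and three rectangle values")
    return {
        "resolved": sum(corners),               # should be high
        "partially_resolved": sum(rectangles),  # should be low
        "unresolved": center,                   # the critical value
    }
```

Call it with the three corner values, the three rectangle values, and the center value from one dataset, then compare the "unresolved" entry across the datasets.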
Test for saturation with DAMBE?
Start DAMBE. Load the respective dataset into DAMBE. Then create the saturation plot via "Transition and transversion vs. divergence" in the Graphics menu.
How do the results differ between the three datasets? What is strange in the pears dataset?

Resolution
As should be obvious from the results, bananas.phy has no phylogenetic signal since it is a purely random dataset.
The dataset apples.phy should be fine for analysis. It is a dataset of pol sequences from HIV.
The pears.phy dataset should work fine as well. Keep in mind, however, that it has no transversions, which might disturb the estimation of parameters for the evolutionary model. The lack of transversions is due to the fact that the sequences are very closely related.
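To check the transversion count yourself, the following Python sketch counts transitions and transversions between aligned sequences. It is an illustration under stated assumptions, not DAMBE's implementation: the function names are hypothetical, only the DNA bases A, C, G, and T are compared, and the sequences must already have been read from the alignment into a dictionary of name to sequence.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES

def count_substitutions(seq_a, seq_b):
    """Return (transitions, transversions, compared sites) for one pair."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguity codes: only A, C, G, T are compared.
        if a not in BASES or b not in BASES:
            continue
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    return transitions, transversions, compared

def pairwise_counts(sequences):
    """Yield (name_a, name_b, transitions, transversions, sites) per pair."""
    for (name_a, seq_a), (name_b, seq_b) in combinations(sequences.items(), 2):
        yield (name_a, name_b) + count_substitutions(seq_a, seq_b)
```

If every pair in a dataset reports zero transversions, the transition/transversion parameter of the model cannot be estimated reliably from that dataset.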
