Create a new directory (e.g. 'ML') on your desktop and save the TREE-PUZZLE and IQPNNI executables and the datasets there.
We will analyze a dataset of HIV1 group M sequences of types A, B, C, D, and G and some outgroup sequences (HIV1 group O and SIV).
To do so, start tree-puzzle-clicky.exe in the same directory as the dataset and enter the dataset name (hivALN.phy) when prompted. To measure the phylogenetic content of the dataset, switch the type of analysis to likelihood mapping ('b') and do not group the sequences. Switch on rate heterogeneity ('w') and set the number of categories ('c') to 4. Start the mapping analysis by typing 'y'.
Examine the results in hivALN.phy.puzzle with a text editor, and view the likelihood mapping diagram dataset-name.eps with a PostScript viewer like gsview (should be installed).
Is the dataset suitable for phylogenetic analysis?
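To make the likelihood mapping diagram easier to interpret, here is a minimal Python sketch of the idea behind it. For each quartet of sequences, the likelihoods of the three possible topologies are turned into posterior weights that sum to 1, and the quartet is assigned to the nearest of seven attractor points in the triangle (three corners, three edge midpoints, one centre). The log-likelihood values and all function names below are hypothetical and only serve as illustration; they are not TREE-PUZZLE output or code.

```python
import math

# Attractor points of the likelihood-mapping triangle (assumed layout:
# corners = fully resolved, edge midpoints = partly resolved, centre = star-like).
ATTRACTORS = {
    "resolved, topology 1": (1.0, 0.0, 0.0),
    "resolved, topology 2": (0.0, 1.0, 0.0),
    "resolved, topology 3": (0.0, 0.0, 1.0),
    "partly resolved, 1 vs 2": (0.5, 0.5, 0.0),
    "partly resolved, 1 vs 3": (0.5, 0.0, 0.5),
    "partly resolved, 2 vs 3": (0.0, 0.5, 0.5),
    "unresolved (star-like)": (1 / 3, 1 / 3, 1 / 3),
}


def posterior_weights(log_likelihoods):
    """Convert three quartet log-likelihoods into weights that sum to 1."""
    best = max(log_likelihoods)
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    scaled = [math.exp(value - best) for value in log_likelihoods]
    total = sum(scaled)
    return tuple(s / total for s in scaled)


def classify(weights):
    """Return the name of the attractor point closest to the weight vector."""
    return min(ATTRACTORS, key=lambda name: math.dist(weights, ATTRACTORS[name]))


# Hypothetical log-likelihoods for three quartets (illustration only).
example_quartets = [
    (-1000.0, -1010.0, -1012.0),
    (-1000.0, -1000.0, -1020.0),
    (-1000.0, -1000.0, -1000.0),
]
for lnl in example_quartets:
    print(lnl, "->", classify(posterior_weights(lnl)))
```

A dataset in which most quartets fall into the three corner regions carries a clear tree-like signal; a large share of quartets in the centre points to star-like, poorly resolved data.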
Start iqpnni.exe with a double-click in the same directory as the dataset. Enter the dataset name (hivALN.phy) when prompted.
Switch the model to incorporate rate heterogeneity ('r').
Start the analysis by typing 'y'.
The results will be found in hivALN.phy.iqpnni, the ML tree in hivALN.phy.iqpnni.treefile.
View the estimated tree with a tree viewer like FigTree (should be installed).
Can we be sure about the groupings? Do they fit your view of HIV1 evolution?
Start tree-puzzle-clicky.exe with a double-click in the same directory as the dataset. Enter the dataset name (hivALN.phy) when prompted. Use option 'w' to switch rate heterogeneity on and option 'c' to set the number of Gamma rate categories to 4. Start the analysis by typing 'y'.
The results will be found in hivALN.phy.puzzle, the tree in hivALN.phy.eps.
View the two estimated trees with a tree viewer like FigTree (should be installed). The numbers are PUZZLE support values. They behave similarly to bootstrap values, but they are NOT the same.
What is different compared to the ML tree? What do the support values tell you?
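To illustrate what a support value expresses, the following Python sketch assumes the usual definition of a quartet-puzzling support value: the percentage of intermediate trees (one per puzzling step) that contain a given grouping. The taxon labels and the four toy "intermediate trees" are made up for this example, and the function name is hypothetical.

```python
from collections import Counter


def support_values(step_trees):
    """Percentage of intermediate trees that contain each grouping.

    step_trees: list of sets; each set holds the groupings (frozensets of
    taxon labels) found in one intermediate tree.
    """
    counts = Counter(group for tree in step_trees for group in tree)
    n_steps = len(step_trees)
    return {group: 100.0 * count / n_steps for group, count in counts.items()}


# Toy data (hypothetical): groupings found in four intermediate trees.
AB = frozenset({"A", "B"})
CD = frozenset({"C", "D"})
AC = frozenset({"A", "C"})
steps = [{AB, CD}, {AB, CD}, {AB}, {AC}]

for group, value in support_values(steps).items():
    print(sorted(group), f"{value:.0f}%")
```

Unlike a bootstrap, the alignment is not resampled here; the variation comes from repeating the tree reconstruction, which is one reason the two kinds of values should not be read as equivalent.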
An (unrooted) multifurcation with 5 branches can be resolved in 15 different ways. The file hivALN.15trees contains all 15 resolutions of the multifurcation in the TREE-PUZZLE tree.
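The number 15 can be checked with a short Python sketch. The count of fully resolved unrooted trees on n branches is (2n-5)!! = 3 x 5 x ... x (2n-5), which gives 15 for n = 5. The sketch also enumerates the resolutions as Newick strings by stepwise insertion of branches. The labels S1 to S5 are placeholders for the five subtrees at the multifurcation, not the names used in hivALN.15trees, and all function names are hypothetical.

```python
def count_resolutions(n):
    """Number of unrooted binary trees on n labelled branches: (2n-5)!!."""
    count = 1
    for k in range(3, 2 * n - 4, 2):  # 3, 5, ..., 2n-5
        count *= k
    return count


def insert_leaf(tree, leaf):
    """Yield every rooted binary tree obtained by adding leaf to one edge."""
    yield (tree, leaf)  # attach on the edge above this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)


def to_newick(node):
    """Render a nested tuple of labels as a Newick subtree."""
    if isinstance(node, tuple):
        return "(" + ",".join(to_newick(child) for child in node) + ")"
    return node


def resolutions(labels):
    """All binary resolutions of a multifurcation (needs at least 3 labels)."""
    first, rest = labels[0], labels[1:]
    # An unrooted tree on n labels equals a rooted tree on the other n-1
    # labels, hung next to the first label at a three-way node.
    rooted = [rest[0]]
    for leaf in rest[1:]:
        rooted = [new for old in rooted for new in insert_leaf(old, leaf)]
    return [f"({first},{to_newick(l)},{to_newick(r)});" for l, r in rooted]


trees = resolutions(["S1", "S2", "S3", "S4", "S5"])
print(count_resolutions(5), len(trees))
for tree in trees:
    print(tree)
```

Each printed line is one possible resolution in Newick notation, the same kind of list that a user tree file holds.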
We now want to compute the ML value for each of these 15 trees and compare them to see whether a best tree can still be found.
Start tree-puzzle-clicky.exe with a double-click in the same directory as the dataset. Enter the dataset name (hivALN.phy) when prompted for a sequence file and hivALN.15trees when prompted for a tree file.
To test trees on a dataset, switch the tree search procedure to evaluate user-defined trees ('k'). Change option 'x' to use the neighbor-joining tree for parameter estimation. Use option 'w' to switch rate heterogeneity on and option 'c' to set the number of Gamma rate categories to 4. Start the analysis by typing 'y'.
Examine the results in hivALN.15trees.puzzle with a text editor. At the end is a table with the results of three different tests (Kishino-Hasegawa test 1pKH, Shimodaira-Hasegawa test SH, Expected Likelihood Weights ELW) for all the trees. Trees marked with '-' are considered significantly worse than the best tree, while those marked with '+' are not.
Is there a clear preference for a single tree? If not, think of explanations for why this might be the case.