TREE-PUZZLE practical session for the 12th International Workshop on Virus Evolution and Molecular Epidemiology, Athens, 2006

    http://www.cibiv.at/~hschmidt/veme/


    Slides and Software

    • Slides of the theoretical session.
    • The software can be downloaded from here.

    The exercises:


    1. Which of these two alignments is more useful for phylogenetic analysis?
      1. test46-1701.phy
      2. test64-384.phy

      Hint: To do so, we start tree-puzzle in the same directory as the dataset and enter the dataset name when prompted. To measure the phylogenetic content of a dataset, switch the type of analysis to likelihood mapping ('b') and do not group the sequences. Start the mapping analysis by typing 'y'.

      Examine the results for each dataset in dataset-name.puzzle (e.g. test46-1701.phy.puzzle) with a text editor, and view the likelihood mapping diagram dataset-name.eps with a PostScript viewer such as gsview.

      Adding the percentages in the three corners gives the proportion of completely resolved quartets, and summing the percentages in the three rectangles gives the proportion of partially resolved quartets. The percentage in the center is the proportion of quartets that cannot be resolved; for a dataset that is useful for phylogenetic analysis, this last number should be low.
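      The bookkeeping described above can be sketched in a few lines. This is a minimal helper, not part of TREE-PUZZLE: it assumes you copy the seven area percentages by hand from the .puzzle file or the .eps diagram, and the example numbers are hypothetical.

```python
def summarize_likelihood_mapping(corners, rectangles, center):
    """Group the seven likelihood-mapping areas into three totals.

    corners    -- the three corner percentages (completely resolved quartets)
    rectangles -- the three rectangle percentages (partially resolved quartets)
    center     -- the central percentage (unresolved, star-like quartets)
    """
    if len(corners) != 3 or len(rectangles) != 3:
        raise ValueError("expected three corner and three rectangle percentages")
    resolved = sum(corners)
    partly_resolved = sum(rectangles)
    total = resolved + partly_resolved + center
    # The printed percentages are rounded, so allow a small tolerance.
    if abs(total - 100.0) > 0.5:
        raise ValueError(f"areas sum to {total:.1f}%, expected about 100%")
    return {
        "resolved": resolved,
        "partly_resolved": partly_resolved,
        "unresolved": center,
    }


# Hypothetical percentages, typed in by hand from a .puzzle file or .eps diagram.
summary = summarize_likelihood_mapping([30.1, 28.4, 25.0], [4.0, 3.5, 4.2], 4.8)
for label, pct in summary.items():
    print(f"{label:>15}: {pct:5.1f}%")
```

      Running it once per alignment makes the comparison between the two datasets explicit: the one with the larger resolved share and the smaller central share carries more phylogenetic signal.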

    2. Try to find out whether sequence 1 (SouthCarolina1918) clusters with human (h*), swine (s*), or avian (a*) viruses.
      flu-a-1000.phy

      Hint: We will examine this partial dataset on the Spanish Flu (South Carolina, 1918) from Worobey et al. (2002; DOI: 10.1126/science.296.5566.211a) using a 4-cluster likelihood mapping. Start tree-puzzle in the same directory as the dataset and enter the dataset name when prompted. Switch the type of analysis to likelihood mapping ('b') and group the sequences into 4 clusters ('g'):

      • avian/bird virus sequences (starting with 'a') to cluster A
      • swine/pig virus sequences (starting with 's') to cluster B
      • human virus sequences (starting with 'h') to cluster C
      • Spanish Flu virus (SouthCarolina1918) to cluster D
      • (sequences can be excluded from the analysis by assigning them to X; this is not needed here)
      Start the analysis by typing 'y'.
      The results will be found in flu-a-1000.phy.puzzle and the likelihood mapping plot in flu-a-1000.phy.eps.
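      The grouping step and the interpretation of the three possible quartet groupings can be sketched as follows. The cluster assignment itself is done interactively in tree-puzzle; this hypothetical helper only mirrors the first-letter rule so you can double-check cluster sizes. The sequence names below are invented for illustration, and `query_name` is an assumption: adjust it if the name in the alignment is spelled or truncated differently. Check the .puzzle file for which corner of the plot belongs to which grouping.

```python
import math
from collections import Counter

# Cluster letters used in this exercise; X means "excluded from the analysis".
PREFIX_TO_CLUSTER = {"a": "A", "s": "B", "h": "C"}


def assign_cluster(name, query_name="SouthCarolina1918"):
    """Return the cluster for a sequence name, following the exercise's rule."""
    if name == query_name:  # the Spanish Flu sequence is matched by full name
        return "D"
    return PREFIX_TO_CLUSTER.get(name[:1], "X")


def count_cluster_quartets(names, query_name="SouthCarolina1918"):
    """Quartets that take exactly one sequence from each of A, B, C and D."""
    sizes = Counter(assign_cluster(n, query_name) for n in names)
    return math.prod(sizes[c] for c in "ABCD")


def partner_of_d(split):
    """For a grouping such as 'AB|CD', return the cluster grouped with D."""
    left, right = split.split("|")
    side = left if "D" in left else right
    return side.replace("D", "")


# Hypothetical names; the real ones are in the first column of flu-a-1000.phy.
names = ["aDuck1", "aChick2", "sIowa30", "hPR8", "hWSN33", "SouthCarolina1918"]
print(count_cluster_quartets(names))  # 2 * 1 * 2 * 1 = 4
for split in ("AB|CD", "AC|BD", "AD|BC"):
    print(split, "-> D groups with cluster", partner_of_d(split))
```

      So a dominant AB|CD share points to D clustering with the human viruses, AC|BD with the swine viruses, and AD|BC with the avian viruses.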

    3. Evaluate and test the trees in flu-a-1000.trees (with the alignment file flu-a-1000.phy). Is there a clearly supported tree?

      Hint: We start tree-puzzle in the same directory as the dataset and enter the dataset name when prompted; later, enter the tree file name as well. To test trees on a dataset, switch the tree search procedure to evaluating user-defined trees ('k'). Set option 'x' so that a neighbor-joining tree is used for parameter estimation. Start the analysis by typing 'y'.

      Examine the results in flu-a-1000.phy.puzzle with a text editor. At the end of the file is a table listing all the trees together with the results of three different tests (Kishino-Hasegawa test, Shimodaira-Hasegawa test, Expected Likelihood Weights). Trees marked '-' are significantly worse than the best tree; trees marked '+' are not.

      Goldman et al. (2000; DOI: 10.1080/106351500750049752) give a good overview and discussion of testing trees; free copies can be found with Google.
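      To make the '+'/'-' marks less of a black box, here is a minimal sketch of the normal-approximation Kishino-Hasegawa test. It is not TREE-PUZZLE's implementation, whose details may differ, and it assumes you already have per-site log-likelihoods for two trees; the numbers below are hypothetical.

```python
import math


def kh_test(site_lnl_best, site_lnl_other, z_crit=1.96):
    """Kishino-Hasegawa test, normal approximation.

    Takes per-site log-likelihoods of the best tree and of a competing tree
    (same alignment, same site order). Returns (delta, z, symbol), where
    symbol is '-' if the competing tree is significantly worse, else '+'.
    """
    n = len(site_lnl_best)
    if n != len(site_lnl_other) or n < 2:
        raise ValueError("need two equally long lists with at least two sites")
    diffs = [b - o for b, o in zip(site_lnl_best, site_lnl_other)]
    delta = sum(diffs)  # total log-likelihood difference
    mean = delta / n
    # Variance of the summed difference, estimated from the per-site differences.
    var = n / (n - 1) * sum((d - mean) ** 2 for d in diffs)
    if var == 0.0:
        raise ValueError("per-site differences are constant; z is undefined")
    z = delta / math.sqrt(var)
    return delta, z, "-" if z > z_crit else "+"


# Hypothetical per-site log-likelihoods for a 4-site alignment.
best = [-1.0, -2.0, -1.5, -1.0]
other = [-1.5, -2.5, -2.0, -1.2]
print(kh_test(best, other))
```

      One design caveat: the KH test assumes the trees were chosen before looking at the data. When one of them is the best tree found on the same data, the test is too liberal, which is the motivation for the Shimodaira-Hasegawa test discussed by Goldman et al.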

    4. Run a tree reconstruction of pol.phy. Identify the outlier, remove it from the dataset, and rerun the tree reconstruction.

      Hint:

      • Start tree-puzzle in the same directory as the dataset and enter the dataset name when prompted. (You might change parameters such as the model of evolution.) Examine the output in pol.phy.puzzle (or outfile) with a text editor. In particular, look in the QUARTET STATISTICS section for the sequence with the most unresolved quartets and remove it from the dataset.
      • Start tree-puzzle again in the same directory, enter the name of the reduced dataset when prompted, and redo the analysis.
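      Removing the outlier can also be scripted instead of edited by hand. This is a hypothetical sketch: it assumes you copy each sequence's percentage of unresolved quartets from the QUARTET STATISTICS section into a dictionary, and that the alignment is sequential PHYLIP with one line per sequence and the name as the first whitespace-separated token. Interleaved files, or strict 10-character names that run directly into the sequence, are not handled. All names and values below are invented.

```python
def most_unresolved(unresolved_pct):
    """Name of the sequence with the highest percentage of unresolved quartets."""
    return max(unresolved_pct, key=unresolved_pct.get)


def drop_sequence(phylip_text, name):
    """Remove one sequence from a sequential PHYLIP alignment.

    Assumes one line per sequence with the name as the first
    whitespace-separated token (no interleaved blocks).
    """
    lines = [ln for ln in phylip_text.splitlines() if ln.strip()]
    n_seqs, n_sites = lines[0].split()[:2]
    records = lines[1:]
    if len(records) != int(n_seqs):
        raise ValueError("expected one line per sequence (sequential PHYLIP)")
    kept = [ln for ln in records if ln.split()[0] != name]
    if len(kept) != len(records) - 1:
        raise ValueError(f"expected exactly one sequence named {name!r}")
    # Rewrite the header with the new sequence count; the site count is unchanged.
    return "\n".join([f"{len(kept)} {n_sites}"] + kept) + "\n"


# Hypothetical values copied by hand from the QUARTET STATISTICS section.
unresolved = {"seqA": 2.0, "seqB": 3.5, "seqC": 40.0}
outlier = most_unresolved(unresolved)
alignment = "3 5\nseqA  ACGTA\nseqB  ACGTT\nseqC  TTTTT\n"
print(drop_sequence(alignment, outlier))
```

      With the real data you would read pol.phy, pass the name you found, and write the result to a new file (for example a hypothetical pol-reduced.phy), which you then give to tree-puzzle in the second step.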