******************************************************************************************
TreeSnatcher Plus: A Phylogenetic Tree Capturing Tool

tsReadme.txt for TreeSnatcher Plus from June 2010

This software is free of charge and licensed under the GNU General Public License, except
for the parts indicated in the sources where the copyright of the authors does not apply.
Please refer to http://www.opensource.org/licenses/gpl-license.html for details.
==========================================================================================
Created, designed and programmed by Thomas Laubach

Supported by
Martin J. Lercher, Heinrich-Heine-Universitaet Duesseldorf, Germany
Arndt von Haeseler, Center for Integrative Bioinformatics Vienna (CIBIV)

Part of this work was supported by the Wiener Wissenschafts-, Forschungs- und
Technologiefonds awarded to Arndt von Haeseler
==========================================================================================
SHORT DESCRIPTION:

TreeSnatcher Plus is a GUI-driven Java application for the semi-automatic recognition of
multifurcating phylogenetic trees in pixel images. The program accepts an image file as
input and, with user assistance, analyzes the topology and the metrics of the depicted
tree. The analysis is carried out in a multi-stage process using algorithms from the
field of image analysis. It yields a Newick expression that represents the tree
structure, optionally including branch lengths.

A solely textual description of the TreeSnatcher Plus user interface would be of no
practical value. Instead, we provide PDF tutorials that make appropriate use of
screenshots and illustrations and serve as a substitute for the missing manual.
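
For readers unfamiliar with the Newick format mentioned above, the following minimal
Python sketch shows what such an expression looks like for a small multifurcating tree,
with and without branch lengths. It is only an illustration and not the code used by
TreeSnatcher Plus: the nested-tuple tree layout, the function name to_newick and the
example taxa A to D are assumptions made for this example.

```python
# Illustrative sketch only; not taken from the TreeSnatcher Plus sources.
# Assumed (hypothetical) tree layout:
#   leaf:       (name, branch_length)
#   inner node: ([child, child, ...], branch_length)
# A branch_length of None means "no length known for this branch".

def to_newick(node, with_lengths=True):
    """Return the Newick fragment for one node, without the trailing ';'."""
    label_or_children, length = node
    if isinstance(label_or_children, str):
        # A leaf is written as its name.
        text = label_or_children
    else:
        # An inner node is written as a parenthesized, comma-separated list
        # of its children; more than two children are allowed.
        children = (to_newick(child, with_lengths) for child in label_or_children)
        text = "(" + ",".join(children) + ")"
    if with_lengths and length is not None:
        # Branch lengths follow the node, separated by a colon.
        text += ":" + format(length, "g")
    return text


# Root with three children (a multifurcation); the root itself has no length.
tree = ([("A", 1.0), ("B", 2.5), ([("C", 0.5), ("D", 0.5)], 1.5)], None)

print(to_newick(tree) + ";")                      # (A:1,B:2.5,(C:0.5,D:0.5):1.5);
print(to_newick(tree, with_lengths=False) + ";")  # (A,B,(C,D));
```

The first line printed carries branch lengths, the second only the topology; both forms
end with a semicolon, as the Newick format requires.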
==========================================================================================
VERSION HISTORY:

Version from June 2010

Fixed:
- Windows: The mouse cursor hot spots were corrected
- Windows, Linux: "Right click destroys box selection" issue
- Linux: Uses the PDF viewer Okular
- All: "Save Image" now obeys the switch "Small Nodes and Branches"
- Windows: "An error occurred while writing the image file" appears while erasing pixels
  from a flooded foreground area

Missing:
- OCR (optical character recognition for the species names). This is incredibly
  difficult as there are almost no restrictions on the position, size and orientation of
  characters, the choice of fonts etc.
- Determination of branch segments in round topologies
- The Linux and Windows versions lack a progress indicator
- The Linux version does not offer the quadratic curve drawing functionality
- The Linux version uses the system mouse cursors
- All: The program does not yet visualize different pencil and rubber sizes

Known bugs (the most prominent):
- The elaborate path finding algorithm that connects the nodes of the tree does not trace
  paths correctly between nodes that are very close to each other. The simple algorithm
  can be used instead. It requires, however, that the nodes be placed closer to the
  center of the line intersections.
- Rectangular trees can only be processed accurately if the branches are fairly long and
  the bends in the lines are more or less right-angled.
- The treatment of huge source images needs to be improved. One possibility is to stream
  them from the hard disk.
- Windows: "An error occurred while writing the image file" appears while erasing pixels
  from a flooded foreground area
==========================================================================================
EXECUTION OPTIONS:

Mac OS X:
Locate the TreeSnatcher Plus JAR file on your desktop and double-click on it.
If the application encounters an out-of-memory error, increase the heap space (parameters
"Xms" and "Xmx") in the shell script TreeSnatcherPlus.sh. You must then always start
TreeSnatcher Plus using the script for the memory changes to take effect.

Linux:
Execute the shell script TreeSnatcherPlus.sh.
If the application encounters an out-of-memory error, increase the heap space in the
shell script (parameters "Xms" and "Xmx").

Windows:
Locate the TreeSnatcher Plus JAR file on your desktop and double-click on it, or
double-click the batch script.
If the application encounters an out-of-memory error, increase the heap space in the
batch script (parameters "Xms" and "Xmx"). You must then always start TreeSnatcher Plus
using the script for the memory changes to take effect.
==========================================================================================
SCOPE OF THE PROGRAM:

Figures of phylogenetic trees are widely used in publications to illustrate the result of
an evolutionary analysis. However, as one cannot effortlessly extract a machine-readable
representation of the phylogeny, i.e. a Newick expression, from such images, they are
not suited for subsequent reanalysis or easy compilation of tree topologies. A
computer-readable representation of a published tree therefore has to be built either
completely by hand or with the help of special applications.

Here, TreeSnatcher Plus can be valuable. With user interaction, it identifies the
topology of a tree in an image (e.g. a figure from a publication). The application
features a sophisticated graphical user interface that is based on the Java Swing API.

The new version, TreeSnatcher Plus, has been developed from scratch. It offers a stable
graphical user interface and uses improved methods for image processing and for the
analysis of the tree topology. In particular, it is no longer necessary to preprocess an
image before feeding it into TreeSnatcher Plus.
The new application offers all the image preprocessing tools needed.

PREREQUISITES:

The current version of TreeSnatcher Plus opens image files in the formats PNG, JPG/JPEG
or GIF. The PDF format is currently not supported. If you would like to get a phylogeny
from a PDF, try to extract the image from the PDF, then load it into TreeSnatcher Plus.
If this is not possible, you could take screenshots of the image in the PDF and combine
them into a fresh image. The resolution should be adequately high.

Java uses a large portion of the available main memory. As TreeSnatcher Plus needs to
maintain several copies of the source image in memory, there is a maximum size for the
source image that depends on the memory size of your machine. You should increase the
heap space prior to running the application. To do so, edit and execute the shell script
that comes with the Macintosh and Linux versions of TreeSnatcher Plus, or the batch
script that comes with the Windows version.

As TreeSnatcher Plus offers neither online help nor a dedicated manual, please work
through the various tutorials. All necessary steps are illustrated. Often the same
results can also be obtained in a different way. Additionally, we have uploaded some
work-in-progress images (snapshots) which you might want to restore from within
TreeSnatcher Plus. They illustrate how images need to be processed in the application.

The workflow shown in the tutorials might seem cumbersome or unintuitive. However, image
preprocessing is necessary: the computer has no notion of lines, line intersections,
line endings or line thickness, nor can it read textual information on its own or grasp
an image as a whole. If you know a better method, please feel free to send me an e-mail.
I will be happy to hear from you.

WORKFLOW:

In contrast to TreeSnatcher, the workflow in TreeSnatcher Plus now includes preprocessing
of the image. The order of the global tasks is still mandatory:

(1) The program reads the specified RGB image.
(2) Preprocess the image
    The user trims and cuts the image. That is to say, the user can select sub-trees or
    a subset of taxa from the converted image. If necessary, some image preprocessing
    tools are used (cf. tutorials).

(3) Binarization/Thresholding
    The user thresholds the image to ensure that the foreground is black and the
    background is white.

(4) Skeletonization/Thinning
    The user thins the part of the image that contains the tree. This is necessary to
    make it easy for the program to trace the paths between the nodes.

(5) Flooding the foreground
    The user marks a foreground location on a branch. The program colors the foreground
    reachable from this location. The flooded area will be the tree. Everything else is
    ignored in subsequent steps.

(6) Placement of inner nodes (line intersections) and outer nodes (tips)
    The program suggests locations for inner and outer nodes of the tree. The user can
    move and remove nodes.

(7) Choosing the tree type
    The program can deal with freeform trees and rectangular trees. In freeform
    topologies, a branch length is measured from tip to line crossing or from line
    crossing to line crossing. In rectangular trees, branch lengths are measured from tip
    to bend, bend to next bend, ..., to line crossing. For this to work, the program
    tries to divide each branch into straight segments and calculates their slope. The
    clearer the bend in a branch, the better the result. In particular, this approach
    does not currently work well for small branches. When using the rectangular tree
    type, the calculation of branch lengths will only be accurate if the program has
    identified the bends in the branches correctly. Use the view "Show branch length
    composition" to check this. The tree type must be chosen prior to step (8).

(8) Detection of branches
    The program tries to find the branches of the tree by tracing the foreground between
    the node locations. If a branch is wrong, the user sets or erases pixels, moves nodes
    and then restarts this step.
(9) Determining branch lengths
    The branch length in pixels is measured during step (8). Please keep in mind that the
    exactness of the branch length measurement depends on how well the thinned tree
    structure, the nodes and the real tree coincide.

    There are three types of branch lengths in TreeSnatcher Plus: user-assigned length,
    calculated length and a mixture of both.

    Calculated length (default): TreeSnatcher Plus first calculates the length of a path
    between two nodes. Then it determines the branch segments. It adds up the lengths in
    pixels of the segments that are relevant to the branch length.

    User length: The user can assign a value to a branch. This value remains unaltered
    during the rest of the session.

    Mixed lengths: Allows a mixture of calculated lengths and user lengths. This can
    become necessary if, for instance, a single branch among many has a wrong length.
    This length can then be assigned manually. Mixed lengths can also be used for
    what-if scenarios.

    The user can also mark a line of known length in the image, i.e. a scale bar. In that
    case, all calculated lengths are recalculated with respect to the new scale. It is
    also possible to set a calculated length to the value 1.0 or to a user-specified
    value. Again, all calculated lengths are recalculated with respect to the new scale.

(10) Assigning species names
    The user clicks on each leaf node in turn and types the corresponding species name
    into a dialog box. This must be done manually as there is no OCR so far.

(11) Choosing the origin
    The user either clicks on an inner node to designate it as the origin of the tree, or
    accepts the default.

(12) Constructing the Newick string
    The program constructs and displays the Newick expression that represents the tree
    structure.

A detailed description of the thinning (skeletonizing) algorithm used in step 4 is given
in Zhang, T.Y. and Suen, C.Y. (1984) A fast parallel algorithm for thinning digital
patterns. Communications of the ACM 27, 3 (Mar.
1984), 236-239.

In steps 4, 5 and 7, several variants of the flood filling technique are used. Please see
Burger, W. and Burge, M.J. (2005) Digitale Bildverarbeitung, Berlin, Heidelberg, Springer
Verlag, 196-200.
==========================================================================================
INSTALLATION:
-------------

Mac OS X
- The Mac OS X version of TreeSnatcher Plus comes as a single ZIP-compressed file called
  "TreeSnatcherPlus.zip". It should run on all Mac OS X versions that can execute the
  Java VM 1.6.
- Extract the files "TreeSnatcherPlus_June2010_MacOSX.jar", "tsReadme.txt" (this file)
  and the remaining files from it and copy them into one directory.
- Start TreeSnatcher Plus either by double-clicking the TreeSnatcherPlus.jar icon or by
  executing the shell script TreeSnatcherPlus.sh.
  If the application needs more memory, you should adjust the parameters "Xms" and "Xmx"
  in the shell script.

Linux
- The Linux version of TreeSnatcher Plus comes as a single ZIP-compressed file called
  "TreeSnatcherPlus.zip". It should run on all Linux versions that can execute the
  Java VM 1.6.
- Extract the files "TreeSnatcherPlus_June2010_Linux.jar", "tsReadme.txt" (this file)
  and the remaining files from it and copy them into one directory.
- Start TreeSnatcher Plus either by double-clicking the TreeSnatcherPlus.jar icon or by
  executing the shell script TreeSnatcherPlus.sh.
  If the application needs more memory, you should adjust the parameters "Xms" and "Xmx"
  in the shell script.

Windows
- The Windows version of TreeSnatcher Plus comes as a single ZIP-compressed file called
  "TreeSnatcherPlus.zip". It has been tested on Windows XP with the Java VM 1.6.
- Extract the files "TreeSnatcherPlus_June2010_Windows.jar", "tsReadme.txt" (this file)
  and the remaining files from it and copy them into one directory.
- Start TreeSnatcher Plus either by double-clicking the TreeSnatcherPlus.jar icon or by
  executing the batch script TreeSnatcherPlus.bat.
  If the application needs more memory, you should adjust the parameters "Xms" and "Xmx"
  in the batch script.

If you encounter problems, please ask your local administrator for help.
==========================================================================================
REFERENCES:
-----------

Thomas Laubach and Arndt von Haeseler (2007) TreeSnatcher: Coding Trees From Images,
Bioinformatics 2007 23(24):3384-3385; doi:10.1093/bioinformatics/btm438

If you are using TreeSnatcher Plus, please cite this article.
If you like TreeSnatcher Plus, we will be happy to hear from you.
==========================================================================================
CREDITS:
--------

If you have any further questions that are not answered in this introduction or the
tutorials, please drop me an e-mail (laubach(AT)cs.uni-duesseldorf.de).
We welcome any suggestions, criticism, and bug reports. TreeSnatcher Plus is a complex
piece of software and is likely to contain bugs.

I am indebted to the following people for fruitful discussions, support and suggestions:
Martin Lercher, Arndt von Haeseler, Gabriel Gelius-Dietrich, Jochen Kohl, Dominic Mainz,
Indra Mainz, Ingo Paulsen, Michael Rosskopf, Stefan Zanger, Steffen Klaere, Sabine Thuss,
Na Gao, Wolfgang Kaisers, Guang-Zhong Wang, Janina Mass, Christian Esser, Annika Hoinkes,
Christian Cremer, Marc Andre Daxer, Ulrich Wittelsbuerger, Benjamin Braasch, Jan Wolfertz,
Rafael Dellen, Claus Jonathan Fritzemeier, Anna Schlizio, Heiko Schmidt, and others.

In particular, I want to thank Andrew Rambaut for his now classic program TreeThief and
for discussing the TreeSnatcher Plus approach with me.
******************************************************************************************
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
******************************************************************************************