********************************************************************************
********************************************************************************
****                                                                        ****
****            TreeSnatcher: A Phylogenetic Tree Capturing Tool            ****
****                                                                        ****
****                              User Manual                               ****
****                     Version 1.0 (August 17th 2007)                     ****
****                                                                        ****
********************************************************************************
********************************************************************************

Copyright (C) 2007 by Thomas Laubach and Arndt von Haeseler

This program is free software; you can redistribute it at will.

================================================================================

Main Contributors

Thomas Laubach
Bioinformatics Institute,
Heinrich-Heine-University Duesseldorf, Germany
thomas.laubach(AT)uni-duesseldorf.de

Arndt von Haeseler
Center for Integrative Bioinformatics Vienna
Max F. Perutz Laboratories, Austria
arndt.von.haeseler(AT)mfpl.ac.at
associate member of Bioinformatics Institute,
Heinrich-Heine-University Duesseldorf, Germany

================================================================================

SHORT DESCRIPTION:
------------------

TreeSnatcher is a GUI-driven Java application for the semi-automatic
recognition of multifurcating phylogenetic trees in pixel images. The program
accepts an image file as input and, with user assistance, analyzes the topology
and the metrics of the tree depicted in it. The analysis is carried out in a
multi-stage process using basic algorithms from the field of image analysis. It
yields a Newick expression that represents the tree structure, optionally
including branch lengths.

A purely textual description of the TreeSnatcher user interface would be of no
practical value. Instead, we provide a PDF quick-start tutorial that makes
appropriate use of screenshots.
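For readers unfamiliar with the Newick format mentioned above, the following
minimal Python sketch shows how a small multifurcating tree maps to a Newick
expression with and without branch lengths. It is an illustration only and is
not taken from TreeSnatcher; the function name to_newick, the tuple layout
(name, length, children) and the taxon names A to D are assumptions made for
this example.

```python
def to_newick(node, with_lengths=True):
    """Serialize a hypothetical (name, length, children) node to Newick text."""
    name, length, children = node
    text = ""
    if children:
        # Inner node: children are comma-separated inside parentheses.
        text = "(" + ",".join(to_newick(c, with_lengths) for c in children) + ")"
    text += name
    if with_lengths and length is not None:
        # Branch length of the edge leading to this node.
        text += ":" + format(length, "g")
    return text


# Illustrative tree: the unnamed root has three children (a multifurcation).
tree = ("", None, [
    ("A", 0.1, []),
    ("B", 0.2, []),
    ("", 0.05, [("C", 0.3, []), ("D", 0.4, [])]),
])

newick_with_lengths = to_newick(tree) + ";"
newick_topology_only = to_newick(tree, with_lengths=False) + ";"
print(newick_with_lengths)   # (A:0.1,B:0.2,(C:0.3,D:0.4):0.05);
print(newick_topology_only)  # (A,B,(C,D));
```

The trailing semicolon terminates a Newick expression; the second call shows
the same topology with branch lengths omitted.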
================================================================================

VERSION HISTORY:
----------------

Preliminary version 3.5 (July 15th 2007)

Implemented:
- Default: The WorkView's and the TemplateView's viewports are synchronized
- The viewport is restored in every successive program step
- Added: Internal procedures that detect horizontal and vertical portions of
  an edge (not yet completed)

To be implemented:
- Store images on hard disk that are currently held in memory; this should
  restore normal speed of image manipulations (high priority)
- Drag and drop of images (low priority)
- Copy and paste of Newick expression (medium priority)
- Treatment of horizontal and vertical branch portions (high priority)

Known bugs:
- After manipulating a branch it is treated as desired but displayed in its
  old shape
- Step 6 is set inactive until further notice
- Template window gets a different size when synchronizing both windows
- Tool bar flickering
- Certain images are displayed incorrectly (due to alpha channel?)
- Some visual bugs
- The magnifier gets initialized with the old image after starting a new
  recognition

--------------------------------------------------------------------------------

Preliminary version 3 (July 14th 2007)

Implemented:
- Treatment of the root: There is no longer any need to determine a root
  location. Default behavior: The first leaf detected in the tree becomes the
  root location; it is treated as 'outgroup'
- The user can choose whether to treat an outer node as 'outgroup' (the leaf
  and its corresponding branch are kept) or as root (the leaf and its
  corresponding branch are erased)
- When an outer node is marked as root, its assigned species name is erased
- The user can select an inner node as the starting location from which the
  Newick string is constructed; this can yield multifurcating trees

Known bugs:
- Step 6 is set inactive until further notice; if anyone needs manual branch
  lengths it will be reactivated

--------------------------------------------------------------------------------

Preliminary version 2 (July 13th 2007)

Implemented:
- Treatment of branch lengths: length calculation 'from node to node'
  THIS MEANS THAT TREESNATCHER DOES NOT CURRENTLY DIFFERENTIATE BETWEEN
  HORIZONTAL AND VERTICAL PARTS OF A BRANCH; THIS IS THE NEXT THING TO DO
- The user can decide whether the Newick expression is produced with (default)
  or without branch lengths
  Default behavior: All branch lengths are relative to the longest branch
- Metrics tool: The user can scale the tree based on a line of known length in
  the image; please keep in mind that currently only branch length detection
  'from node to node' is carried out.

Fixed:
- The user can modify pixels in the node identification step and re-issue the
  node detection (5)
- Another bug was corrected: AUTO button malfunction in step (4)

Known bugs:
- Pixel manipulations get very slow when using huge images (will be fixed soon)
- Template window gets a different size when synchronizing both windows
- Tool bar flickering
- Certain images are displayed incorrectly (due to alpha channel?)

--------------------------------------------------------------------------------

Preliminary version (End of June 2007)

Implemented:
- Source image conversion to grayscale
- Binarization of the grayscale image
- Skeletonization of the binary image
- Automatic detection of nodes and edges
- Pencil and rubber operations on the image
- User can position the root on an outer node
- Newick string features edge lengths (currently deactivated)
- Newick string can be saved to a file

Soon to be implemented:
- Drag and drop of Newick expression straight from the window
- Variable root position
- Treatment of edge lengths
- Binarization of images with non-uniform background
- Newick string with branch lengths
- Program detects horizontal and vertical branches
- Movable magnifier

Known bugs:
- Pixel manipulations get very slow when using huge images (will be fixed soon)
- Template window gets a different size when synchronizing both windows
- Tool bar flickering
- Certain images are displayed incorrectly (due to alpha channel?)
- Pixel alterations performed in the node identification step are ignored

================================================================================

EXECUTION OPTIONS:
------------------

Mac OS X: Locate the TreeSnatcher application icon on your desktop and
          double-click on it.

Windows:  Locate the TreeSnatcher.jar on your desktop and double-click on it.

Linux:    Open a shell and change to the directory that contains
          TreeSnatcher.jar and the accompanying files.
          Type java -jar TreeSnatcher.jar, press Enter.
          (Does anyone know how to execute JAR files with a mouse click?)

================================================================================

SCOPE OF THE PROGRAM:
---------------------

Figures of phylogenetic trees are widely used in publications to illustrate the
result of an evolutionary analysis. However, as one cannot effortlessly extract
a machine-readable representation of the phylogeny, i.e. a Newick expression,
from such images, they are not suited for subsequent reanalysis or easy
compilation of tree topologies. Therefore a computer-readable representation of
a published tree has to be built either completely by hand or by using special
applications.

Here, TreeSnatcher can be valuable. It identifies the topology of a tree (e.g.
a figure from a publication) almost automatically with only minor user
interaction. The application features a graphical user interface that is based
on the Java Swing API.

PREREQUISITES:
--------------

The current version of TreeSnatcher opens tree-image files in the formats PNG,
JPG/JPEG or GIF. The PDF format is currently not supported.

The user has to provide an image that a) shows a tree and b) complies with the
following requirements:

- The image has a light and homogeneous background that is clearly separated
  from a dark foreground.
- The tree along with species names and branch captions constitutes the
  foreground. Lettering and branches must not overlap.
- The branches consist of evenly thin, plain, not necessarily straight
  continuous lines.
- The inner nodes are branching points between lines and have no circles,
  rectangles etc. inscribed.
- The outer nodes do not feature leaves: the edges end at a well-defined
  location.

If the image does not meet these requirements, the tree topology will not be
identified correctly.

Please have a look at the tutorial, which explains how TreeSnatcher detects
nodes in the specified tree.
The approach is straightforward but might be difficult to understand for those
not familiar with image analysis.

WORKFLOW:
---------

Working with TreeSnatcher takes place along a mandatory succession of program
stages. The program checks whether the user has performed the formal task(s)
specified at a given stage and only then permits advancing to the next stage.
Within each stage, however, the user controls all image manipulations and
recognition tasks performed by the program.

The entire workflow is as follows:

(1) The program converts the specified RGB image into a grayscale image. As
    color information is not needed for the tree recognition, all RGB values
    get expressed as shades of gray. The weights used for color conversion
    were taken from
    International Telecommunication Union: ITU-R Recommendation BT.709-3 (1998)
    Basic Parameter Values for the HDTV Standard for the Studio and for
    International Programme Exchange.

(2) The user trims and cuts the image at will. That is to say, the user can
    select sub-trees or a subset of taxa from the converted image.

(3) The user corrects the image by drawing and erasing pixels. If a line of
    known length, i.e. a scale bar, is present in the image, the user can mark
    it and specify a unit length in a dialog box to scale the tree.

(4) The program first separates all foreground shades from the background
    shades in order to extract the tree from the background. Then it thins out
    the foreground to prepare the detection of tree nodes and edges. The user
    marks a location on an edge and instructs the program to color the
    foreground reachable from this location.

(5) The program carries out the detection of nodes and leaves based on the
    previously colored part of the image. The user rejects or adds tree nodes
    by drawing and erasing pixels. TreeSnatcher then repeats recognition
    step 5 on demand.

(6) The program displays the inner nodes and edges of the tree.
    The user can provide individual edge lengths in a dialog box by clicking
    on the two nodes that uniquely identify an edge. This overrides a
    previous specification of a unit length for the whole tree.
    Note: Please do not make use of individual edge lengths right now, as the
    treatment of edge lengths is deactivated in the current version.

(7) The user clicks on each leaf node in turn and types the corresponding
    species name into a dialog box.

(8) The user clicks on an outer node to designate it as the root of the tree.

(9) The program constructs and displays the Newick expression that represents
    the tree structure.

A detailed description of the thinning (skeletonizing) algorithm used in step 4
is given in
Zhang, T.Y. and Suen, C.Y. (1984) A fast parallel algorithm for thinning
digital patterns. Communications of the ACM 27, 3 (Mar. 1984), 236-239

In steps 4 and 5 several variants of the flood filling technique are used.
Please see
Burger, W. and Burge, M.J. (2005) Digitale Bildverarbeitung, Berlin,
Heidelberg, Springer Verlag, 196-200.

================================================================================

INSTALLATION:
-------------

Mac OS X
- The Mac OS X 10.4 (Tiger) version of TreeSnatcher comes as a single
  ZIP-compressed file called "TreeSnatcher_Mac.zip".
- Extract the files "TreeSnatcher.app" and "treesnatcher-manual-x.x.txt" (this
  file) from it. This can easily be done by double-clicking the archive's
  icon.
- Start TreeSnatcher by double-clicking the TreeSnatcher.app icon.

Windows
- The Windows version of TreeSnatcher comes as a ZIP-compressed archive called
  "TreeSnatcher_Win32.zip".
- Extract the files "TreeSnatcher.jar", "treesnatcher-manual-x.x.txt" and the
  remaining files from it and put them all together into one directory.
- Start TreeSnatcher either by double-clicking the TreeSnatcher.jar icon or by
  entering "java -jar TreeSnatcher.jar" within the TreeSnatcher directory at
  the DOS shell.

Linux
- The Linux version of TreeSnatcher comes as a ZIP-compressed archive called
  "TreeSnatcher_Linux.zip".
- Extract the files "TreeSnatcher.jar", "treesnatcher-manual-x.x.txt" and the
  remaining files from it and put them all together into one directory.
- To execute TreeSnatcher, open a shell and change to the directory into which
  you copied TreeSnatcher.jar and all accompanying files.
  Type "java -Xmx384m -jar TreeSnatcher.jar", press Enter.

If you encounter out-of-memory errors while executing TreeSnatcher, increase
the amount of heap memory, for instance with -Xmx512m.

If you encounter problems, please ask your local administrator for help.

================================================================================

IMPORTANT HINTS:
----------------

There are cases in which TreeSnatcher might output a wrong Newick expression:
The skeletonization algorithm used in TreeSnatcher to thin out the foreground
is not yet fully reliable. This means that the program might erroneously report
nodes in the tree even if the user has provided an image that is perfectly
suitable. If these wrong nodes are not deleted, the Newick expression will be
wrong. This issue will be addressed in a future version of TreeSnatcher.

Please carefully read the section 'Where TreeSnatcher finds nodes' in the
accompanying PDF.

These are locations within an image that need your special attention:
- 'crotches' (those are especially prone to skeletonization errors)
- places where a branch and text overlap (here TreeSnatcher will find "wrong"
  branches for sure)
- branch ends ('leaves'; thick line endings are also prone to skeletonization
  errors)

Remember that up to now TreeSnatcher measures branch lengths 'from node to
node'. In other words: TreeSnatcher will not recognize horizontal and vertical
portions of a branch.

Please understand that this program is a work in progress. We need your
feedback and welcome bug reports and suggestions.
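
To make the 'from node to node' hint above concrete, here is a small Python
sketch, assuming a branch drawn as a right-angled elbow between two nodes given
in pixel coordinates. It is not TreeSnatcher code; the function names and the
coordinates are hypothetical. It compares the straight-line distance between
the two nodes with the length of the drawn elbow and with its horizontal
portion alone, which in many rectangular tree drawings is the part meant to
encode branch length (whether that applies to a given image is an assumption
the user has to check).

```python
import math


def node_to_node_length(a, b):
    """Straight-line (Euclidean) distance between two node positions."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def elbow_length(a, b):
    """Length of a right-angled path: horizontal part plus vertical part."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def horizontal_extent(a, b):
    """Horizontal portion only of the path between the two nodes."""
    return abs(b[0] - a[0])


# Hypothetical pixel positions (x, y) of an inner node and one of its leaves.
inner_node = (10, 20)
leaf = (40, 60)

direct = node_to_node_length(inner_node, leaf)  # 50.0
drawn = elbow_length(inner_node, leaf)          # 70
horizontal = horizontal_extent(inner_node, leaf)  # 30
print(direct, drawn, horizontal)
```

For this example the three measures differ considerably (50, 70 and 30 pixels),
which is why branch lengths taken from rectangular tree drawings should be
checked by hand.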

REFERENCES:
-----------

Thomas Laubach and Arndt von Haeseler (2007) TreeSnatcher: Coding Trees From
Images, Bioinformatics, to be published

If you are using TreeSnatcher, please cite this article.

================================================================================

CREDITS:
--------

If you have any further questions that are not answered in the tutorial, please
drop us an e-mail. We welcome any suggestions, criticism and bug reports.
TreeSnatcher is a complex piece of software and is likely to contain bugs.

We are indebted to the following people for fruitful discussions and
suggestions:

Deniz Dalli, Roland Fleissner, Gabriel Gelius-Dietrich, Jochen Kohl,
Dominic Mainz, Ingo Paulsen, Michael Rosskopf, Stefan Zanger and others,
Bioinformatics Institute, Heinrich-Heine-University Duesseldorf, Germany

Steffen Klaere,
Center for Integrative Bioinformatics Vienna
Max F. Perutz Laboratories, Austria

********************************************************************************
* This program is distributed in the hope that it will be useful, but          *
* WITHOUT ANY WARRANTY; without even the implied warranty of                   *
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.                         *
********************************************************************************