********************************************************************************
                                                                     
                TreeSnatcher Plus: A Phylogenetic Tree Capturing Tool

                   tsReadme.txt for TreeSnatcher Plus from June 2010

******************************************************************************************


Created, designed and programmed by Thomas Laubach

Supported by
Martin J. Lercher, Heinrich-Heine-Universitaet Duesseldorf, Germany
Arndt von Haeseler, Center for Integrative Bioinformatics Vienna (CIBIV)

This software is free of charge and licensed under the GNU General Public
License (GPL), except for those parts marked in the sources to which the
authors' copyright does not apply. Please refer to
http://www.opensource.org/licenses/gpl-license.html for details.

==========================================================================================

SHORT DESCRIPTION:

TreeSnatcher Plus is a GUI-driven Java application for the semi-automatic
recognition of multifurcating phylogenetic trees in pixel images. The program
accepts an image file as input and, with user assistance, analyzes the topology
and the metrics of the tree depicted in it. The analysis is carried out in a
multi-stage process using algorithms from the field of image analysis. It yields
a Newick expression that represents the tree structure, optionally including
branch lengths.

A purely textual description of the TreeSnatcher Plus user interface would be of
no practical value. Instead, we provide PDF tutorials whose screenshots and
illustrations serve as a substitute for the missing manual.

==========================================================================================


VERSION HISTORY: 

Version from June 2010

Fixed:
- Windows: The mouse cursor hot spots were corrected
- Windows, Linux: The "Right click destroys box selection" issue
- Linux: Uses the PDF viewer Okular
- All: "Save Image" now obeys the switch "Small Nodes and Branches"

Missing:
- OCR (optical character recognition for the species names).
  This is very difficult because there are almost no restrictions on the
  position, size and orientation of characters, the choice of fonts etc.
- Determination of branch segments in round topologies.
- The Linux and Windows versions lack a progress indicator.
- The Linux version does not offer the quadratic curve drawing functionality.
- The Linux version uses the system mouse cursors.
- All: The program does not yet visualize different pencil and rubber sizes.

Known bugs (the most prominent):
- The elaborate path finding algorithm that connects the nodes of the tree does not
  trace paths correctly between nodes that are very close to each other.
  As a workaround, the simple algorithm can be used. It requires, however, that
  the nodes be placed closer to the center of line intersections.
- Rectangular trees can only be processed accurately if the branches are fairly
  long and the bends in the lines are more or less right-angled.
- The treatment of huge source images needs to be improved. One possibility is to
  stream them from hard disk.
- Windows: "An error occurred while writing the image file" appears while erasing
           pixels from a flooded foreground area.

==========================================================================================


EXECUTION OPTIONS:

Mac OS X:
Locate the TreeSnatcher Plus JAR file on your desktop and double-click
it. If the application encounters an out-of-memory error, increase
the heap space (parameters "Xms" and "Xmx") in the shell script
TreeSnatcherPlus.sh. You must then always start TreeSnatcher Plus using
the script for the memory changes to take effect.

Linux:
Execute the shell script TreeSnatcherPlus.sh.
If the application encounters an out-of-memory error, increase
the heap space in the shell script (parameters "Xms" and "Xmx").

Windows:
Locate the TreeSnatcher Plus JAR file on your desktop and double-click
it, or double-click the batch script TreeSnatcherPlus.bat. If the application
encounters an out-of-memory error, increase the heap space in the batch script
(parameters "Xms" and "Xmx"). You must then always start TreeSnatcher Plus
using the script for the memory changes to take effect.

==========================================================================================



SCOPE OF THE PROGRAM:

Figures of phylogenetic trees are widely used in publications to illustrate
the result of an evolutionary analysis. However, a machine-readable
representation of the phylogeny, such as a Newick expression, cannot easily be
extracted from such images, so they are not suited for subsequent reanalysis
or easy compilation of tree topologies. A computer-readable representation of
a published tree therefore has to be built either completely by hand or with
the help of special applications.

Here, TreeSnatcher Plus can be valuable. With user interaction, it identifies
the topology of a tree in an image (e.g. a figure from a publication). The
application features a sophisticated graphical user interface that is based on
the Java Swing API.

The new version, TreeSnatcher Plus, has been developed from scratch. It offers
a stable graphical user interface and uses improved methods for image
processing and for the analysis of the tree topology. In particular, it is no
longer necessary to preprocess an image before feeding it into TreeSnatcher
Plus. The new application offers all the image preprocessing tools needed.


PREREQUISITES:

The current version of TreeSnatcher Plus opens image files in the formats
PNG, JPG/JPEG or GIF. The PDF format is currently not supported.
If you would like to get a phylogeny from a PDF, try to extract the image
from the PDF, then load it into TreeSnatcher Plus.
If this is not possible, you could take screenshots of the image in the PDF
and combine them into a new image. The resolution should be sufficiently high.

Java applications can use a large portion of the available main memory. As
TreeSnatcher Plus needs to keep several copies of the source image in memory,
the maximum size of the source image depends on the memory size of your
machine. You should increase the heap space before running the application.
To do so, edit and execute the shell script that comes with the Macintosh
and Linux versions of TreeSnatcher Plus, or the batch script for the
Windows version.
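
To check whether a changed heap setting has taken effect, the JVM can be asked
for its maximum heap size, which can then be compared with a rough estimate of
what an image needs. The following is a minimal sketch and not part of
TreeSnatcher Plus: the class name HeapCheck, the four bytes per pixel and the
number of in-memory copies are assumptions made for illustration only.

```java
final class HeapCheck {
    /** Rough estimate in bytes: width * height * bytesPerPixel * copies. */
    static long estimateBytes(int width, int height, int bytesPerPixel, int copies) {
        // Cast first so that the product is computed in 64-bit arithmetic.
        return (long) width * height * bytesPerPixel * copies;
    }

    public static void main(String[] args) {
        // Upper heap limit of this JVM, as configured with the Xmx parameter.
        long maxHeap = Runtime.getRuntime().maxMemory();
        // Hypothetical 4000 x 3000 image, 4 bytes per pixel, 5 copies (assumed).
        long needed = estimateBytes(4000, 3000, 4, 5);
        System.out.println("Max heap (MB):       " + maxHeap / (1024 * 1024));
        System.out.println("Estimated need (MB): " + needed / (1024 * 1024));
        System.out.println(needed < maxHeap ? "Probably fits" : "Increase Xmx");
    }
}
```

If the reported maximum does not change after editing the script, the program
was probably started by double-clicking the JAR file instead of the script.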

As TreeSnatcher Plus offers neither online help nor a dedicated manual, please
work through the various tutorials. All necessary steps are illustrated.
Often the same results can also be obtained in a different way.
Additionally, we have uploaded some work-in-progress images (Snapshots)
that you might want to restore from within TreeSnatcher Plus. They
illustrate how images need to be processed in the application.

The workflow shown in the tutorials might seem cumbersome or unintuitive.
However, image preprocessing is necessary: the computer has no notion of
lines, line crossings, line endings or line thickness, nor can it read
textual information on its own or grasp an image as a whole.

If you know a better method, please feel free to send me an e-mail. I will be
happy to hear from you.


WORKFLOW:

In contrast to TreeSnatcher, the workflow in TreeSnatcher Plus now includes
preprocessing of the image. The order of the global tasks is still mandatory: 

(1) Reading the image
    The program reads the specified RGB image.

(2) Preprocessing the image
    The user trims and cuts the image. That is to say, the user can select sub-trees
    or a subset of taxa from the converted image.

    If necessary, some image preprocessing tools are used (cf. tutorials).

(3) Binarization/Thresholding
    The user thresholds the image to ensure that the foreground is black and the
    background is white.
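
The idea behind the thresholding in step (3) can be sketched as follows. This
is an illustration under stated assumptions, not the TreeSnatcher Plus code:
the class ThresholdSketch is hypothetical, pixels are assumed to be packed
0xRRGGBB integers, the gray value is a plain channel average, and the
threshold is fixed rather than chosen interactively.

```java
final class ThresholdSketch {
    /** Gray value 0..255 of a packed 0xRRGGBB pixel (plain channel average). */
    static int gray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r + g + b) / 3;
    }

    /** Returns a mask: true = foreground (dark), false = background (light). */
    static boolean[][] binarize(int[][] rgb, int threshold) {
        boolean[][] foreground = new boolean[rgb.length][];
        for (int y = 0; y < rgb.length; y++) {
            foreground[y] = new boolean[rgb[y].length];
            for (int x = 0; x < rgb[y].length; x++) {
                // Pixels darker than the threshold count as foreground.
                foreground[y][x] = gray(rgb[y][x]) < threshold;
            }
        }
        return foreground;
    }

    public static void main(String[] args) {
        int[][] image = {{0x000000, 0xFFFFFF, 0x404040, 0xC0C0C0}};
        boolean[][] mask = binarize(image, 128);
        System.out.println(java.util.Arrays.toString(mask[0]));
    }
}
```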

(4) Skeletonization/Thinning
    The user thins the part of the image that contains the tree. This makes it
    easier for the program to trace the paths between the nodes.

(5) Flooding the foreground
    The user marks a foreground location on a branch. The program colors the foreground
    reachable from this location. The flooded area will be the tree. Everything else
    is ignored in subsequent steps.


(6) Placement of inner nodes (line intersections) and outer nodes (tips)
    The program suggests locations for inner and outer nodes of the tree.
    The user can add, move, and remove nodes.

(7) Choosing the tree type
    The program can deal with freeform trees and rectangular trees. In freeform
    topologies, a branch length is measured from tip to line crossing or from line
    crossing to line crossing. In rectangular trees, branch lengths are measured
    from tip to bend, from bend to next bend, and from bend to line crossing. For
    this to work, the program tries to divide each branch into straight segments
    and calculates their slopes. The clearer the bend in a branch, the better the
    result. In particular, this approach does not currently work well for short
    branches. When using the rectangular tree type, the calculation of branch
    lengths will only be accurate if the program has identified the bends in the
    branches correctly. Use the view "Show branch length composition" to check this.

    The tree type must be chosen prior to step (8).
      

(8) Detection of branches
    The program tries to find the branches of the tree by tracing the foreground
    between the node locations. 
    If a branch is wrong, the user sets or erases pixels, moves nodes and then
    restarts this step.


(9) Determining branch lengths
    The branch length in pixels is measured during step (8). Please keep in mind
    that the accuracy of the branch length measurement depends on how well the
    thinned tree structure and the nodes match the real tree.

    There are three types of branch lengths in TreeSnatcher Plus: user-assigned
    length, calculated length and a mixture of both.

    Calculated length (default): TreeSnatcher Plus first calculates the length of
    a path between two nodes. Then it determines the branch segments and sums
    the pixel lengths of the segments that are relevant to the branch length.

    User length: The user can assign a value to a branch. This value remains
    unaltered during the rest of the session.

    Mixed lengths: A mixture of calculated lengths and user lengths is allowed.
    This can become necessary if, for instance, a single branch among many
    has a wrong length. This length can then be assigned manually.
    Mixed lengths can also be used for what-if scenarios.

    The user can also mark a line of known length in the image, e.g. a scale bar.
    All calculated lengths are then recalculated with respect to the new scale.
    It is also possible to set a calculated length to the value 1.0 or to a
    user-specified value. Again, all calculated lengths are recalculated with
    respect to the new scale.
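
The rescaling described in step (9) is a simple proportion: if a reference
line that measures a given number of pixels stands for a known value, every
calculated pixel length is multiplied by value / pixels. A minimal sketch
follows; the class BranchScale and its method names are hypothetical and not
taken from the TreeSnatcher Plus sources.

```java
final class BranchScale {
    /** Units per pixel derived from a reference line of known length. */
    static double unitsPerPixel(double referencePixels, double referenceValue) {
        if (referencePixels <= 0) {
            throw new IllegalArgumentException("reference length must be positive");
        }
        return referenceValue / referencePixels;
    }

    /** Returns rescaled copies of the pixel lengths; the input stays unchanged. */
    static double[] rescale(double[] pixelLengths, double factor) {
        double[] scaled = new double[pixelLengths.length];
        for (int i = 0; i < pixelLengths.length; i++) {
            scaled[i] = pixelLengths[i] * factor;
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Hypothetical example: a scale bar of 200 pixels labeled 0.1.
        double factor = unitsPerPixel(200.0, 0.1);
        double[] lengths = rescale(new double[] {100.0, 50.0, 400.0}, factor);
        System.out.println(java.util.Arrays.toString(lengths));
    }
}
```

Setting one calculated length to 1.0 corresponds to the same call with that
branch's pixel length as the reference, i.e. unitsPerPixel(pixels, 1.0).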

(10) Assigning species names
    The user clicks on each leaf node in turn and types the corresponding
    species name into a dialog box. This must be done manually, as the program
    does not yet perform OCR.


(11) Choosing the origin
    The user either clicks on an inner node to designate it as the origin of the
    tree, or accepts the default.

(12) Constructing the Newick string
    The program constructs and displays the Newick expression that represents the
    tree structure.
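
Step (12) can be pictured as a recursive traversal that starts at the origin.
The sketch below is not the TreeSnatcher Plus implementation; the Node class
and its fields are hypothetical. Each leaf contributes its name, each inner
node a parenthesized, comma-separated list of its children, and a branch
length, if present, follows a colon. Names are written as they are, so names
containing spaces or punctuation would need quoting, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class NewickSketch {
    /** Hypothetical tree node: leaves carry a name, inner nodes carry children. */
    static final class Node {
        final String name;     // empty for unnamed inner nodes
        final double length;   // branch length to the parent; negative = none
        final List<Node> children = new ArrayList<>();

        Node(String name, double length) {
            this.name = name;
            this.length = length;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Returns the Newick expression for the tree rooted at the given node. */
    static String toNewick(Node root) {
        StringBuilder sb = new StringBuilder();
        write(root, sb);
        return sb.append(';').toString();
    }

    private static void write(Node n, StringBuilder sb) {
        if (!n.children.isEmpty()) {
            // Inner node: any number of children, so multifurcations work too.
            sb.append('(');
            for (int i = 0; i < n.children.size(); i++) {
                if (i > 0) sb.append(',');
                write(n.children.get(i), sb);
            }
            sb.append(')');
        }
        sb.append(n.name);
        if (n.length >= 0) {
            sb.append(':').append(String.format(Locale.ROOT, "%.2f", n.length));
        }
    }

    public static void main(String[] args) {
        Node root = new Node("", -1)
            .add(new Node("A", 0.1))
            .add(new Node("", 0.3)
                .add(new Node("B", 0.2))
                .add(new Node("C", 0.25)));
        System.out.println(toNewick(root)); // (A:0.10,(B:0.20,C:0.25):0.30);
    }
}
```

Choosing a different inner node as the origin in step (11) changes which node
the traversal starts from and therefore the nesting of the parentheses.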

A detailed description of the thinning (skeletonizing) algorithm used in step (4) is
given in Zhang, T.Y. and Suen, C.Y. (1984) A fast parallel algorithm for thinning
digital patterns. Communications of the ACM 27(3), 236-239.
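
For orientation, the two sub-iterations of the Zhang-Suen scheme can be
sketched as follows. This is a textbook-style illustration written for this
readme, not the code used in TreeSnatcher Plus. The class name is
hypothetical, the image is assumed to be a non-empty rectangular boolean
array (true = foreground), and pixels outside the image count as background.

```java
final class ZhangSuenSketch {
    /** Thins the foreground (true) of a binary image in place. */
    static void thin(boolean[][] img) {
        boolean changed;
        do {
            changed = pass(img, true);
            changed |= pass(img, false);
        } while (changed);
    }

    // One sub-iteration: mark deletable pixels first, then delete them together.
    private static boolean pass(boolean[][] img, boolean first) {
        int h = img.length, w = img[0].length;
        boolean[][] delete = new boolean[h][w];
        boolean any = false;
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                if (!img[y][x]) continue;
                // Neighbors P2..P9, clockwise starting at north.
                int[] p = {
                    at(img, y - 1, x), at(img, y - 1, x + 1), at(img, y, x + 1),
                    at(img, y + 1, x + 1), at(img, y + 1, x), at(img, y + 1, x - 1),
                    at(img, y, x - 1), at(img, y - 1, x - 1)
                };
                int b = 0; // number of foreground neighbors
                int a = 0; // number of 0 -> 1 transitions around the pixel
                for (int i = 0; i < 8; i++) {
                    b += p[i];
                    if (p[i] == 0 && p[(i + 1) % 8] == 1) a++;
                }
                int p2 = p[0], p4 = p[2], p6 = p[4], p8 = p[6];
                boolean directional = first
                    ? p2 * p4 * p6 == 0 && p4 * p6 * p8 == 0
                    : p2 * p4 * p8 == 0 && p2 * p6 * p8 == 0;
                if (b >= 2 && b <= 6 && a == 1 && directional) {
                    delete[y][x] = true;
                    any = true;
                }
            }
        }
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                if (delete[y][x]) img[y][x] = false;
        return any;
    }

    private static int at(boolean[][] img, int y, int x) {
        boolean inside = y >= 0 && y < img.length && x >= 0 && x < img[0].length;
        return inside && img[y][x] ? 1 : 0;
    }

    public static void main(String[] args) {
        boolean[][] img = new boolean[5][11];
        for (int y = 1; y <= 3; y++)
            for (int x = 1; x <= 9; x++) img[y][x] = true; // bar, 3 pixels thick
        thin(img);
        for (boolean[] row : img) {
            StringBuilder sb = new StringBuilder();
            for (boolean v : row) sb.append(v ? '#' : '.');
            System.out.println(sb);
        }
    }
}
```

A line that is already one pixel wide is left unchanged, because its end
points have only one neighbor and its inner pixels show two transitions.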

In steps 4, 5 and 7, several variants of the flood-filling technique are used.
Please see Burger, W. and Burge, M.J. (2005) Digitale Bildverarbeitung, Berlin,
Heidelberg, Springer Verlag, 196-200.
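
As an illustration of the basic technique, here is a plain breadth-first flood
fill. Which variants TreeSnatcher Plus actually uses is not described in this
readme, so the class name, the boolean mask representation and the choice of
the 8-neighborhood (so that diagonal steps in a thinned line stay connected)
are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class FloodFillSketch {
    /**
     * Marks every foreground pixel reachable from the seed (8-neighborhood).
     * Returns a mask of the flooded area; the input image is not modified.
     */
    static boolean[][] flood(boolean[][] foreground, int seedY, int seedX) {
        int h = foreground.length, w = foreground[0].length;
        boolean[][] flooded = new boolean[h][w];
        if (!foreground[seedY][seedX]) return flooded; // seed lies on background
        Deque<int[]> queue = new ArrayDeque<>();
        flooded[seedY][seedX] = true;
        queue.add(new int[] {seedY, seedX});
        while (!queue.isEmpty()) {
            int[] p = queue.remove();
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int y = p[0] + dy, x = p[1] + dx;
                    if (y < 0 || y >= h || x < 0 || x >= w) continue;
                    // The center pixel is already flooded and is skipped here.
                    if (foreground[y][x] && !flooded[y][x]) {
                        flooded[y][x] = true;
                        queue.add(new int[] {y, x});
                    }
                }
            }
        }
        return flooded;
    }

    public static void main(String[] args) {
        boolean[][] fg = new boolean[3][5];
        fg[0][0] = true; fg[1][1] = true; fg[2][2] = true; // diagonal line
        fg[0][4] = true;                                   // isolated pixel
        boolean[][] mask = flood(fg, 1, 1);
        System.out.println("isolated pixel flooded: " + mask[0][4]);
    }
}
```

Using an explicit queue instead of recursion avoids stack overflows on large
foreground areas.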

==========================================================================================


INSTALLATION:
-------------

Mac OS X

  - The Mac OS X version of TreeSnatcher Plus comes as a single
    ZIP-compressed file called "TreeSnatcherPlus.zip".
    It should run on all Mac OS X versions that can execute the Java VM 1.6.

  - Extract the files "TreeSnatcherPlus_June2010_MacOSX.jar", "tsReadme.txt"
    (this file) and the remaining files from it and copy them into one directory.

  - Start TreeSnatcher Plus either by double-clicking the JAR file icon or by
    executing the shell script TreeSnatcherPlus.sh.
    If the application needs more memory, adjust the parameters
    "Xms" and "Xmx" in the shell script.

Linux

  - The Linux version of TreeSnatcher Plus comes as a single
    ZIP-compressed file called "TreeSnatcherPlus.zip".
    It should run on all Linux versions that can execute the Java VM 1.6.

  - Extract the files "TreeSnatcherPlus_June2010_Linux.jar", "tsReadme.txt"
    (this file) and the remaining files from it and copy them into one directory.

  - Start TreeSnatcher Plus either by double-clicking the JAR file icon or by
    executing the shell script TreeSnatcherPlus.sh.
    If the application needs more memory, adjust the parameters
    "Xms" and "Xmx" in the shell script.

Windows

  - The Windows version of TreeSnatcher Plus comes as a single
    ZIP-compressed file called "TreeSnatcherPlus.zip".
    It has been tested on Windows XP with the Java VM 1.6.

  - Extract the files "TreeSnatcherPlus_June2010_Windows.jar", "tsReadme.txt"
    (this file) and the remaining files from it and copy them into one directory.

  - Start TreeSnatcher Plus either by double-clicking the JAR file icon or by
    executing the batch script TreeSnatcherPlus.bat.
    If the application needs more memory, adjust the parameters
    "Xms" and "Xmx" in the batch script.

  If you encounter problems, please ask your local administrator for help.
  
===================================================================================

REFERENCES:
-----------

   Thomas Laubach and Arndt von Haeseler (2007) TreeSnatcher: Coding Trees From Images,
   Bioinformatics 2007 23(24):3384-3385; doi:10.1093/bioinformatics/btm438 
   
   If you are using TreeSnatcher Plus, please cite this article.
   If you like TreeSnatcher Plus, we will be happy to hear from you.

===================================================================================

CREDITS:
--------

If you have any further questions that are not answered in this introduction or the 
tutorials, please drop me an e-mail. We welcome any suggestions, criticism, and bug
reports. TreeSnatcher Plus is a complex piece of software and is likely to contain bugs.

I am indebted to the following people for fruitful discussions, support and suggestions:
Martin Lercher, Arndt von Haeseler, Gabriel Gelius-Dietrich, Jochen Kohl, Dominic Mainz,
Indra Mainz, Ingo Paulsen, Michael Rosskopf, Stefan Zanger, Steffen Klaere, Sabine Thuss,
Na Gao, Wolfgang Kaisers, Guang-Zhong Wang, Janina Mass, Christian Esser, Annika Hoinkes,
Christian Cremer, Marc Andre Daxer, Ulrich Wittelsbuerger, Benjamin Braasch, Jan Wolfertz,
Rafael Dellen, Claus Jonathan Fritzemeier, Anna Schlizio, Heiko Schmidt,
and others.

In particular, I want to thank Andrew Rambaut for his now classic program TreeThief, for
discussing the TreeSnatcher Plus approach with me and for giving me valuable advice.

******************************************************************************** 
	This program is distributed in the hope that it will be useful, but
	WITHOUT ANY WARRANTY; without even the implied warranty of
	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
********************************************************************************
