********************************************************************************
********************************************************************************
****                                                                        ****
****   IQPNNI: Important Quartet Puzzling and Nearest Neighbor Interchange  ****
****                                                                        ****
****                            User Manual                                 ****
****                      Version 3.1 (March 2007)                          ****
****                                                                        ****
********************************************************************************
********************************************************************************


Copyright (C) 2005-2007 by  Le Sy Vinh, Bui Quang Minh, Heiko A. Schmidt, and
                            Arndt von Haeseler
Copyright (C) 2004 by       Le Sy Vinh and Arndt von Haeseler

This program is free software; you can redistribute it and/or modify 
it under the terms of the GNU General Public License. Please refer to
http://www.opensource.org/licenses/gpl-license.html for details.

================================================================================

The methods are described in detail in the following articles:
    Le Sy Vinh and Arndt von Haeseler (2004)
    IQPNNI: Moving fast through tree space and stopping in time, 
    Mol. Biol. Evol. 21(8):1565-1571
    http://dx.doi.org/10.1093/molbev/msh176

    Bui Quang Minh, Le Sy Vinh, Arndt von Haeseler, and Heiko A. Schmidt (2005)
    pIQPNNI - parallel reconstruction of large maximum likelihood phylogenies.
    Bioinformatics 21(19):3794-3796
    http://dx.doi.org/10.1093/bioinformatics/bti594
    
================================================================================

Main Contributors
    Le Sy Vinh
    NIC, Forschungszentrum Juelich, Germany
    vinh(AT)cs.uni-duesseldorf.de

    Arndt von Haeseler
    Center for Integrative Bioinformatics Vienna
    Max F. Perutz Laboratories, Austria
    arndt.von.haeseler(AT)mfpl.ac.at

    associate member of
    Bioinformatics Institute, Heinrich Heine University Duesseldorf, Germany

    Heiko A. Schmidt
    Center for Integrative Bioinformatics Vienna
    Max F. Perutz Laboratories, Austria
    heiko.schmidt(AT)mfpl.ac.at    

    associate member of
    NIC, Forschungszentrum Juelich, Germany
    
    Bui Quang Minh
    Center for Integrative Bioinformatics Vienna
    Max F. Perutz Laboratories, Austria
    minh.bui(AT)mfpl.ac.at

================================================================================

SHORT DESCRIPTION:
------------------

IQPNNI is a computer program to reconstruct the evolutionary relationships 
among contemporary species based on DNA, protein, or protein-coding sequences. 
In case of protein-coding sequences, several codon models are implemented for
inferring positive selection.

IQPNNI is a command-line and menu-driven program which 
allows users to specify the parameter values or let the program estimate them 
from the input data (a nucleotide or amino acid alignment in PHYLIP format). 
The options are classified into four main groups: general options, IQP
options, substitution process options, and rate heterogeneity options.
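
As an illustration of the expected input, the following minimal sketch writes
a small alignment in sequential PHYLIP layout (a header line with the number
of sequences and sites, then one line per sequence with the name padded to
10 characters). The function name to_phylip is hypothetical and not part of
IQPNNI, and whether IQPNNI also accepts other PHYLIP variants (for example
interleaved files) is not stated in this manual.

```python
def to_phylip(seqs):
    """Return a sequential PHYLIP string for a dict {name: aligned sequence}.

    Hypothetical helper for illustration only; it is not part of IQPNNI.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_sites = lengths.pop()
    # Header line: number of sequences, then number of sites.
    lines = [f" {len(seqs)} {n_sites}"]
    for name, seq in seqs.items():
        if len(name) > 10:
            raise ValueError(f"name longer than 10 characters: {name}")
        # Classic PHYLIP: the name occupies exactly 10 columns.
        lines.append(f"{name:<10}{seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_phylip({"Human": "ACGTACGT", "Chimp": "ACGTACGA", "Mouse": "ACGAACGA"}))
```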


================================================================================

VERSION HISTORY: 
----------------

VERSION 3.1:
    1. Codon model: The program goes through two stages. At first the tree is
       reconstructed based on the HKY model for DNA. Then it applies one of the
       following codon models for inference of positively selected sites:
       - NY98 (Nielsen & Yang 98): main model for inferring positive selection.
       - YN98 (Yang & Nielsen 98): special case of NY98 with 1 Ns/Sy category.
       - GY94 (Goldman & Yang 94).
       - CP98 (Pedersen et al. 98): model incorporating CpG depression.
       - CGTR: GTR version of nucleotide for codon (unpublished).
    2. Gamma + Invariable sites rate heterogeneity.
    3. Site-specific rates (Meyer & von Haeseler 2003) improved. The program
       also writes out site rates based on the empirical Bayesian approach if
       Gamma rates are specified.
    4. New protein models: rtREV (Dimmic et al. 2001), user-defined model by 
       a file containing amino-acid replacement rates and frequencies.
    5. Warning if the number of iterations is smaller than recommended by the
       stopping rule.
    6. New command line options.

Bugs fixed:
    - Zero state frequencies: they are now replaced by a very small number.
    - Checkpoint: now correctly recovered from stopped point.
    - Restriction on number of sites: the former limit of 100,000 is removed.
Bugs identified:
    - Parallel version on Infiniband system under MPICH.


VERSION 3.0.1:
    1. Zero iteration: if the user sets the number of iterations to zero,
       the program will only evaluate the starting tree (either BIONJ or
       user-defined tree) by optimizing model parameters and branch lengths.
    2. Triplet tree: the program can now run on alignment of just 3 sequences.
    3. Scaling technique to avoid numerical underflow on large datasets. It now
       can stably analyze alignments with more than 1,000 sequences.
    4. At least twice as fast as v3.0. The "long double" datatype is replaced
       by "double", making it more compatible with most computers.
    5. Memory consumption is reduced at least by half by a new mechanism of 
       storing conditional likelihood vector.
    6. New eigensystem adapted to reversible instantaneous rate matrix.


VERSION 3.0.b1:
    1. The program now runs at least twice as fast (applying Newton's method
       instead of Brent's algorithm and some other algorithmic means).
    2. Running in Parallel with Message Passing Interface (MPI).
	      
*NOTE*: - The option to change rate heterogeneity is now 'r' instead of 'w'.
        - The stopping rule is now switched off by default, which can be
          changed using the 's' option.
	

VERSION 2.6:
    1. General Time Reversible model of evolution.
    2. Site-specific substitution rates.
    3. Check point: If the program crashed or was stopped by the user, it can
       continue from the last stopped point.
                                 

================================================================================

COMMAND-LINE OPTIONS:
---------------------

Syntax: iqpnni [OPTIONS] [Filename]

GENERAL OPTIONS:
  -h, -?               print this help dialog
  -n <iteration_count> run the main loop for at most iteration_count iterations
  -s <stopping_rule>   either on or off; default is off
  -u <user_tree>       read the starting tree from user_tree file
  -sfc                 start from scratch, don't load the check point file
  -ni                  don't prompt for user options
	    
IQP OPTIONS:
  -p <probability>     set the probability of deleting a sequence
  -k <representatives> set the number of representatives
		
MODEL OPTIONS:
  -m <model>           set the model type for:
	Nucleotides: HKY85, TN93, GTR
	Amino acids: WAG, Dayhoff, JTT, VT, mtREV, rtREV, Blosum
	Protein-coding DNA: GY94, YN98, NY98, CP98, CGTR

RATE HETEROGENEITY OPTIONS:
  -w <rate_type>       either uniform, gamma, igamma or sitespec
  -c <num_rate>        number of rate categories, for gamma and igamma only
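
As a sketch of how the options above combine, the following script assembles
an argument list for a non-interactive run. It uses only flags and values
listed in this section; the helper name build_iqpnni_command and the file
name aln.phy are hypothetical, and the script only builds the command line
without running the program.

```python
import shlex

RATE_TYPES = ("uniform", "gamma", "igamma", "sitespec")


def build_iqpnni_command(alignment, model=None, iterations=None,
                         stopping_rule=None, rate_type=None,
                         categories=None, prompt=True):
    """Build an argument list following 'iqpnni [OPTIONS] [Filename]'.

    Hypothetical helper; only options documented in this manual are used.
    """
    cmd = ["iqpnni"]
    if iterations is not None:
        cmd += ["-n", str(iterations)]
    if stopping_rule is not None:
        cmd += ["-s", "on" if stopping_rule else "off"]
    if model is not None:
        cmd += ["-m", model]
    if rate_type is not None:
        if rate_type not in RATE_TYPES:
            raise ValueError(f"unknown rate type: {rate_type}")
        cmd += ["-w", rate_type]
    if categories is not None:
        # -c is documented for gamma and igamma only.
        if rate_type not in ("gamma", "igamma"):
            raise ValueError("-c applies to gamma and igamma only")
        cmd += ["-c", str(categories)]
    if not prompt:
        cmd.append("-ni")
    cmd.append(alignment)  # the file name comes last, after all options
    return cmd


if __name__ == "__main__":
    cmd = build_iqpnni_command("aln.phy", model="HKY85", iterations=200,
                               stopping_rule=True, rate_type="gamma",
                               categories=4, prompt=False)
    print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run once the iqpnni
executable is in the search path.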


================================================================================

MENU OPTIONS:
-------------

IQPNNI uses a text-based, menu-driven interface such as the following:

  GENERAL OPTIONS
   o                        Display as outgroup? A-14-133
   n                       Number of iterations? 88
   s                              Stopping rule? No

  IQP OPTIONS
   p         Probability of deleting a sequence? 0.5
   k                     Number representatives? 4

  SUBSTITUTION PROCESS
   d                Type of sequence input data? Nucleotides
   m                      Model of substitution? HKY85 (Hasegawa et al. 1985)
   t                 Ts/Tv ratio (0.5 for JC69)? Estimate from data
   f                           Base frequencies? Estimate from data

  RATE HETEROGENEITY
   r                Model of rate heterogeneity? Uniform rate

  quit [q], confirm [y],or change [menu] settings: 


In the following the available options will be briefly introduced:

GENERAL OPTIONS
  The option 'o': Users can specify a sequence as the outgroup sequence. 
    The final tree with the highest likelihood will be rooted with respect to 
    the outgroup sequence.


  The option 'n': Users can specify the number of iterations or use 
    the default value. 

  
  The option 's': Users can choose one of four possibilities to stop the program.
    1. The first possibility is
          "s   Stopping rule? No"
       It means that the program will stop after 'n' iterations.
       
    2. The second possibility is  
          "s   Stopping rule (if applicable)? Yes, but at least 'n' iterations"
       It is similar to the fourth possibility, but the program will run at 
       least 'n' iterations.

    3. The third possibility is
          "s   Stopping rule (if applicable)? Yes, but at most 'n' iterations"
       It is similar to the fourth possibility, but the program will run at 
       most 'n' iterations.

    4. The last possibility is 
           "s       Stopping rule (if applicable)? Yes"
       It means that the program will stop and output the optimal tree with
       95% confidence if at least three better trees are found during the
       search; otherwise it will stop after 'n' iterations.
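
  The four possibilities can be summarized by the following sketch. It is one
  reading of the description above, not the program's actual code: the
  function should_stop, the mode names, and the flag rule_satisfied (standing
  for "the stopping rule reports 95% confidence") are all hypothetical.

```python
def should_stop(mode, iteration, n, better_trees, rule_satisfied):
    """Decide whether the search stops, for the four settings of option 's'.

    mode is one of "no", "at_least", "at_most", "yes" (hypothetical names).
    The stopping rule is treated as applicable only once at least three
    better trees have been found, as stated for possibility 4.
    """
    applicable = better_trees >= 3
    rule_fires = applicable and rule_satisfied
    if mode == "no":
        # Possibility 1: plain iteration limit.
        return iteration >= n
    if mode == "yes":
        # Possibility 4: follow the rule; fall back to n if not applicable.
        return rule_fires if applicable else iteration >= n
    if mode == "at_least":
        # Possibility 2: as "yes", but never stop before n iterations.
        return iteration >= n and (rule_fires or not applicable)
    if mode == "at_most":
        # Possibility 3: as "yes", but never run beyond n iterations.
        return iteration >= n or rule_fires
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("no", "at_least", "at_most", "yes"):
        print(mode, should_stop(mode, 40, 100, 3, True))
```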


IQP OPTIONS
  The option 'p': Users can specify the probability of deleting a sequence
                  or let the program estimate it from the input data.
                  Note that when the sequences are very long,
                  users should increase the value of p and try different runs
                  with various choices of p.
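
  To make the meaning of p concrete, the following sketch splits a set of
  sequences into kept and deleted ones, assuming that each sequence is
  deleted independently with probability p. That assumption, and the function
  name split_sequences, are illustrative only; the manual does not spell out
  the procedure, and the sketch does not model what happens to the deleted
  sequences afterwards.

```python
import random


def split_sequences(names, p, rng):
    """Partition names into (kept, deleted); each is deleted with probability p.

    Hypothetical helper; rng is a random.Random instance so runs can be
    reproduced with a fixed seed.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    kept, deleted = [], []
    for name in names:
        # rng.random() is uniform on [0, 1), so p = 0 deletes nothing
        # and p = 1 deletes everything.
        (deleted if rng.random() < p else kept).append(name)
    return kept, deleted


if __name__ == "__main__":
    taxa = [f"seq{i}" for i in range(1, 11)]
    kept, deleted = split_sequences(taxa, 0.5, random.Random(1))
    print("kept:   ", kept)
    print("deleted:", deleted)
```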

  The option 'k': One can specify the number of representative leaves
                  for a rooted tree. However, we strongly recommend using
                  the default value.


THE SUBSTITUTION PROCESS
  If the input data are nucleotides, the program can work with the JC69
  (Jukes and Cantor, 1969), K2P (Kimura's 2-Parameter model, 1980), F81
  (Felsenstein, 1981), HKY85 (Hasegawa et al., 1985), TN93 (Tamura and Nei,
  1993), and General Time Reversible (GTR, e.g., Tavare, 1986) models of
  evolution. For amino acids, the following models are available: Dayhoff
  (Dayhoff et al., 1978), JTT (Jones et al., 1992), VT (Mueller and Vingron,
  2000), mtREV (Adachi and Hasegawa, 1996), and WAG (Whelan and Goldman, 2000).
  The BLOSUM62 matrix by Henikoff and Henikoff (1992) should not be used for
  phylogenetic reconstruction, because it was constructed for database
  searches and does not reflect an evolutionary process.

  The option 'd': Users must specify the type of sequence input data:
                  1. Nucleotides or
                  2. Amino acids.

  The option 'f': Users can specify the base frequencies or let the
                  program estimate them from the input data.

  The option 't': If the HKY85 or TN93 model is chosen, one can specify
                  the transition/transversion ratio (between 0.2 and 32.0) 
                  or let the program estimate it from the input data 
                  (default).

  The option 'u': For the TN93 model one can also enter the py/pu ratio
                  (the ratio of pyrimidine transition rate to purine 
                  transition rate) between 0.2 and 32.0, or let the program 
                  estimate it from the input data (default).

  The option 'g': If users choose the General Time Reversible model,
                  they can specify six different rate parameters:
                      1. Transversion rate from A to C, 
                      2. Transition   rate from A to G,
                      3. Transversion rate from A to T,
                      4. Transversion rate from C to G,
                      5. Transition   rate from C to T, 
                      6. Transversion rate from G to T, 
                  or let the program estimate them from the input data.
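
  The following sketch shows how six such rate parameters and four base
  frequencies define a GTR rate matrix in the usual textbook form, where
  the rate from base i to base j is the exchange rate r(i,j) times the
  frequency of j, and the matrix is scaled to one expected substitution per
  unit time. The function name and the dictionary keys are hypothetical, and
  the manual does not state which normalization IQPNNI uses internally.

```python
def gtr_rate_matrix(rates, freqs):
    """Return the 4x4 GTR rate matrix Q in the base order A, C, G, T.

    rates: dict with keys "AC", "AG", "AT", "CG", "CT", "GT" (the six
           parameters of option 'g'); freqs: dict with keys "A", "C", "G", "T".
    Hypothetical helper for illustration only.
    """
    order = "ACGT"
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(order):
        for j, y in enumerate(order):
            if i != j:
                key = "".join(sorted(x + y))  # e.g. "CA" -> "AC"
                q[i][j] = rates[key] * freqs[y]
        # The diagonal makes each row sum to zero (q[i][i] is still 0 here).
        q[i][i] = -sum(q[i])
    # Scale so that the expected number of substitutions per unit time is 1.
    scale = -sum(freqs[x] * q[i][i] for i, x in enumerate(order))
    return [[v / scale for v in row] for row in q]


if __name__ == "__main__":
    rates = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 4.0, "GT": 1.0}
    freqs = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
    for row in gtr_rate_matrix(rates, freqs):
        print("  ".join(f"{v:8.4f}" for v in row))
```

  With all six rates equal and all frequencies equal to 0.25, the matrix
  reduces to the JC69 model.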


RATE HETEROGENEITY
  The program can also assume rate heterogeneity. Users can either choose 
  uniform rate over all sites (rate homogeneity, default), site-specific 
  substitution rates (cf. Sonja Meyer and Arndt von Haeseler, Identifying 
  Site-Specific Substitution Rates, Mol. Biol. Evol. 20(2), 2003), or
  Gamma distributed rates. 

  The option 'r': To switch among 3 types: Uniform rate, Gamma distributed
                  rate, and site-specific rate.

  The option 'a': If users choose Gamma distributed rate, they can specify 
                  the Gamma distribution shape parameter alpha (between 0.1 
                  and 100.0) or let the IQPNNI program estimate it from the
                  input data (default). 

  The option 'c': If users choose Gamma distributed rates, they can specify 
                  a number of Gamma rate categories between 2 and 32. The 
                  default is 4 categories.
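
  For orientation, the sketch below computes category rates with the widely
  used discrete Gamma approximation (equal-probability categories, each
  represented by its mean rate, with the distribution scaled to mean 1). It
  is an assumption that IQPNNI discretizes the Gamma distribution in this
  way; the manual only gives the parameter ranges, which the sketch checks.
  All function names are hypothetical.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= x / (a + n)
        total += term
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def gamma_quantile(a, p):
    """Quantile of a Gamma(shape=a, scale=1) variable, found by bisection."""
    lo, hi = 0.0, 1.0
    while reg_lower_gamma(a, hi) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if reg_lower_gamma(a, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha, k=4):
    """Mean rate of each of k equal-probability Gamma categories (mean 1)."""
    if not 0.1 <= alpha <= 100.0:
        raise ValueError("alpha must lie between 0.1 and 100.0")
    if not 2 <= k <= 32:
        raise ValueError("the number of categories must lie between 2 and 32")
    # Category boundaries, expressed for a Gamma(alpha, 1) variable.
    cuts = [0.0] + [gamma_quantile(alpha, i / k) for i in range(1, k)]
    # The mean rate within a category follows from P(alpha + 1, .) at the
    # boundaries; the last boundary is infinity, where P equals 1.
    upper = [reg_lower_gamma(alpha + 1.0, c) for c in cuts] + [1.0]
    return [k * (upper[i + 1] - upper[i]) for i in range(k)]


if __name__ == "__main__":
    for rate in discrete_gamma_rates(0.5, 4):
        print(f"{rate:.4f}")
```

  Small values of alpha give strongly unequal category rates, while large
  values approach the uniform-rate case.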


================================================================================

INSTALLATION:
-------------

  See below for information on how to install/build the different
  versions of the IQPNNI software. Executable versions of the sequential,
  that is, non-parallel program are intended for a number of operating
  systems. The parallel program (pIQPNNI) has to be built from the
  sources, as does the sequential program if a binary release does not
  exist for your operating system.


  Sequential Version - Binary release:
  ------------------------------------

    1) You might want to download the executable version of IQPNNI
       for your operating system if it is available (iqpnni-XXX-OS.tar.gz 
       or iqpnni-XXX-OS.zip, where XXX is the current version number and 
       OS the operating system) from its web page 
       <http://www.bi.uni-duesseldorf.de/software/iqpnni>.
    2) Extract the files (e.g., with tar xvzf 'iqpnni-XXX-OS.tar.gz' under Unix)
       This should create a directory iqpnni-XXX.
    3) You will find the executable in iqpnni-XXX/src.
       You should rename this executable to 'iqpnni' (or 'iqpnni.exe'
       on Windows systems) and copy it to a directory in your system's
       search path so that your system can find it.

    If you encounter problems, please ask your local administrator for help.

  Sequential Version - Source package:
  ------------------------------------

    To build IQPNNI from the sources you need a functional C++ compiler
    installed. This is usually the case on UNIX/Linux systems; on Windows
    you might want to obtain Cygwin, and on Mac OS X, Xcode.
    Then you can follow the procedure below:

    1) Download the current version of the software (iqpnni-XXX.tar.gz or
       iqpnni-XXX.zip, where XXX is the current version number) from its 
       web page <http://www.bi.uni-duesseldorf.de/software/iqpnni>.
    2) Extract the files (e.g., with tar xvzf 'iqpnni-XXX.tar.gz' under Unix)
       This should create a directory iqpnni-XXX.
    3) Change into this directory.
    4) To compile the program, type the following:

         ./configure

       This should configure the package for the build. You might also 
       want to refer to the INSTALL file for more (general) details.

         make

       This compiles and builds the executable 'iqpnni'
       (or 'iqpnni.exe' on Windows systems) to be found in the 'src'
       directory. This executable can be copied to a directory in your
       system's search path so that your system can find it, or it can be
       installed to the default destination (e.g., /usr/local/bin on
       UNIX/Linux) using

         make install

    If you encounter problems, please ask your local administrator for help.
  
  Parallel Version - Binary release:
  ----------------------------------

    There will be no binary version of the parallel program because 
    it depends on the MPI library you have installed locally.

  Parallel Version - Source package:
  ----------------------------------

    To build the MPI-parallel version of IQPNNI (pIQPNNI) you need a
    functional C++ compiler installed. This is usually the case on
    UNIX/Linux systems; on Windows you might want to obtain Cygwin, and on
    Mac OS X, Xcode. In addition you have to install an implementation
    of the MPI (Message Passing Interface) library. A list of (free)
    implementations is available at
    <http://www.lam-mpi.org/mpi/implementations/>.

    Then you can follow the procedure below:

    1) Download the current version of the software (iqpnni-XXX.tar.gz or
       iqpnni-XXX.zip, where XXX is the current version number) from its 
       web page <http://www.bi.uni-duesseldorf.de/software/iqpnni>.
    2) Extract the files (e.g., with tar xvzf 'iqpnni-XXX.tar.gz' under Unix)
       This should create a directory iqpnni-XXX.
    3) Change into this directory.
    4) To compile the program, you have to run the configure script with
       the environment variable CXX set to the MPI-C++ compiler of your
       local MPI implementation and turn on the preprocessor directive 
       PARALLEL, e.g.

         env CXX=mpiCC CXXFLAGS="-DPARALLEL -O2" ./configure

       This should configure the package for the build using mpiCC as the
       C++ compiler. You might also want to refer to the INSTALL file for 
       more (general) details.

         make

       This compiles and builds the executable 'iqpnni'
       (or 'iqpnni.exe' on Windows systems) to be found in the 'src'
       directory. This executable should be renamed to 'piqpnni' and
       copied to a directory in your system's search path so that your
       system can find it.

    5) To run the parallel version please refer to the documentation of your
       locally installed MPI implementation and/or ask your local system
       administrator.

    If you encounter problems, please ask your local administrator for help.
  
================================================================================

REFERENCES:

   to be added

================================================================================

CREDITS:
--------

Some parts of the code were taken from the TREE-PUZZLE package:
   Heiko A. Schmidt, Korbinian Strimmer, Martin Vingron, and Arndt von Haeseler,
   (2002) TREE-PUZZLE: Maximum likelihood phylogenetic analysis
   using quartets and parallel computing, Bioinformatics, 18:502-504

The source code to construct the BIONJ tree was taken from the BIONJ software:
   Olivier Gascuel (1997) BIONJ: An Improved Version of the NJ Algorithm
   Based on a Simple Model of Sequence Data, Mol. Biol. Evol., 14:685-695


******************************************************************************** 
*    This program is distributed in the hope that it will be useful, but       *
*    WITHOUT ANY WARRANTY; without even the implied warranty of                *
*    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU          *
*    General Public License for more details.                                  *        
********************************************************************************