General:
--------
parat estimates site-specific substitution rates from a set of DNA sequences. The rates and the phylogenetic tree relating the sequences are estimated in an iterative maximum likelihood procedure, whereby the likelihood of the inferred tree increases at each iteration step until it converges.
Rates can be estimated from data sets containing up to 100 sequences in one iteration procedure. For the analysis of data sets larger than 100 sequences, a subsampling procedure is implemented in parat. Here, random subsamples are drawn from the sequence set, rates are estimated from each subsample with the iterative procedure, and finally the rate estimates are averaged.
The iteration procedure yields a file called rates and, if chosen, a file called outtree.
For more details about the method and its efficiency see:
Sonja Meyer and Arndt von Haeseler (2003)
Identifying Site-Specific Substitution Rates
Mol. Biol. Evol. 20(2):182-189
If you want to use parat in a publication, please cite our paper.
Some explanations about the options
-----------------------------------
Input parameter
---------------
To run the program, you need a file of aligned sequences in Phylip format. To open the sequence file, choose File->Open from the menu bar. Please note that the old treepuzzle version on which parat is based gets confused if a file named infile exists in the current working directory, i.e. the directory in which parat was started.
The Phylip format looks like this (10 characters for the sequence name followed by the sequence):
5 40
Seq1      ATTAGTCATCGCCGTATTAGCATTCCGAGATCTAACCCCC
Seq2      ATTAGTCGCTTGCGCACTAGCCTTCCGAAATCTAGCCCCC
Seq3      ACTGTTTACTGAGCTACTAGCCTTCCCGAATCTAGCCCCC
Seq4      ATTAGTCGCTAATTTCCTAGCATTCCCGAATCTGGCCCCC
Seq5      ATTAGTCGCTAATTTCCTAGCATTCCCGAATCTGGCCCCC
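To check an input file before loading it, the layout described above can be read with a few lines of Python. This is only a minimal sketch and not part of parat: the function name parse_phylip is hypothetical, and it assumes the sequential (non-interleaved) layout with the name in the first 10 characters of each line.

```python
def parse_phylip(text):
    """Parse a sequential Phylip alignment with 10-character name fields.

    Hypothetical helper for illustration; interleaved files are not handled.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header line: number of sequences, then number of sites.
    n_seqs, n_sites = (int(x) for x in lines[0].split())
    records = {}
    for ln in lines[1:1 + n_seqs]:
        name = ln[:10].strip()                   # first 10 characters: name
        seq = ln[10:].replace(" ", "").upper()   # remainder: the sequence
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
        records[name] = seq
    if len(records) != n_seqs:
        raise ValueError(f"expected {n_seqs} distinct sequences, got {len(records)}")
    return records


# The example alignment from this README, names padded to 10 characters.
EXAMPLE = "\n".join([
    "5 40",
    "Seq1".ljust(10) + "ATTAGTCATCGCCGTATTAGCATTCCGAGATCTAACCCCC",
    "Seq2".ljust(10) + "ATTAGTCGCTTGCGCACTAGCCTTCCGAAATCTAGCCCCC",
    "Seq3".ljust(10) + "ACTGTTTACTGAGCTACTAGCCTTCCCGAATCTAGCCCCC",
    "Seq4".ljust(10) + "ATTAGTCGCTAATTTCCTAGCATTCCCGAATCTGGCCCCC",
    "Seq5".ljust(10) + "ATTAGTCGCTAATTTCCTAGCATTCCCGAATCTGGCCCCC",
])

alignment = parse_phylip(EXAMPLE)
```

A file whose header does not match its contents raises a ValueError, which is usually easier to diagnose than a failure inside the program.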
Control parameters
------------------
Stop Iteration:
This option allows the user to decide upon a stop criterion for the iteration. The user can choose to stop the iteration when the difference between the likelihood values of two consecutive iteration steps is less than a specified value, for example 1.0.
Alternatively, the user may stop the iteration after a certain number of iteration steps, for example 10.
In any case, the iteration stops if the current likelihood value is worse than the previous one.
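The interplay of these stop criteria can be sketched as a small loop. This is an illustration only, not parat's code: the names iterate, step, max_delta and max_steps are hypothetical, step stands in for one round of tree and rate estimation, and returning the previous (better) value when the likelihood gets worse is an assumption.

```python
def iterate(step, max_delta=None, max_steps=None):
    """Run step() repeatedly until one of the stop criteria is met.

    step      -- hypothetical callable returning the log-likelihood after
                 one more iteration step
    max_delta -- stop when the improvement between two steps is below this
    max_steps -- stop after this many iteration steps
    Returns (log_likelihood, steps_done, reason).
    """
    if max_delta is None and max_steps is None:
        raise ValueError("choose at least one stop criterion")
    previous = step()
    steps_done = 1
    while True:
        if max_steps is not None and steps_done >= max_steps:
            return previous, steps_done, "step limit reached"
        current = step()
        steps_done += 1
        if current < previous:
            # Likelihood got worse: always stop (assumed: keep the better value).
            return previous, steps_done, "likelihood decreased"
        if max_delta is not None and current - previous < max_delta:
            return current, steps_done, "converged"
        previous = current


# Made-up log-likelihood values standing in for real iteration steps.
values = iter([-1200.0, -1100.0, -1050.0, -1049.5, -1049.4])
result = iterate(lambda: next(values), max_delta=1.0)
```

With the made-up values above and a threshold of 1.0, the loop stops at the fourth step, where the improvement drops to 0.5.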
Use subsample mode:
This option allows the analysis of data sets larger than 100 sequences. The user specifies the number of random samples that are drawn from the sequence file (number of subsamples) as well as the number of sequences in each subsample (samplesize).
Please note that rate estimation depends on the number of sequences. In order to obtain reasonable results, the sample size should be as large as possible (although this is time consuming). Estimating rates from data sets smaller than 25 sequences is not advisable at all (see our paper for details).
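The subsample mode amounts to drawing random subsets, estimating rates on each, and averaging the per-site results. The sketch below illustrates only that outer loop under stated assumptions: subsample_rates and toy_estimator are hypothetical names, and toy_estimator is a deliberately crude stand-in (distinct characters per column minus one), not the maximum likelihood estimation that parat performs.

```python
import random


def toy_estimator(sequences):
    """Stand-in for the real rate estimation: distinct characters per site - 1."""
    columns = zip(*sequences.values())
    return [float(len(set(col)) - 1) for col in columns]


def subsample_rates(sequences, n_subsamples, samplesize, estimate_rates, seed=None):
    """Average per-site rates over random subsamples of the sequences.

    sequences      -- dict mapping sequence name to aligned sequence
    n_subsamples   -- number of random subsamples to draw
    samplesize     -- number of sequences in each subsample
    estimate_rates -- callable returning one rate per site for a subsample
    """
    if n_subsamples < 1:
        raise ValueError("need at least one subsample")
    if samplesize > len(sequences):
        raise ValueError("samplesize exceeds the number of sequences")
    rng = random.Random(seed)
    names = sorted(sequences)
    n_sites = len(sequences[names[0]])
    totals = [0.0] * n_sites
    for _ in range(n_subsamples):
        chosen = rng.sample(names, samplesize)  # draw without replacement
        rates = estimate_rates({name: sequences[name] for name in chosen})
        for site, rate in enumerate(rates):
            totals[site] += rate
    return [total / n_subsamples for total in totals]


# Tiny made-up alignment; real use would involve far more sequences.
demo = {"a": "AA", "b": "AC", "c": "AG"}
averaged = subsample_rates(demo, n_subsamples=2, samplesize=3,
                           estimate_rates=toy_estimator, seed=1)
```

Passing a seed makes the random draws repeatable, which helps when comparing runs with different values for the number of subsamples and samplesize.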
Save the last tree file:
With this option, the user may save the last outtree file computed by the treepuzzle program. In combination with subsampling, the tree is recomputed based on the averaged rates.