Diffusion FreeBSD ports repository rP468297

biology/paml: Update to version 4.9h

Description

Port Changes:

Add EXAMPLES option and install most files under EXAMPLESDIR
Add USES=dos2unix to fix DOS line endings

Upstream Changes:
Version 4.9h, March 2018

(*) mcmctree: gamma-Dirichlet versus conditional i.i.d. priors for
rates for loci. Since 4.9d, the program and the documentation have
been inconsistent about the two priors and about which value (0 or 1)
means which prior. I have now checked the program and the
documentation to make sure that they are consistent:

prior = 0: gamma-Dirichlet (dos Reis 2014). This is the default.
prior = 1: conditional i.i.d. prior (Zhu et al. 2015).

I believe these two are similar especially if the number of loci
(partitions) is large, but no serious comparisons between the two
priors have been published.

Thanks to Adnan Moussalli for pointing out the errors.

(*) codeml. It was discovered that the mechanistic amino acid
substitution model implemented in Yang et al. (1998; see table 3),
specified by seqtype = 2 model = 6, has been broken for a long time,
since version 3.0 (2000) at least. Version 2.0 (1999) seems to be
correct. This means that the model became broken soon after it was
published. I have now fixed this.

This model of amino acid substitution starts from a Markov chain for
codons and then aggregates the states, merging the synonymous codons
into one state (the coded amino acid). This is an approximate
formulation since the process after state aggregation is no longer
Markovian.

I have now added another codon-based amino acid substitution model
that treats amino acids as ambiguous codons. The model is specified
by seqtype = 2 model = 5. This is an exact formulation.
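
The remark about state aggregation can be illustrated with a toy
example. The sketch below is not codeml's code. It uses two
hypothetical three-state rate matrices (not a real codon model) and
checks the standard strong-lumpability condition: if every state in a
block has the same total rate into each other block, the aggregated
process is still a Markov chain. When the condition fails, the
aggregated process is in general not Markovian, which is the situation
described above for merged synonymous codons.

```python
def block_rates(Q, partition):
    """For each state i, total rate from i into each block of the partition."""
    return [[sum(Q[i][j] for j in block) for block in partition]
            for i in range(len(Q))]


def is_strongly_lumpable(Q, partition, tol=1e-12):
    """True if all states in a block share the same rate into every other block."""
    rates = block_rates(Q, partition)
    for a, block in enumerate(partition):
        for b in range(len(partition)):
            if a == b:
                continue
            values = [rates[i][b] for i in block]
            if max(values) - min(values) > tol:
                return False
    return True


# Hypothetical toy chain: states 0 and 1 play the role of two synonymous
# codons (merged into one block), state 2 is a different amino acid.
PARTITION = [[0, 1], [2]]

# Both "synonymous" states leave their block at the same rate (2.0).
Q_EQUAL = [[-3.0, 1.0, 2.0],
           [1.0, -3.0, 2.0],
           [1.0, 1.0, -2.0]]

# The two states leave their block at different rates (1.0 versus 3.0).
Q_UNEQUAL = [[-2.0, 1.0, 1.0],
             [1.0, -4.0, 3.0],
             [1.0, 1.0, -2.0]]

print(is_strongly_lumpable(Q_EQUAL, PARTITION))
print(is_strongly_lumpable(Q_UNEQUAL, PARTITION))
```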

(*) codeml. The number of categories in the BEB calculation under M2
and M8 was unintentionally set to 4 rather than 10. I have changed
this back to 10. The details of this calculation are in Yang et
al. (2005, MBE).

Version 4.9g, December 2017

(*) codeml. A bug caused the BEB calculation under the site model M8
(NSsites = 8) to be incorrect, with the program printing out warning
messages like "strange: f[ 5] = -0.0587063 very small." This bug was
introduced in version 4.9b and affects versions 4.9b-f. A different
bug was introduced in version 4.9f that causes the log likelihood
function under the site model M8 (NSsites = 8) to be calculated
incorrectly. These are now fixed.

Version 4.9f, October 2017

(*) baseml, nonhomogeneous models (nhomo & fix_kappa). Those models
allow different branches on the tree to have different Q matrices.
Roughly, nhomo controls the base frequency parameters while fix_kappa
controls kappa or the exchangeability parameters (a b c d e in
GTR/REV, for example). I added the option (nhomo = 5, fix_kappa = 2),
which lets the user define branch types, so that branches of the
same type have the same exchangeability parameters (a b c d e for GTR)
and base composition parameters, while branches of different types
have different parameters. Branch types are labeled 0, 1, 2, ...
(using # and $). The labels should be consecutive positive integers.
The old options nhomo = 3 or 4 worked for some models like GTR, but not
for some other models which also have base composition parameters. In
this update, I think those options should work with all those models.
I have also edited the documentation (look for option variable nhomo
for baseml).
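
As a rough aid for the branch-type labels described above, the sketch
below pulls the numeric # and $ labels out of a Newick string and
checks that they are consecutive. The function names and the example
trees are hypothetical, and the script assumes that unlabeled branches
belong to type 0. It only inspects the text of the tree and does not
reproduce how baseml itself parses labels.

```python
import re


def branch_type_labels(newick):
    """Return the sorted, distinct integer labels that follow '#' or '$'."""
    return sorted({int(m) for m in re.findall(r"[#$](\d+)", newick)})


def labels_are_consecutive(newick):
    """True if the labels, plus the assumed default type 0, form 0..k."""
    labels = set(branch_type_labels(newick)) | {0}  # assumption: unlabeled = 0
    return labels == set(range(max(labels) + 1))


# Hypothetical example trees.
GOOD_TREE = "((A, B) #1, (C, D) #2, E);"
BAD_TREE = "((A, B) #1, (C, D) #3, E);"  # label 2 is skipped

print(branch_type_labels(GOOD_TREE), labels_are_consecutive(GOOD_TREE))
print(branch_type_labels(BAD_TREE), labels_are_consecutive(BAD_TREE))
```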

(*) baseml & codeml. I added an option fix_blength = 3
(proportional), which means that branch lengths will be proportional
to those given in the tree file, and the proportionality factor is
estimated by ML.
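
The idea of a single ML-estimated proportionality factor can be shown
on a toy case. The sketch below is not how baseml or codeml
implements fix_blength = 3. It assumes two sequences under JC69, a
hypothetical pair of branch lengths from a tree file, made-up site
counts, and a simple golden-section search over an arbitrary bracket
for the scale factor.

```python
import math


def jc69_loglik(d, n_sites, n_diff):
    """Log-likelihood of two aligned sequences separated by JC69 distance d."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e  # P(same nucleotide at a site)
    p_diff = 0.25 - 0.25 * e  # P(one specific different nucleotide)
    return ((n_sites - n_diff) * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_diff))


def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximize a unimodal function f on the interval [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


def estimate_scale(tree_lengths, n_sites, n_diff, lo=1e-6, hi=10.0):
    """ML estimate of the factor that multiplies every given branch length."""
    total = sum(tree_lengths)
    return golden_section_max(
        lambda s: jc69_loglik(s * total, n_sites, n_diff), lo, hi)


# Hypothetical input: two branches from the tree file, 1000 sites, 150 differences.
TREE_LENGTHS = [0.1, 0.2]
scale = estimate_scale(TREE_LENGTHS, n_sites=1000, n_diff=150)
scaled_lengths = [scale * b for b in TREE_LENGTHS]
print(scale, scaled_lengths)
```

In this two-sequence case the result can be checked against the
closed-form JC69 distance, -3/4 ln(1 - 4p/3) with p the proportion of
differing sites, since the scaled total length should equal it.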

(*) codeml. The program does not count the parameters correctly for
model M0 when fix_kappa = 1. The bug was introduced in version 4.9c
and affects versions 4.9c-e. This is now fixed.

(*) codeml (seqtype = 2 model = 2). If you are analyzing multiple
protein data sets (ndata > 1) under the empirical models such as wag,
jtt, dayhoff, the results for the first data set are correct, but all
later data sets are analyzed incorrectly under the corresponding +F
models, that is, wag+F, jtt+F, dayhoff+F, etc. A bug in the program
means that for the second and later data sets, the equilibrium amino
acid frequencies are taken from the real data and not correctly set to
those specified by the empirical models. I note that this bug was
recorded in the update Version 3.14b, April 2005, but it was somehow
not fixed, even in that version. This is now fixed. Thanks to Nick
Goldman for reporting this again.

(*) evolver (options 5, 6, 7 for simulating nucleotide, codon and
amino acid alignments). If you choose the option of printing out the
site pattern counts instead of the sequences (specified at the
beginning of the control file such as MCbase.dat), and if you are
simulating two or more alignments, the program crashes after finishing
the first alignment. This is now fixed.

(*) mcmctree. The program crashes if you have a mixture of
morphological loci and molecular loci and not all the morphological
loci come before the molecular loci. I have now fixed this.
I think this was never described anyway.

Version 4.9e, March 2017

(*) Edited the readme files to change the license to GPL.

(*) mcmctree. A bug was introduced in version 4.9b which causes the
program to read the fossil calibration information in the tree file
incorrectly if joint (minimum and maximum) bounds are specified using
the symbols '<' and '>'. If you use the notation 'B()', 'L()', and
'U()', the information is read correctly. The bug exists in versions
4.9b, 4.9c and 4.9d. Versions 4.9a and earlier were correct.
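
For tree files that must be run with an affected version, one possible
workaround is to rewrite symbol-style calibrations in the function
notation. The helper below is a hypothetical sketch, not part of PAML.
It assumes the usual correspondence in which '>a' is a lower bound
L(a), '<b' is an upper bound U(b), and '>a<b' is a joint bound B(a,b).
It handles only plain bounds without any optional extra parameters.

```python
import re

_NUM = r"(\d*\.?\d+)"
_JOINT = re.compile(">" + _NUM + "<" + _NUM)
_LOWER = re.compile(">" + _NUM)
_UPPER = re.compile("<" + _NUM)


def to_function_notation(label):
    """Convert a '<' / '>' calibration label to B(), L() or U() notation."""
    text = label.strip().strip("'\"").replace(" ", "")
    m = _JOINT.fullmatch(text)
    if m:
        return "B({},{})".format(m.group(1), m.group(2))
    m = _LOWER.fullmatch(text)
    if m:
        return "L({})".format(m.group(1))
    m = _UPPER.fullmatch(text)
    if m:
        return "U({})".format(m.group(1))
    raise ValueError("unrecognized calibration label: {!r}".format(label))


# Hypothetical calibration labels as they might appear in a tree file.
for example in ("'>0.06<0.08'", "'>0.06'", "'<0.08'"):
    print(example, "->", to_function_notation(example))
```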

Version 4.9d, February 2017

(*) mcmctree. Changed the default prior for rates for loci to
gamma-Dirichlet (dos Reis 2014), and updated the documentation as
well. It was previously set to the conditional i.i.d. prior (Zhu et
al. 2015).

(*) mcmctree. Added Bayes factor calculation. A program called
BFdriver is included in the release, as well as a pdf document in the
folder examples/DatingSoftBound/BFdriverDOC.pdf. We suggest that you
use the exact likelihood calculation when you use this option, since the
normal approximation is unreliable when the power posterior is close to
the prior (when beta is small).
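
To show where beta enters, here is a generic sketch of thermodynamic
integration, one common way to get a log marginal likelihood from
power posteriors. It is not claimed to be the scheme BFdriver uses.
The power posterior is proportional to prior times likelihood^beta, so
beta = 0 samples the prior and beta = 1 the posterior, and the log
marginal likelihood is the integral over beta from 0 to 1 of the
expected log-likelihood. The trapezoid rule and all numbers below are
assumptions chosen for illustration.

```python
def log_marginal_likelihood(betas, mean_loglik):
    """Trapezoid-rule estimate of the integral of E_beta[log L] over [0, 1]."""
    pairs = sorted(zip(betas, mean_loglik))
    if pairs[0][0] != 0.0 or pairs[-1][0] != 1.0:
        raise ValueError("beta values must span 0 to 1")
    total = 0.0
    for (b0, l0), (b1, l1) in zip(pairs, pairs[1:]):
        total += 0.5 * (b1 - b0) * (l0 + l1)
    return total


# Made-up mean log-likelihoods from MCMC runs at each beta, for two models.
BETAS = [0.0, 0.5, 1.0]
MODEL_A = [-120.0, -100.0, -90.0]
MODEL_B = [-130.0, -105.0, -95.0]

log_z_a = log_marginal_likelihood(BETAS, MODEL_A)
log_z_b = log_marginal_likelihood(BETAS, MODEL_B)
log_bayes_factor = log_z_a - log_z_b  # log Bayes factor of model A over B
print(log_z_a, log_z_b, log_bayes_factor)
```

A real analysis would use many more beta values, typically
concentrated near 0 where the expected log-likelihood changes fastest.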