Running the MCMC.
WARNING: MCMC computations can take a long time. We suggest estimating the computational time by first running a small number of iterations before the actual run.
There are two ways to run MCMC: from the GUI (recommended) or from the MATLAB command line (advanced users).
To run MCMC from the GUI, select "Run MCMC for Aminoacid Sequences" from the main menu.
Then fill out the form and press the "Start iterations" button.
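A rough way to make this estimate is to time a short trial run and scale the result linearly to the planned number of iterations. The sketch below is written in Python and is not part of the package. The callable `run_iterations` is a hypothetical stand-in for whatever starts a run, and the linear scaling is an assumption that ignores any fixed start-up cost.

```python
import time


def extrapolate_seconds(trial_seconds, trial_iterations, planned_iterations):
    """Scale the time of a short trial run linearly to the planned run length."""
    if trial_iterations <= 0:
        raise ValueError("trial_iterations must be positive")
    return trial_seconds * planned_iterations / trial_iterations


def estimate_runtime(run_iterations, trial_iterations, planned_iterations):
    """Time run_iterations(trial_iterations) and extrapolate to the planned run.

    run_iterations is a hypothetical callable standing in for one MCMC run.
    """
    start = time.perf_counter()
    run_iterations(trial_iterations)
    elapsed = time.perf_counter() - start
    return extrapolate_seconds(elapsed, trial_iterations, planned_iterations)


if __name__ == "__main__":
    # A 20-iteration trial that took 12 s suggests about 10 minutes for 1000.
    print(extrapolate_seconds(12.0, 20, 1000) / 60.0, "minutes")
```

Because the first iterations may include one-off set-up work, the extrapolated figure is better treated as an upper-end guide than as a precise prediction.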
The fields of the form are:
Input file
The alignment file, in FASTA or MATLAB format.
A tree file should also exist, and a file with branch
lengths is desirable. The tree file can be omitted only if
the "do_tree_jumps" option is on.
suffix
Adds a tag to the result file names.
For example, with input file name 'IGl.set' and suffix 'Run5', the
result files are named 'IGlRun5.*'.
Continue from saved
Allows an interrupted computation to be continued. The program looks
for a file with the specified input name and suffix and the extension 'mat'.
This file is created automatically at every 5th iteration.
Iteration number
The desired number of iterations. The more sequences are included in the
analysis, the more iterations are required to get smooth distributions.
There is no rule that defines the exact number of iterations for
every particular case. Usually it is a tradeoff between the desired
smoothness of the distributions and computational time, but no fewer
than 1000 iterations are required.
Burn in
The number of initial iterations to discard. It takes time for
the MCMC calculations to reach a stable zone, so the values obtained
from the first steps are not consistent and should be rejected. We
suggest a value of no less than 100. A more specific threshold can be
determined from the final likelihood plot.
Substitution model
Select the appropriate substitution model, which is required
to compute the likelihood values.
Model selection
We handle the rate variation profile using a wavelet approximation,
as described in the articles below. In this case the rate variation profile
can be described by multiple models with different numbers of parameters.
Setting this option to 'on' can help find the optimal model, with the optimal
number of parameters, for a particular data set.
Uniform rate
Assumes a uniform rate at every position of the alignment. Overrides the
model selection option. Can be used when a comparatively quick estimate of
branch lengths or of tree topology is required.
Do random start
Instead of using the provided parameters (branch lengths and, if tree jumps
are allowed, tree topology), generates a random set of parameters.
Do tree jumps
Required for estimation of tree topology. You can set the frequency
of attempts to switch to a new topology and the topological distance
from the old topology to the new one.
delta_a, delta_d, p_forward and q_back
Parameters of the random walk in the parameter space. The shape of the
resulting distribution and the parameter estimates do not depend on these
settings, but the smoothness of the distribution can be affected.
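The effect of the "Burn in" setting described above can be sketched as follows. The example is written in Python and is not part of the package. It assumes the sampled values of one parameter are available as a plain list with one value per iteration, and the helper names are hypothetical. It discards the initial iterations and then summarizes the remaining ones with a mean and a simple percentile interval.

```python
import math
from statistics import mean


def discard_burn_in(samples, burn_in):
    """Return the samples that remain after dropping the first burn_in values."""
    if not 0 <= burn_in < len(samples):
        raise ValueError("burn_in must be at least 0 and less than the number of samples")
    return list(samples[burn_in:])


def summarize(samples, burn_in, level=0.95):
    """Mean and a percentile interval of the post-burn-in samples.

    The interval bounds are taken directly from the sorted samples,
    rounding outward, so no interpolation is involved.
    """
    kept = sorted(discard_burn_in(samples, burn_in))
    tail = (1.0 - level) / 2.0
    last = len(kept) - 1
    lower = kept[math.floor(tail * last)]
    upper = kept[math.ceil((1.0 - tail) * last)]
    return mean(kept), lower, upper


if __name__ == "__main__":
    # Made-up chain: ten unstable early values, then five settled ones.
    chain = [100.0] * 10 + [1.0, 2.0, 3.0, 4.0, 5.0]
    print(summarize(chain, burn_in=10))
```

With too small a burn-in, the early unstable values stay in the kept samples and distort both the mean and the interval, which is why the threshold is worth checking against the likelihood plot.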
For details on the MCMC algorithm, see the following articles:
Rzhetsky, A., and P. Morozov. 2001. Markov chain Monte Carlo computation
of confidence intervals for substitution-rate variation in proteins.
Pacific Symposium on Biocomputing 6:203-214.
(http://genome6.cpmc.columbia.edu/~andrey/psb2001.pdf)
Morozov, P.S., T.L. Sitnikova, G. Churchill, F.J. Ayala, and A. Rzhetsky.
2000. A New Method for Characterizing Replacement Rate Variation in
Molecular Sequences: Application of the Fourier and Wavelet Models
to Drosophila and Mammalian Proteins. Genetics 154:381-395.
(http://genome6.cpmc.columbia.edu/~andrey/Morozov_2000.pdf)