Running the MCMC.


WARNING: MCMC computations can take a long time. We suggest estimating the computational time by first running a small number of iterations before the actual run.
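
A pilot run can be turned into a rough estimate for the full run by assuming that each iteration takes about the same time and scaling linearly. The helper below is hypothetical (it is not part of the package) and only illustrates the arithmetic. The per-iteration cost may vary in practice, for example when tree jumps are enabled, so treat the result as an order-of-magnitude guide.

```python
def estimate_run_time(pilot_seconds: float, pilot_iterations: int,
                      target_iterations: int) -> float:
    """Linearly extrapolate the wall-clock time of a full MCMC run.

    Hypothetical helper: assumes every iteration costs about the same,
    so the total time scales with the number of iterations.
    """
    if pilot_iterations <= 0:
        raise ValueError("pilot_iterations must be positive")
    seconds_per_iteration = pilot_seconds / pilot_iterations
    return seconds_per_iteration * target_iterations


# Example: a 20-iteration pilot run took 90 seconds; estimate 1000 iterations.
hours = estimate_run_time(90.0, 20, 1000) / 3600
print(f"Estimated time: {hours:.2f} h")  # Estimated time: 1.25 h
```

If the pilot timing includes start-up work such as reading the input files, a very short pilot will overestimate the time per iteration.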


There are two ways to run the MCMC: from the GUI (recommended) or from the MATLAB command line (advanced users).

To run the MCMC from the GUI, select the "Run MCMC for Aminoacid Sequences" item from the main menu.

Then fill out the form and press the "Start iterations" button.

The fields of the form are:

   Input file
     The alignment file, in FASTA or MATLAB format. A tree file
     must also exist, and a file with branch lengths is desirable.
     The tree file can be omitted only if the "do_tree_jumps"
     option is on.

   suffix
     Adds a tag to the result file names. For example, with the
     input file 'IGl.set' and the suffix 'Run5', the result files
     are named 'IGlRun5.*'.

   Continue from saved
     Allows an interrupted computation to be continued. It looks for
     a file with the specified input name and suffix and the extension
     'mat'. This file is created automatically every 5th iteration.

   Iteration number
     The desired number of iterations. The more sequences are included
     in the analysis, the more iterations are required to get smooth
     distributions. There is no rule that defines the exact number of
     iterations for every particular case. Usually it is a tradeoff
     between the desired smoothness of the distributions and the
     computational time, but at least 1000 iterations are required.

   Burn in
     The number of initial iterations to discard. It takes time for
     the MCMC calculations to reach a stable zone, so the values obtained
     from the first steps are not consistent and should be rejected. We
     suggest a value of no less than 100. A more specific threshold can
     be determined from the final likelihood plot.

   Substitution model
     Select an appropriate substitution model, which is required to
     compute the likelihood values.

   Model selection
     We model the rate variation profile with a wavelet approximation,
     as described in the articles below. In this case the rate variation
     profile can be described by multiple models with different numbers
     of parameters. Turning this option on helps find the optimal model,
     with the optimal number of parameters, for each particular data set.

   Uniform rate
     Assumes a uniform rate at every position of the alignment. Overrides
     the model selection option. Can be used when a comparatively quick
     estimate of branch lengths or of the tree topology is required.

   Do random start
     Instead of using the provided parameters (branch lengths and, if tree
     jumps are allowed, the tree topology), generates a random set of parameters.

   Do tree jumps 
     Required for estimating the tree topology. You can set the frequency
     of attempts to switch to a new topology and the topological distance
     between the old topology and the new one.

   delta_a, delta_d, p_forward and q_back
     Parameters of the random walk in the parameter space. The shape of
     the distributions and the parameter estimates do not depend on these
     settings, but the smoothness of the resulting distributions can be affected.
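
The naming convention in the "suffix" and "Continue from saved" fields can be written down as two small helpers. They assume that the result stem is the input file name with its extension removed and the suffix appended, as in the 'IGl.set' example, and that the saved state is that stem plus '.mat'. The function names are hypothetical and only illustrate the rule.

```python
import os.path


def result_stem(input_file: str, suffix: str = "") -> str:
    """Common stem of the result files: input name without extension + suffix."""
    base, _extension = os.path.splitext(input_file)
    return base + suffix


def checkpoint_file(input_file: str, suffix: str = "") -> str:
    """File that "Continue from saved" looks for (assumed: stem + '.mat')."""
    return result_stem(input_file, suffix) + ".mat"


print(result_stem("IGl.set", "Run5"))      # IGlRun5
print(checkpoint_file("IGl.set", "Run5"))  # IGlRun5.mat
```

Under this rule, giving each run on the same alignment its own suffix keeps the result files and saved states of different runs apart.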

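The "Iteration number" and "Burn in" settings come together when the samples are summarized: the first burn-in iterations are dropped, and only the rest contribute to the distributions. The sketch below assumes a trace stored as a plain list of per-iteration values of one parameter and uses a simple empirical percentile interval. Both the trace format and the function are hypothetical, not the program's own output or post-processing.

```python
import statistics


def summarize_after_burn_in(trace, burn_in, lower=0.025, upper=0.975):
    """Discard the first `burn_in` samples and summarize the remainder.

    `trace` is assumed to be a list of per-iteration values of a single
    parameter (hypothetical format). Returns (mean, (low, high)), where
    low/high are simple empirical percentiles of the kept samples.
    """
    kept = sorted(trace[burn_in:])
    if not kept:
        raise ValueError("burn_in discards every sample")
    last = len(kept) - 1
    low = kept[int(lower * last)]
    high = kept[int(upper * last)]
    return statistics.fmean(kept), (low, high)


# Toy trace: 100 unstable start-up values, then 1000 values cycling 0..9.
trace = [50.0] * 100 + [float(i % 10) for i in range(1000)]
print(summarize_after_burn_in(trace, burn_in=100))  # (4.5, (0.0, 9.0))
print(summarize_after_burn_in(trace, burn_in=0)[0])  # inflated by start-up values
```

In practice the burn-in value would be chosen by checking where the likelihood plot levels off, as the field description suggests.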

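The statement that delta_a, delta_d, p_forward and q_back change smoothness but not the estimates reflects a general property of random-walk Metropolis sampling: the proposal step size controls how efficiently the chain moves, and therefore how smooth a histogram of a fixed number of samples looks, but not the distribution the chain converges to. The sketch below shows this for a generic sampler with a uniform step of size delta on a standard normal target. It is not the program's sampler, and delta is a generic step size rather than a reimplementation of any of the four settings above.

```python
import math
import random
import statistics


def random_walk_metropolis(log_target, start, delta, n_iter, seed=0):
    """Generic 1-D random-walk Metropolis sampler (illustration only).

    Proposes x' = x + U(-delta, delta) and accepts it with probability
    min(1, target(x') / target(x)). Returns (samples, acceptance_rate).
    """
    rng = random.Random(seed)
    x, log_p = start, log_target(start)
    samples, accepted = [], 0
    for _ in range(n_iter):
        proposal = x + rng.uniform(-delta, delta)
        log_p_new = log_target(proposal)
        # Symmetric proposal, so the acceptance ratio is just the target ratio.
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_iter


def log_std_normal(x):
    """Unnormalized log density of a standard normal target."""
    return -0.5 * x * x


# Two step sizes, same target: the mean stays near 0, the acceptance rate differs.
for delta in (0.5, 2.5):
    samples, rate = random_walk_metropolis(log_std_normal, 0.0, delta, 50_000, seed=1)
    print(f"delta={delta}: mean={statistics.fmean(samples):+.3f}, acceptance={rate:.2f}")
```

A very small or very large step size gives a slowly mixing chain; with a fixed number of iterations this shows up mainly as rougher, noisier distributions, while given enough iterations the chain still targets the same distribution.
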
 For details of the MCMC algorithm, see the following articles:
 
 Rzhetsky, A., and P. Morozov. 2001. Markov chain Monte Carlo computation
 of confidence intervals for substitution-rate variation in proteins. 
 Pacific Symposium on Biocomputing 6:203-214.
 (http://genome6.cpmc.columbia.edu/~andrey/psb2001.pdf)

 Morozov, P.S., T.L. Sitnikova, G. Churchill, F.J. Ayala, and A. Rzhetsky.
 2000. A New Method for Characterizing Replacement Rate Variation in
 Molecular Sequences: Application of the Fourier and Wavelet Models
 to Drosophila and Mammalian Proteins. Genetics 154:381-395.
 (http://genome6.cpmc.columbia.edu/~andrey/Morozov_2000.pdf)