Subject: Re: [DNA] Genetic Distance calculation method -- which method is most correct?
Date: Thu, 20 Nov 2003 22:42:34 -0500 (EST)
References: <OF1A2B197A.A0F0DD49-ON85256DE3.007525EA-85256DE3.0078EBB1@downstate.edu> <REME20031119183221@alum.mit.edu> <3FBD7168.email@example.com>
In-Reply-To: <3FBD7168.firstname.lastname@example.org> (message from Charles on Thu, 20 Nov 2003 20:59:04 -0500)
Charles wrote, quoting Bennett quoting Bruce:
> What is the "equivalent" number of single mutations for individuals off
> by two? Microarrays mutate by the stepwise mutation model, wherein a
> mutation can move the number up one or down one. Hence, a rather formal
> statistical model has to be used to account for the actual number of
> mutations when individuals differ by (say) two mutations. Analysis of
> this model, which is straightforward but a little complex (it involves
> type II bessel functions, feel free to email me for details), shows that
> the expected number of actual mutations for individuals that are off by
> two is roughly 2.1.
> Hence, the correct equivalent number of single mutations is essentially
> 2, not 2*2 =4.
Unfortunately, that result is nonsense. The example is, in fact,
simple enough to explain to the whole list and requires no Bessel
functions. Consider two individuals who have between them actually
experienced exactly two mutations relative to a common ancestor. We
can neglect the bias toward increases because we are looking at the
difference between two individuals (who both would have the same
bias). Therefore, we have two equally likely cases: either the two
mutations canceled each other out, giving an observed difference of 0,
or the two mutations reinforced, giving an observed difference of 2.
We take the (equally) weighted root-mean-square of these two values
and get an expected observed difference of 1.4 when there are exactly
two mutations. (Actually, it's 1.414213..., i.e., the square root of 2.)
In short, if the observed difference is 2, the expected number of
mutations has to be considerably more than 2.1. How much more?
Well, let's take a wild guess and examine the case of 4 mutations.
Obviously, it's a little more complicated if there are exactly 4
mutations, since there are three possible outcomes (observed
differences of 0, 2, and 4), and these three outcomes are not
equally likely: 37.5% for 0, 50% for 2, and 12.5% for 4. (The interested reader
can easily confirm these percentages by writing down all the
possible combinations of pluses and minuses.) Anyhow, it is easy
to see that the observed difference strongly favors the "2" case,
and a quick calculation verifies that the RMS value is exactly 2.
Therefore, the equivalent number of single mutations is, in fact,
4, as I have always said.
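
For anyone who would rather let a machine write down the pluses and
minuses, here is a minimal Python sketch of the enumeration described
above. It assumes only single-step mutations, each equally likely to go
up or down; the function names are mine, made up for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def difference_distribution(n_mutations):
    """Map each observed |difference| to its probability.

    Enumerates every +1/-1 pattern for n_mutations single-step
    mutations, assuming up and down steps are equally likely.
    """
    counts = Counter(
        abs(sum(steps)) for steps in product((+1, -1), repeat=n_mutations)
    )
    total = 2 ** n_mutations
    return {diff: count / total for diff, count in sorted(counts.items())}


def rms_difference(n_mutations):
    """Root-mean-square observed difference for exactly n_mutations."""
    dist = difference_distribution(n_mutations)
    return sqrt(sum(prob * diff ** 2 for diff, prob in dist.items()))


for n in (2, 4):
    print(n, difference_distribution(n), round(rms_difference(n), 6))
# 2 {0: 0.5, 2: 0.5} 1.414214
# 4 {0: 0.375, 2: 0.5, 4: 0.125} 2.0
```

Under these assumptions the mean-square difference always equals the
number of mutations, which is why an observed difference of 2 pairs
with 4 mutations in the RMS sense.
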
I don't think any more needs to be said about that.
There is, however, more to the story. The stepwise model does NOT
give a correct picture because it doesn't allow for multi-step
mutations. The full and complete answer has to allow for those and
has to be informed of the exact relative probabilities of one-, two-,
and three-step mutations (which we do not know with any precision).
On the other hand, since we ALSO do not know the absolute rate of
one-step mutations, this whole aspect can be swept under the same rug,
and we can define something called the "effective" mutation rate which
includes the small contributions of double and triple mutations. In
the end, then, we have the same simple rule of summing the squares of
the differences, and just a little footnote attached to the
still-fuzzy average mutation rate.
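
As a rough illustration of that rule, here is a short Python sketch
that sums the squared differences marker by marker. The function name
and the repeat counts in the example are hypothetical, chosen only to
show the arithmetic, and the sketch assumes both haplotypes list the
same markers in the same order.

```python
def genetic_distance(haplotype_a, haplotype_b):
    """Sum of squared per-marker differences between two haplotypes.

    Each haplotype is a sequence of repeat counts, one per marker,
    with the markers in the same order in both sequences.
    """
    if len(haplotype_a) != len(haplotype_b):
        raise ValueError("haplotypes must cover the same markers")
    return sum((a - b) ** 2 for a, b in zip(haplotype_a, haplotype_b))


# Made-up repeat counts for four markers: one marker differs by 2 and
# one by 1, so the distance is 2*2 + 1*1 = 5 rather than 2 + 1 = 3.
print(genetic_distance([13, 24, 14, 11], [13, 22, 14, 12]))  # 5
```

Turning that distance into a time estimate would still require the
"effective" mutation rate, which this sketch does not attempt.
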