**GENEALOGY-DNA-L Archives**

From:

Subject: Re: [DNA] Genetic Distance calculation -- message from Bruce Walsh. He asked me to post it to the list

Date: Fri, 21 Nov 2003 21:18:51 -0500 (EST)

References: <3FBE77A0.2050808@kerchner.com>

In-Reply-To: <3FBE77A0.2050808@kerchner.com> (message from Charles on Fri, 21 Nov 2003 15:37:52 -0500)

Bruce wrote:

> My point is that, to do so we need to follow a formal probability model,

> otherwise our intuition can be misleading.

Just so. Beware of hidden assumptions in the formal probability model.

> John's point is correct in that, GIVEN we know how many mutations have

> occurred, if 2 have occurred then a match or an off-by two are equally

> likely. However, the problem is not this, but rather the opposite:

> given an observed state (say an observed difference of two), how many actual

> mutations have occurred?

This is a fair statement of the problem, but this is also where we

part company. Bruce takes this statement of the problem as a license

to impose prior knowledge of the distribution of the actual distance

in time between the two test subjects. The knowledge he chooses to

impose is simply that the two test subjects are in fact closely

related. Given that knowledge, it should come as no surprise that his

calculation results in a small genetic distance. (To be fair, I must

point out that "close" in his terms means that the expected number of

mutations is less than one for each locus.) All I can say is that I

don't have the advantage of knowing in advance that any two people are

closely related, even if they happen to have the same surname. If the

genealogical research suggests that they are close, but the DNA

testing shows them to be surprisingly far apart, then I have to take

into account the possibility of a non-paternal event.

To put it another way: I have no quarrel with Bruce's formula as a

formula, but it presumes to know in advance the thing that we are all

trying to discover, namely, the closeness of relationship between two

test subjects. In fact, as he points out, by extending the summation

to larger and larger allowable numbers of generations, he can get as

large a genetic distance as he pleases. The fact that he chooses to

get 2.1 is just the result of his own arbitrary choice.

Here is a description that shows why the difference is not the

quantity to consider. Unfortunately, it involves probability

theory, and so some readers may not be prepared to wade through

it. Still, it has the advantage of putting my assumptions out

in the open. (If you want to skip over it, just look for the next

quoted passage.)

1. The outcome of each mutation opportunity is an independent, random

event with possible values of +1, 0, and -1. I assume +1 and -1 are

equally likely (because we are concerned with the difference between

two people).

2. After some generations have elapsed, the outcome is just the

sum of the individual outcomes; the difference between the two

people is, in turn, the sum of the one outcome and the negative

of the other. (I phrase it in this odd way because of the next

point.)

3. Two simple theorems from probability: the variance of the negative

of a random variable is equal to the variance of the variable itself;

and the variance of the sum of two or more random variables is equal

to the sum of the variances of the individual variables.

4. This means that the expected variance of the offset between two

persons (the offset being treated as a random variable) grows linearly

with the time elapsed since their lines diverged from the MRCA. In other

words, the genetic distance (counted as the number of generations) is

proportional to the expected variance.

5. The only concrete estimate we have of the expected variance is the

actually observed variance. (This is where I make my shakiest

assumption -- since we have only one measurement of the variance,

there is no guarantee that it even comes close to the expected value.

Still, it avoids the necessity of circular reasoning.)

6. By the way, for a single locus the observed variance is simply the

square of the observed difference, since the expected difference is zero.
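
The model in points 1 to 6 can be checked numerically. What follows is a minimal sketch, not anything posted in the thread: it assumes single-step mutations only, a hypothetical per-generation mutation rate `MU`, and the same number of generations on both lines of descent. It builds the exact distribution of the offset by repeated convolution and prints its variance, which should grow in proportion to the number of generations.

```python
def step_distribution(mu):
    """One mutation opportunity: +1 or -1 with probability mu/2 each, else 0."""
    return {-1: mu / 2, 0: 1 - mu, +1: mu / 2}


def convolve(p, q):
    """Distribution of the sum of two independent integer-valued variables."""
    out = {}
    for a, pa in p.items():
        for b, qb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * qb
    return out


def offset_distribution(generations, mu):
    """Offset between two people, each `generations` steps from their MRCA."""
    # A symmetric step and its negative have the same distribution, so the
    # difference of the two lineages is a sum of 2 * generations steps.
    dist = {0: 1.0}
    step = step_distribution(mu)
    for _ in range(2 * generations):
        dist = convolve(dist, step)
    return dist


def variance(dist):
    """Variance of a distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    return sum((x - mean) ** 2 * p for x, p in dist.items())


MU = 0.002  # hypothetical mutation rate per generation, for illustration only

for g in (10, 20, 40):
    print(g, variance(offset_distribution(g, MU)))
```

Under these assumptions each step contributes a variance of `MU`, so the variance of the offset is `2 * g * MU` up to floating-point rounding: doubling the number of generations doubles the variance.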

> A simple example can make the case: Suppose very few generations have

> passed, but we still see a two-step difference. It is FAR more likely that

> only two mutations have occurred (both in the plus direction) than the

> much more unlikely event of four mutations.

Bruce is explicit here. He comes right out and asserts his assumption

that very few generations have passed. What he is glossing over is

the corollary that (by his assumption) even just TWO mutations are

very unlikely. In other words, the case he is really building is that

this is probably a LAB ERROR. (Actually, although we have excluded

multi-step mutations from consideration for the sake of argument, the

object of this exercise is to find a description that approximates

reality. Therefore, the answer of choice in this case would be a

two-step mutation.)
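
How strongly the answer depends on the assumed closeness can be illustrated with a small calculation. This is only a sketch under stated assumptions, and it is not Bruce's formula (which does not appear in this message): the number of single-step mutations at one locus is given a Poisson prior with mean `lam`, a hypothetical parameter standing in for "how many generations have passed", and +1 and -1 steps are taken as equally likely.

```python
from math import comb, exp, factorial


def poisson(k, lam):
    """Prior probability of exactly k mutations when lam are expected."""
    return exp(-lam) * lam ** k / factorial(k)


def prob_net(net, k):
    """P(net offset == net | exactly k single-step mutations)."""
    if (k + net) % 2 or abs(net) > k:
        return 0.0
    return comb(k, (k + net) // 2) / 2 ** k


def posterior_mutation_count(net, lam, k_max=40):
    """P(k mutations | observed net offset), truncated at k_max mutations."""
    joint = {k: poisson(k, lam) * prob_net(net, k) for k in range(k_max + 1)}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items() if v > 0}


# Given exactly two mutations, a match and an off-by-two are equally likely.
print(prob_net(0, 2), prob_net(2, 2) + prob_net(-2, 2))

# How much more likely are two mutations than four, given an offset of +2?
for lam in (0.1, 5.0):  # hypothetical "close" and "distant" priors
    post = posterior_mutation_count(2, lam)
    print(lam, post[2] / post[4])
```

In this model the ratio of "two mutations" to "four mutations" works out to `12 / lam**2`. With a small `lam` two mutations are overwhelmingly favored, while with `lam = 5.0` four mutations become the more likely explanation, so the conclusion follows from the prior rather than from the observed difference alone.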

As I have said many times before, and as I'm sure Bruce would agree,

the case of a 24/25 match with a two-step difference on the 25th

marker is special. The fact that all the other loci give a hint of a

close relationship does indeed support Bruce's assumption. And

reality does intervene to support the notion that the difference

should perhaps be viewed as a two-step mutation. On the other hand,

let's look at the case that we were talking about just yesterday: the

match was 19/25, and the differences were 2, 2, 2, 1, 1, and 1. Under

the circumstances, it would be absurd to assume the two individuals

are closely related. The sum of the squares is the only reasonable

approximation to the genetic distance in this case.
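
For the 19/25 example, the two ways of scoring the six differing markers can be compared directly. This is a short sketch using only the differences quoted above; the variable names are illustrative.

```python
# Observed differences on the six non-matching markers of the 19/25 comparison.
differences = [2, 2, 2, 1, 1, 1]

sum_of_steps = sum(abs(d) for d in differences)   # simple step count
sum_of_squares = sum(d * d for d in differences)  # sum of per-locus squares

print("sum of steps:  ", sum_of_steps)
print("sum of squares:", sum_of_squares)
```

The step count gives 9 and the sum of squares gives 15, so the two methods diverge as soon as any marker differs by more than one step.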

John Chandler
