GENEALOGY-DNA-L Archives



From: "Ken Nordtvedt" <>
Subject: Re: [DNA] R-U152 and R-L21 on the European Continent
Date: Sun, 13 Dec 2009 16:33:11 -0700
References: <200912132315.nBDNFGSW029504@mail.rootsweb.com>


----- Original Message -----
From: "Tim Janzen" <>
> The basic formula I am using for the 95% confidence
> interval in generations is this: 2*G/SQRT(2*G*sum of the mutation rates).
> [...] odd-ball outliers of either clade.


That's the textbook expression. M = .01, G = 1000 for a 30,000-year TMRCA.
It is based on the erroneous assumption that the variance of variance for a
marker is 2 m(i) G.

But even with that erroneous formula (good for very young ages),
dG is then 2 * 1000 / (2*1000/100)^(1/2) = 447 generations = 13,400 years.
I am assuming your sum of marker mutation rates is 1/100.
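
The arithmetic above can be checked with a short sketch. The function name textbook_ci is made up for illustration, and the 30 years per generation is only implied by pairing G = 1000 with a 30,000-year TMRCA.

```python
import math

def textbook_ci(G, rate_sum):
    """Textbook interval in generations: 2*G / sqrt(2*G*sum of mutation rates)."""
    return 2 * G / math.sqrt(2 * G * rate_sum)

G = 1000             # generations to the TMRCA
M = 0.01             # assumed sum of marker mutation rates (1/100)
YEARS_PER_GEN = 30   # assumption implied by G = 1000 for 30,000 years

dG = textbook_ci(G, M)
print(round(dG, 1), round(dG * YEARS_PER_GEN))  # → 447.2 13416
```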

Since then I have published many times that the formula should be:

2*G / sqrt( Sum_i 2*G*m(i) / [1 + 4*m(i)*G] )

because the variance of variance for each marker is 2m(i)G[1+4m(i)G],
which for large G grows quadratically in G, not linearly.
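
As a rough illustration of how much the two expressions differ, here is a sketch that evaluates both for the same G. The split of the 1/100 rate sum into ten markers of 0.001 each is purely hypothetical (the corrected interval depends on the individual rates, not just their sum), and the function names are invented for this example.

```python
import math

def textbook_ci(G, rates):
    """2*G / sqrt(Sum_i 2*G*m(i)): assumes variance of variance is 2*m(i)*G."""
    return 2 * G / math.sqrt(sum(2 * G * m for m in rates))

def corrected_ci(G, rates):
    """2*G / sqrt(Sum_i 2*G*m(i) / [1 + 4*m(i)*G])."""
    return 2 * G / math.sqrt(sum(2 * G * m / (1 + 4 * m * G) for m in rates))

G = 1000
rates = [0.001] * 10   # hypothetical markers; the rates sum to 1/100

old = textbook_ci(G, rates)
new = corrected_ci(G, rates)
print(round(old, 1), round(new, 1))
```

With these assumed rates the interval grows from about 447 generations to about 1000 generations, i.e. it becomes as large as G itself.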

I derived this analytically.
Jim Cullen and I exhaustively confirmed this with PC simulations.
Dienekes independently confirmed this.
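
A toy Monte Carlo check of the 2m(i)G[1+4m(i)G] expression can be sketched as follows. This is not the simulation code referred to above. It assumes a symmetric single-step mutation model, and it reads "variance of variance" as the variance of the squared repeat difference between two lineages that are each G generations from their common ancestor. G and m are chosen small only so the loop runs quickly.

```python
import random

def squared_difference(G, m, rng):
    """Squared repeat difference between two lineages, each G generations long."""
    diff = 0
    for _ in range(2 * G):        # G generations on each of the two lineages
        u = rng.random()
        if u < m / 2:             # step up with probability m/2
            diff += 1
        elif u < m:               # step down with probability m/2
            diff -= 1
    return diff * diff

def simulate(G, m, trials, seed=1):
    """Sample mean and sample variance of the squared difference."""
    rng = random.Random(seed)
    samples = [squared_difference(G, m, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    return mean, var

G, m = 100, 0.01                  # toy values, not realistic marker rates
mean, var = simulate(G, m, trials=20000)
predicted_mean = 2 * m * G                    # 2
predicted_var = 2 * m * G * (1 + 4 * m * G)   # 10
linear_var = 2 * m * G                        # 2, the textbook assumption
print(mean, predicted_mean)
print(var, predicted_var, linear_var)
```

Under these assumptions the sample variance should land near the quadratic prediction of 10 rather than the linear value of 2, up to sampling noise.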

You will note that it really enlarges the statistical confidence interval for
large G and large m(i).

The proper downweighting factor of 1/[1+4m(i)G] for each marker in the
proper combination used for the overall estimator is incorporated into the
interclade part of my Generations2 program. It was never put into the
intraclade part of the program because intraclade ages are always quite
young for which the correction is not so important.
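
The effect of the downweighting factor can be illustrated with a small sketch. It is not taken from the Generations2 program. It assumes each marker i supplies its own age estimate with variance proportional to [1+4m(i)G]/m(i), and that the estimates are combined with inverse-variance weights, so each weight is proportional to m(i)/[1+4m(i)G]. The function names and the three mutation rates are hypothetical.

```python
def downweight(G, m):
    """The factor 1/[1 + 4*m*G] applied to a marker with mutation rate m."""
    return 1 / (1 + 4 * m * G)

def marker_weights(G, rates):
    """Normalized weights proportional to m(i) / [1 + 4*m(i)*G] (assumed scheme)."""
    raw = [m * downweight(G, m) for m in rates]
    total = sum(raw)
    return [w / total for w in raw]

rates = [0.0005, 0.002, 0.008]    # hypothetical slow, medium, and fast markers

for G in (10, 1000):              # a young age and an old age
    factors = [downweight(G, m) for m in rates]
    print(G, [round(f, 3) for f in factors],
          [round(w, 3) for w in marker_weights(G, rates)])
```

At G = 10 all three factors stay above 0.75, so the correction changes little. At G = 1000 they fall to 1/3, 1/9 and 1/33, so the fast marker loses most of its extra weight.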

Ken




