From: "Tim Janzen" <>
Subject: Re: [DNA] R-U152 and R-L21 on the European Continent
Date: Sat, 12 Dec 2009 19:45:24 -0800

Dear John,
Thanks for the background information. The problem is that if you are
going to attempt to calculate interclade TMRCA estimates that come anywhere
close to the true TMRCAs for subclades or haplogroups that are 20,000 years
old or older, you must throw away the fast-mutating markers, because their
variance is saturated. There is no way around it. As I pointed out in
my message in July at
75 and in my earlier messages today, TMRCA estimates based on haplotype
datasets that include the fastest-mutating markers in the 67-marker FTDNA
panel are clearly skewed lower than they should be. I would rather accept a
wider confidence interval and know that there is a reasonable probability
that my interclade TMRCA estimates are accurate than include a lot of
fast-mutating markers that clearly skew the estimates downward and therefore
produce TMRCAs that are certainly incorrect. Confidence intervals are only
helpful if the assumptions and procedures behind the calculation are correct.
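To make the saturation argument concrete, here is a minimal Python sketch of a toy model, not the calculation used for any figures in this thread. It assumes a single-step mutation model in which a marker's repeat count is confined to a hypothetical range of +/-3 repeats around the ancestral value, hypothetical mutation rates of 0.0005 and 0.01 per generation for a "slow" and a "fast" marker, Poisson-distributed mutation counts per lineage, and a true split 800 generations ago (about 20,000 years at an assumed 25 years per generation). The split time is estimated from the average squared difference (ASD) between two lineages as ASD / (2 mu).

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson-distributed count with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def evolve(rng: random.Random, mu: float, generations: int, bound: int) -> int:
    """Repeat-count offset of one lineage after `generations` generations.

    Toy single-step model: each mutation adds or removes one repeat, but the
    allele is confined to [-bound, bound] (a HYPOTHETICAL range constraint);
    a step that would leave the range is reflected back inward.
    """
    allele = 0
    for _ in range(poisson(rng, mu * generations)):
        step = rng.choice((-1, 1))
        if abs(allele + step) > bound:
            step = -step
        allele += step
    return allele


def interclade_tmrca(mu: float, generations: int, bound: int,
                     replicates: int = 20000, seed: int = 1) -> float:
    """Simulate pairs of lineages that split `generations` ago and estimate
    the split time from their average squared difference: ASD / (2 * mu)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(replicates):
        d = evolve(rng, mu, generations, bound) - evolve(rng, mu, generations, bound)
        total += d * d
    return (total / replicates) / (2 * mu)


TRUE_GENERATIONS = 800  # ~20,000 years at an assumed 25 years per generation
for label, mu in (("slow", 0.0005), ("fast", 0.01)):  # hypothetical rates
    estimate = interclade_tmrca(mu, TRUE_GENERATIONS, bound=3)
    print(f"{label} marker (mu={mu}): estimated {estimate:.0f} generations,"
          f" true {TRUE_GENERATIONS}")
```

Under these assumptions the slow marker recovers roughly the true 800 generations, while the fast marker's ASD plateaus once it fills the allowed range, so its estimate comes out far lower. The bound is only an illustration of the mechanism, not a measured property of any real marker.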
Below are the interclade TMRCA estimates, with their 95% confidence
intervals, for the node joining haplogroups A and B, calculated with your
estimated mutation rates:

10 slow markers: 147279 years (95% confidence interval is 77533 years)

10 slow medium markers: 78714 years (95% confidence interval is 32217 years)

10 medium markers: 8396 years (95% confidence interval is 6175 years)

10 medium fast markers: 21794 years (95% confidence interval is 6750 years)

10 fast markers: 13847 years (95% confidence interval is 3479 years)

50 markers: 35245 years (95% confidence interval is 3070 years)

10 YHRD markers using YHRD mutation rates: 8811 years
(95% confidence interval is 5657 years)

24 slow markers: 77576 years (95% confidence interval is 22008 years)

Note that all of the TMRCA estimates above with relatively tight
confidence intervals are clearly inaccurate. The only estimates that are
likely to be close to the true TMRCA are the three with the widest
confidence intervals (the 10 slow markers, the 10 slow medium markers and
the 24 slow markers).
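One quick way to see the point is to check which intervals overlap. The Python sketch below copies the figures from the list above and assumes that each quoted "95% confidence interval" is a +/- half-width around the estimate (the message does not say so explicitly). It then lists the marker sets whose intervals overlap the 24-slow-marker interval.

```python
# (label, estimate in years, quoted 95% CI figure in years), copied from the list above
ESTIMATES = [
    ("10 slow", 147279, 77533),
    ("10 slow medium", 78714, 32217),
    ("10 medium", 8396, 6175),
    ("10 medium fast", 21794, 6750),
    ("10 fast", 13847, 3479),
    ("50", 35245, 3070),
    ("10 YHRD", 8811, 5657),
    ("24 slow", 77576, 22008),
]


def interval(estimate: int, half_width: int) -> tuple[int, int]:
    """ASSUMES the quoted CI figure is a +/- half-width around the estimate."""
    return estimate - half_width, estimate + half_width


def consistent_with(reference_label: str) -> list[str]:
    """Labels of marker sets whose intervals overlap the reference set's interval."""
    ref = next(e for e in ESTIMATES if e[0] == reference_label)
    lo_ref, hi_ref = interval(ref[1], ref[2])
    overlapping = []
    for label, est, half_width in ESTIMATES:
        if label == reference_label:
            continue
        lo, hi = interval(est, half_width)
        if lo <= hi_ref and hi >= lo_ref:  # the two ranges share at least one point
            overlapping.append(label)
    return overlapping


for label, est, half_width in ESTIMATES:
    lo, hi = interval(est, half_width)
    print(f"{label:>15} markers: {lo:>7} to {hi:>7} years")
print("overlapping the 24 slow-marker interval:", consistent_with("24 slow"))
```

Under that assumption only the 10 slow and 10 slow medium intervals overlap the 24-slow-marker range (55568 to 99584 years); the other five intervals all lie entirely below it.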


It is important to quote uncertainties along with estimates. In the example
I gave above, the 1/10 estimate was 0.1 +/- 0.1 for each individual
experiment, comfortably encompassing the mean-value estimate (MVE) of 1/6.
The combined experiment gave the same estimate, 1/10, but the uncertainty
was reduced to +/- 0.07, still comfortably encompassing the MVE of 3/22.
As it happens, if you throw away all the fast markers, the TMRCA
uncertainty goes to pot, and that is why a result based only on the slowest
markers is not very reliable in any case: it is too uncertain.

John Chandler
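The 1/10, 1/6 and 3/22 figures in the quoted paragraph can be reproduced with a short Python sketch, but only under assumptions, since the earlier example itself is not included here: each individual experiment is taken to have seen 1 event in 10 trials, the combined experiment pools two of them (2 in 20), the "+/-" figure is the binomial standard error sqrt(p(1 - p)/n), and the mean-value estimate is the posterior mean under a uniform prior, (k + 1)/(n + 2) (Laplace's rule of succession).

```python
import math
from fractions import Fraction


def summarize(successes: int, trials: int) -> tuple[Fraction, float, Fraction]:
    """Return (point estimate k/n, binomial standard error, mean-value estimate).

    ASSUMPTIONS: the +/- figure is sqrt(p (1 - p) / n), and the MVE is the
    posterior mean under a uniform prior, (k + 1) / (n + 2).
    """
    p_hat = Fraction(successes, trials)
    std_err = math.sqrt(p_hat * (1 - p_hat) / trials)
    mve = Fraction(successes + 1, trials + 2)
    return p_hat, std_err, mve


# Hypothetical reconstruction: 1 event in 10 trials per experiment,
# and the combined experiment pools two of them (2 events in 20 trials).
for k, n in ((1, 10), (2, 20)):
    p_hat, se, mve = summarize(k, n)
    print(f"{k}/{n}: estimate {float(p_hat):.2f} +/- {se:.3f}, MVE {mve}")
```

This gives about +/- 0.095 for one experiment and +/- 0.067 for the pooled one, which round to the quoted +/- 0.1 and +/- 0.07, and MVEs of exactly 1/6 and 3/22.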

