Archiver > GENEALOGY-DNA > 2008-02 > 1203123664

From: "Tim Janzen" <>
Subject: Re: [DNA] Extinction Chances
Date: Fri, 15 Feb 2008 17:01:04 -0800
In-Reply-To: <02a101c87029$cbe85470$6400a8c0@Ken1>

Dear Ken,
I see what you mean about the need to be careful about overly
relying on the most outlying haplotypes within a subclade, but on the other
hand you need to be careful that you don't overly rely on haplotypes that
have a relatively small genetic distance from each other as well. For
instance, let's say that you have 10 haplotypes that are part of a known
subclade such as AS7E. Let's say that two haplotypes have a genetic
distance of 10 from the AS7E modal out of 67 markers, and that the other 8
haplotypes have an average genetic distance of only 4 from the AS7E
modal. In a situation such as this one, it could be that the 8 haplotypes
come from closely related people who are overrepresented relative to the
other haplotypes, either through sampling bias or because a particular
progenitor left far more descendants than his contemporaries (as has been
suggested for Genghis Khan). Thus, if you include multiple haplotypes in
the variance analysis, you need to somehow take into account the
possibility of sampling bias and overrepresentation of any particular
group of closely related haplotypes.

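
To illustrate the effect in the example above, here is a minimal Python
sketch. It assumes you already know which haplotypes belong to the same
closely related cluster (the cluster labels are hypothetical), sets all 8
clustered haplotypes to GD 4 for simplicity, and uses one simple weighting
scheme, giving each cluster a total weight of 1. That scheme is only an
illustration, not a method proposed in this thread.

```python
from collections import Counter


def cluster_weighted_mean(distances, cluster_ids):
    """Mean GD from the modal in which each cluster of closely related
    haplotypes counts as one lineage (total weight 1), however many
    members it has. A haplotype with a unique cluster id keeps weight 1."""
    if len(distances) != len(cluster_ids):
        raise ValueError("need one cluster id per haplotype")
    sizes = Counter(cluster_ids)
    weights = [1.0 / sizes[c] for c in cluster_ids]
    return sum(w * d for w, d in zip(weights, distances)) / sum(weights)


# Hypothetical AS7E sample: two outliers at GD 10 from the modal and
# eight closely related haplotypes at GD 4 (hypothetical cluster "c").
distances = [10, 10] + [4] * 8
cluster_ids = ["a", "b"] + ["c"] * 8

plain_mean = sum(distances) / len(distances)
weighted_mean = cluster_weighted_mean(distances, cluster_ids)
print(plain_mean, weighted_mean)  # 5.2 8.0
```

Counted naively, the cluster of 8 pulls the average GD down to 5.2. If it
is treated as a single lineage, the average rises to 8.0, which shows how
much the conclusion can depend on how overrepresentation is handled.
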
Even if all of this is relatively complicated and imprecise, it
would be nice if this list could collectively begin developing approximate
ages for all of the sub-haplogroups in the ISOGG Y SNP tree, for all of your
Haplogroup I clades, and possibly also for John McEwan's R1b STR modals and
for your R1b STR modals. The approximate ages could be revised as more data
becomes available, but I think that having approximate ages for most of
these sub-haplogroups and clades would be helpful for many of us.

-----Original Message-----
[mailto:] On Behalf Of Ken Nordtvedt
Sent: Friday, February 15, 2008 3:24 PM
Subject: Re: [DNA] Extinction Chances

Tim, I am sure a statistician could come up with an estimator that uses the
greatest GD seen among the haplotype pairs in a sampled population of N
haplotypes of a clade to infer the age back to the clade MRCA. But it would
be a mess, and I challenge anyone tuning in to find that estimator published
in any book. It is just not practical, nor as good as taking the variance
statistic for the entire population of N haplotypes.
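
A minimal Python sketch of the variance approach, assuming a symmetric
single-step mutation model and a star-like genealogy, under which the
variance of repeat counts at a marker grows by roughly the per-marker
mutation rate each generation. The function names, the toy haplotypes, and
the rate of 0.002 per marker per generation are hypothetical placeholders;
real analyses would use marker-specific rates and handle multi-copy and null
markers separately.

```python
def sample_variance(values):
    """Unbiased (n - 1) sample variance of one marker's repeat counts."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def tmrca_from_variance(haplotypes, mutation_rate):
    """Generations back to the clade MRCA, estimated as the average
    per-marker variance divided by the per-marker mutation rate."""
    n_markers = len(haplotypes[0])
    if any(len(h) != n_markers for h in haplotypes):
        raise ValueError("all haplotypes must cover the same markers")
    variances = [sample_variance([h[i] for h in haplotypes])
                 for i in range(n_markers)]
    return sum(variances) / n_markers / mutation_rate


# Toy example: four hypothetical haplotypes scored on three markers.
haplotypes = [
    [13, 24, 14],
    [13, 25, 14],
    [14, 24, 14],
    [14, 25, 16],
]
generations = tmrca_from_variance(haplotypes, mutation_rate=0.002)
print(round(generations, 1))  # 277.8
```

Every haplotype contributes to every per-marker variance, so the estimate
draws on all N haplotypes rather than on a single extreme pair.
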

The relationship between generations to MRCA and the largest GD seen would
have to involve N as well, since the more haplotypes you sample, the larger
the greatest GD tends to be; it would be a complicated function.
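
The dependence on N can be seen in a small Monte Carlo sketch. It assumes a
star-like genealogy (every sampled line descends independently from the
MRCA), single-step +1/-1 mutations, and arbitrary parameters (25 markers,
60 generations, a per-marker rate of 0.005) that are not calibrated to real
markers. All function names are hypothetical.

```python
import random
from itertools import combinations


def simulate_clade(n_haplotypes, n_markers, generations, mutation_rate, rng):
    """Star-like genealogy: each sampled line mutates independently from an
    all-zero ancestral haplotype; each mutation is a +1 or -1 repeat step."""
    clade = []
    for _ in range(n_haplotypes):
        hap = [0] * n_markers
        for marker in range(n_markers):
            for _ in range(generations):
                if rng.random() < mutation_rate:
                    hap[marker] += rng.choice((-1, 1))
        clade.append(hap)
    return clade


def genetic_distance(a, b):
    """Step-wise GD: sum of absolute repeat differences over all markers."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mean_max_gd(n_haplotypes, trials=20, n_markers=25, generations=60,
                mutation_rate=0.005, seed=1):
    """Average, over simulated clades of the same age, of the greatest GD
    seen among all haplotype pairs in a sample of n_haplotypes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        clade = simulate_clade(n_haplotypes, n_markers, generations,
                               mutation_rate, rng)
        total += max(genetic_distance(a, b)
                     for a, b in combinations(clade, 2))
    return total / trials


for n in (5, 40):
    print(f"N={n}: average greatest GD {mean_max_gd(n):.1f}")
```

The clade age and mutation rate are the same in both runs, yet the average
greatest GD comes out larger for N = 40 than for N = 5. Any age estimator
built on the maximum would therefore have to correct for N.
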

Look at it another way: by putting all your eggs in the single basket of
that greatest GD seen, you are greatly enlarging your confidence interval
for whatever answer you might get. The relationship between the variance of
all N haplotypes and generations is not only relatively simple, but because
it is an average statistic taken over all the evidence available, it will
have a tighter confidence interval.
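
The averaging effect can be checked with another small simulation under the
same kind of toy assumptions (star-like genealogy, single-step +1/-1
mutations, arbitrary uncalibrated parameters). The simulation repeatedly
samples clades of known age (60 generations here), applies the variance
estimator, and reports the spread of the estimates for two sample sizes.
All names are hypothetical.

```python
import random
import statistics


def simulate_clade(n_haplotypes, n_markers, generations, mutation_rate, rng):
    """Star-like genealogy with single-step +1/-1 mutations (toy model)."""
    clade = []
    for _ in range(n_haplotypes):
        hap = [0] * n_markers
        for marker in range(n_markers):
            for _ in range(generations):
                if rng.random() < mutation_rate:
                    hap[marker] += rng.choice((-1, 1))
        clade.append(hap)
    return clade


def variance_tmrca(clade, mutation_rate):
    """Mean per-marker (n - 1) sample variance divided by the mutation rate."""
    n = len(clade)
    per_marker = []
    for marker in range(len(clade[0])):
        values = [hap[marker] for hap in clade]
        m = sum(values) / n
        per_marker.append(sum((v - m) ** 2 for v in values) / (n - 1))
    return sum(per_marker) / len(per_marker) / mutation_rate


def estimate_spread(n_haplotypes, trials=30, n_markers=25, generations=60,
                    mutation_rate=0.005, seed=7):
    """Mean and standard deviation of the variance-based age estimate over
    repeated samples of n_haplotypes from clades `generations` old."""
    rng = random.Random(seed)
    estimates = [
        variance_tmrca(simulate_clade(n_haplotypes, n_markers, generations,
                                      mutation_rate, rng), mutation_rate)
        for _ in range(trials)
    ]
    return statistics.mean(estimates), statistics.stdev(estimates)


for n in (5, 40):
    avg, sd = estimate_spread(n)
    print(f"N={n}: mean estimate {avg:.1f} generations, sd {sd:.1f}")
```

In this toy setup both sample sizes should average close to the true 60
generations. The standard deviation should be noticeably smaller for N = 40,
which is the tighter interval that comes from averaging over all the
haplotypes.
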

