
Subject: Re: [DNA] Extinction Chances
Date: Fri, 15 Feb 2008 21:44:57 EST

The Childress study's Administrator, working with FTDNA, seems to have
concluded that NPE's might account for a couple of families in the group. I
haven't researched their ancestry and have no opinion on this, but certain
information provided on the website suggests that possibility--or maybe I should
say, at least doesn't contradict it. The problem is that the analysis also
implies ignorance of the existence of AS7E and of other families in the group
not included in the Childress study.

At the other extreme, I think we may be confident in identifying families
that aren't connected to Childress by NPE's, at least not in a very long
time--Trimble, for example, whose ancestors in the 19th century were in Australia
(if I'm remembering correctly).

So it might be possible to begin grouping some of the lineages in this way.

But what about measuring distance? Do we use GD, or simply count the markers
that don't match? What about markers like 464 versus those that are more
stable? And finally, how many markers? If 67 are required, then we would lose
over half of the 19 families in the group.
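
To make the two candidate measures concrete, here is a minimal Python sketch, not anyone's published method. It assumes each haplotype is a dict mapping a marker name to a repeat count. The function names, the repeat values, and the treatment of 464 as separate copies (DYS464a, DYS464b) are all invented for illustration. Only markers present in both haplotypes are compared, so a 37-marker result can still be scored against a 67-marker one.

```python
# Hypothetical haplotypes: marker name -> repeat count (values are made up).
h1 = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS464a": 15, "DYS464b": 15}
h2 = {"DYS393": 13, "DYS390": 22, "DYS19": 15, "DYS464a": 15, "DYS464b": 17}

# Assumption: markers we may want to leave out because they are less stable.
MULTI_COPY = {"DYS464a", "DYS464b"}


def shared_markers(a, b, skip=()):
    """Markers tested in both haplotypes, minus any we choose to skip."""
    return [m for m in a if m in b and m not in skip]


def mismatch_count(a, b, skip=()):
    """Count the markers whose values differ, ignoring the size of the gap."""
    return sum(1 for m in shared_markers(a, b, skip) if a[m] != b[m])


def stepwise_gd(a, b, skip=()):
    """Sum of absolute repeat differences (a stepwise genetic distance)."""
    return sum(abs(a[m] - b[m]) for m in shared_markers(a, b, skip))


print(mismatch_count(h1, h2), stepwise_gd(h1, h2))
print(mismatch_count(h1, h2, skip=MULTI_COPY),
      stepwise_gd(h1, h2, skip=MULTI_COPY))
```

With these made-up values the two measures disagree (3 mismatching markers versus a stepwise distance of 5), and dropping the 464 copies changes both, which is exactly why the choice of measure and marker set has to be settled before lineages are grouped.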



Dear Ken,
I see what you mean about the need to be careful about overly
relying on the most outlying haplotypes within a subclade, but on the other
hand you need to be careful that you don't overly rely on haplotypes that
have a relatively small genetic distance from each other as well. For
instance, let's say that you have 10 haplotypes that are part of a known
subclade such as AS7E. Let's say that two haplotypes have a genetic
distance of 10 from the AS7E modal out of 67 markers, and that the other 8
haplotypes have an average genetic distance of only 4 from the AS7E
modal. In a situation such as this one, it could be that, either through
sampling bias or through a particular progenitor (such as Genghis Khan)
leaving far more descendants than his contemporaries, the 8
haplotypes come from people who are closely related and are
overrepresented relative to the other haplotypes. Thus, if you include multiple
haplotypes in the variance analysis, you need to somehow take into account
the possibility of sampling bias and overrepresentation of any particular
Even if all of this is relatively complicated and imprecise, it
would be nice if this list could collectively begin developing approximate
ages for all of the sub-haplogroups in the ISOGG Y SNP tree, for all of your
Haplogroup I clades, and possibly also for John McEwan's R1b STR modals and
for your R1b STR modals. The
approximate ages could be revised as more data becomes available, but I
think that having approximate ages for most of these sub-haplogroups and
clades would be helpful for many of us.
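
The overrepresentation worry in the 10-haplotype example above can be seen with a little arithmetic. The Python sketch below is only an illustration under a crude assumption: that the 8 close haplotypes really form one related cluster and should together count as a single lineage. The function name and the weighting scheme are hypothetical, not an established correction.

```python
def mean_gd(gds, weights=None):
    """Weighted mean of genetic distances from the modal."""
    if weights is None:
        weights = [1.0] * len(gds)
    return sum(g * w for g, w in zip(gds, weights)) / sum(weights)


# The example from the text: 2 haplotypes at GD 10, 8 averaging GD 4.
gds = [10, 10] + [4] * 8

# Every haplotype counted equally.
unweighted = mean_gd(gds)

# Assumption: the 8 close haplotypes share one recent ancestor, so each
# gets weight 1/8 and the cluster as a whole counts as one lineage.
weights = [1.0, 1.0] + [1 / 8] * 8
downweighted = mean_gd(gds, weights)

print(unweighted, downweighted)
```

Counted equally, the mean distance from the modal is 5.2; with the cluster collapsed to one lineage it rises to 8. Any age estimate built on that average would shift accordingly, so the result depends heavily on how relatedness among the sampled haplotypes is handled.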

Tim, I am sure a statistician could come up with an estimator using the
greatest GD seen among the haplotype pairs in a sampled population of N
haplotypes of a clade in order to infer the age back to the clade MRCA. But
it would be a mess, and I challenge anyone tuning in to find that estimator
published in any book. It is just not practical or as good as taking the
variance statistic for the entire population of N haplotypes.

The relationship between generations to MRCA and largest seen GD would have
to involve N as well; it would be a complicated function.

Look at it another way: by putting all your eggs in the single basket of
that greatest GD seen, you are greatly enlarging your confidence interval
for whatever answer you might get. Not only is the relationship between the
variance of all N haplotypes and generations relatively simple, but the
variance, being an average statistic taken over all the evidence available,
will give a tighter confidence interval.
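
As a rough illustration of the contrast drawn above, here is a minimal Python sketch under stated assumptions, not a definitive estimator. It assumes a simple stepwise mutation model in which the expected squared difference from the ancestral value grows by the mutation rate each generation, uses the modal haplotype as a stand-in for the ancestral one, and takes a single per-marker rate `mu` whose value here is a placeholder. The haplotypes and all function names are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical haplotypes: repeat counts at the same 4 markers, in order.
haps = [
    [13, 24, 14, 11],
    [13, 24, 15, 11],
    [13, 23, 14, 11],
    [13, 24, 14, 12],
]


def modal_haplotype(haps):
    """Most common value at each marker (proxy for the ancestral value)."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*haps)]


def mean_squared_distance(haps, reference):
    """Squared differences from the reference, summed over markers and
    averaged over all N haplotypes."""
    total = sum((x - r) ** 2 for h in haps for x, r in zip(h, reference))
    return total / len(haps)


def generations_from_variance(haps, mu):
    """Assumed model: mean squared distance = markers * mu * generations."""
    ref = modal_haplotype(haps)
    return mean_squared_distance(haps, ref) / (len(ref) * mu)


def largest_pairwise_gd(haps):
    """The single greatest stepwise GD among all haplotype pairs."""
    return max(
        sum(abs(x - y) for x, y in zip(a, b))
        for a, b in combinations(haps, 2)
    )


MU = 0.002  # placeholder mutation rate per marker per generation
print(generations_from_variance(haps, MU), largest_pairwise_gd(haps))
```

The variance route uses every haplotype and reduces to one division, whereas the largest pairwise GD is a single number that depends on one pair and, as noted above, would need a function of N to be turned into an age.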

