GENEALOGY-DNA-L Archives


From: James Heald <>
Subject: Re: [DNA] FW: Odds Are, It's Wrong - 5% of the time
Date: Sun, 21 Nov 2010 00:14:21 +0000
References: <F9C440A2-FC59-4A9E-AAAC-85DEE9D2FAB0@GMAIL.COM> <COL115-W50D879F102DC3996D9D454A03A0@phx.gbl> <4CE7A3C0.7050702@ucl.ac.uk> <COL115-W1464B78AF0292D6AEFA183A03B0@phx.gbl> <COL115-W5950BB2C58A31B4806036EA03B0@phx.gbl> <4CE8088B.3020500@ucl.ac.uk><AANLkTi=mPBovTQGxXnLcYN+jG5-JNBK2ZNzqR3AHUHoo@mail.gmail.com>
In-Reply-To: <AANLkTi=mPBovTQGxXnLcYN+jG5-JNBK2ZNzqR3AHUHoo@mail.gmail.com>


On 20/11/2010 21:24, Robert Stafford wrote:
> I am not sure how this applies to TMRCA calculations. What is the null
> hypothesis you are testing? What do the Bayesians use as the confidence
> interval for TMRCA?
>
> Bob Stafford

Hi Bob,

I've talked a bit more about this in the post I've just made in the new
"P value" thread.

Basically, if the Frequentist upper confidence limit is 50 generations,
the null hypothesis is that the true number of generations is more than
50. For each of those values, in fewer than 5% of cases such a number
of generations would give n mutations or fewer; so the promise made is
that *if* the number of generations falls outside the confidence range,
*then* the confidence range would be wrongly accepted less than 5% of
the time.
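
To make that concrete, here is a minimal sketch of how such an upper limit could be found. It assumes, purely for illustration, that the number of mutations is Poisson distributed with mean rate_per_generation * g, where rate_per_generation is a hypothetical combined mutation rate (all markers, both lineages). The function names are made up for this example and do not come from any particular package.

```python
import math

def poisson_cdf(n, mean):
    """P(X <= n) for a Poisson variable with the given mean."""
    term = math.exp(-mean)          # P(X = 0)
    total = term
    for k in range(1, n + 1):
        term *= mean / k            # P(X = k) from P(X = k - 1)
        total += term
    return total

def upper_confidence_limit(n_mutations, rate_per_generation,
                           alpha=0.05, max_generations=100000):
    """Largest g for which 'n mutations or fewer' still has probability
    >= alpha. Every larger g gives n or fewer mutations in fewer than
    alpha of cases, so it is rejected.
    Assumes the Poisson model described above (an illustrative assumption)."""
    for g in range(1, max_generations + 1):
        if poisson_cdf(n_mutations, rate_per_generation * g) < alpha:
            return g - 1
    raise ValueError("no limit found below max_generations")

# Hypothetical inputs: 2 observed mutations, combined rate 0.1 per generation
limit = upper_confidence_limit(2, 0.1)
```

Note that nothing in this calculation says how probable the different values of g were to begin with; it only looks at the probability of the data for each g.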


However the counterpoint to this, as I pointed out in the other thread,
is that there are many more possible values above 50 than below it, so
*given* that n mutations have been observed, it may actually be more
likely to be a rare event from a (comparatively common) large number of
generations, than a common event from a (comparatively rare) small
number of generations.
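
A toy calculation with Bayes' theorem shows how this can happen. All the numbers below are invented for illustration: two lumped hypotheses, with "many generations" assumed nine times as common a priori, but the observed data assumed much less probable under it.

```python
def posterior_probabilities(priors, likelihoods):
    """Bayes' theorem for a finite set of hypotheses.
    priors[h]      = P(h) before seeing the data
    likelihoods[h] = P(data | h)
    Returns P(h | data) for each hypothesis."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: j / total for h, j in joint.items()}

# Hypothetical numbers, chosen only to illustrate the argument
priors = {"few generations": 0.1, "many generations": 0.9}
likelihoods = {"few generations": 0.30,    # n mutations is a common event
               "many generations": 0.04}   # n mutations is a rare event (< 5%)

posterior = posterior_probabilities(priors, likelihoods)
for hypothesis, probability in posterior.items():
    print(f"{hypothesis}: {probability:.3f}")
```

With these made-up inputs the "many generations" hypothesis ends up slightly more probable than the "few generations" one, even though the data would occur less than 5% of the time under it. Different priors would of course give a different answer.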

Hence the Bayesian alternative approach, which is to actually (try to)
work out the probability distribution for the number of generations
given the number of mutations, using e.g. Bruce Walsh's TMRCA calculator
http://nitro.biosci.arizona.edu/ftdna/TMRCA.html
for pairs of haplotypes, or software like BATWING to estimate the
probability distribution for the coalescence time given multiple haplotypes.

Then, having estimated the probability distribution, rather than a
confidence interval, Bayesians will instead quote a "credibility
interval" -- an interval that contains 95% of the probability
distribution just calculated.

Unlike a confidence interval, therefore, the interval the Bayesians
quote is one which, as a result of their calculations and modelling,
they estimate to have a 95% chance of containing the true value.

In the above, I've deliberately glossed over the idea of one-sided and
two-sided intervals, and so whether we should really be talking about
5% or 2.5% being the probability at issue in the tails; but if we were to
pin one end of the Bayesian interval at zero, then the Bayesian
credibility interval would be from there up until the point where 95% of
the probability has been accounted for, with only 5% chance of being
beyond that point.
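
As a rough sketch of that one-sided Bayesian calculation: the code below puts a prior on a grid of whole generations, multiplies by a Poisson likelihood, normalises, and then walks up from the low end until 95% of the probability is accounted for. The Poisson model, the flat default prior, the cut-off at max_generations and the function names are all assumptions made for this illustration; this is not how Walsh's calculator or BATWING work internally.

```python
import math

def posterior_over_generations(n_mutations, rate_per_generation,
                               max_generations, prior=None):
    """Posterior P(g | n mutations) for g = 1..max_generations.
    Assumes mutations ~ Poisson(rate_per_generation * g) and, unless a
    prior is supplied, a flat prior over g (both illustrative assumptions)."""
    if prior is None:
        prior = [1.0] * max_generations
    weights = []
    for g, p in zip(range(1, max_generations + 1), prior):
        mean = rate_per_generation * g
        # Poisson likelihood; the constant 1/n! cancels on normalising
        weights.append(p * mean ** n_mutations * math.exp(-mean))
    total = sum(weights)
    return [w / total for w in weights]

def one_sided_credible_limit(posterior, mass=0.95):
    """Smallest g such that the interval from the bottom of the grid
    up to g holds at least `mass` of the posterior probability."""
    cumulative = 0.0
    for g, p in enumerate(posterior, start=1):
        cumulative += p
        if cumulative >= mass:
            return g
    return len(posterior)

# Hypothetical inputs: 2 observed mutations, combined rate 0.1 per generation
posterior = posterior_over_generations(2, 0.1, max_generations=1000)
limit = one_sided_credible_limit(posterior)
```

The answer depends on the prior passed in, which is exactly where the modelling judgement comes in; a prior that favours larger numbers of generations will push the limit upwards.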

Arguably, this is usually the notion people are actually looking for,
when they ask for the range of possibilities.

