**GENEALOGY-DNA-L Archives**

From: James Heald <>
Subject: Re: [DNA] FW: Odds Are, It's Wrong - 5% of the time
Date: Sun, 21 Nov 2010 02:51:42 +0000
References: <F9C440A2-FC59-4A9E-AAAC-85DEE9D2FAB0@GMAIL.COM> <COL115-W50D879F102DC3996D9D454A03A0@phx.gbl> <4CE7A3C0.7050702@ucl.ac.uk> <COL115-W1464B78AF0292D6AEFA183A03B0@phx.gbl> <COL115-W5950BB2C58A31B4806036EA03B0@phx.gbl> <4CE8088B.3020500@ucl.ac.uk> <AANLkTi=mPBovTQGxXnLcYN+jG5-JNBK2ZNzqR3AHUHoo@mail.gmail.com> <4CE8645D.7030203@ucl.ac.uk>
In-Reply-To: <4CE8645D.7030203@ucl.ac.uk>

On 21/11/2010 00:14, James Heald wrote:

> On 20/11/2010 21:24, Robert Stafford wrote:
>> I am not sure how this applies to TMRCA calculations. What is the null
>> hypothesis you are testing? What do the Bayesians use as the confidence
>> interval for TMRCA?
>>
>> Bob Stafford
>

Sorry, that previous explanation of the relationship between confidence
intervals and hypothesis testing wasn't quite right.

Let me try again.

Suppose you have observed n mutations. You say that given that number
of mutations, your confidence interval for the number of generations
runs from T1 to T.

T, the high end of the confidence interval, is fixed so that *if* T were
the true number of generations, then you would expect to see fewer than
n mutations 5% of the time.

So, 95% of the time, if T were the true value, you would expect to see
at least n mutations, which would lead you to assign a confidence limit
that would include T.

Similarly, whatever the true value, before looking at the data, we
expect that 95% of the time it will lead to a number of mutations that
in turn leads to a confidence interval that includes the true value.

The null hypothesis therefore is that the confidence interval will
include the true value; a priori we expect it to be falsely rejected
only 5% of the time.
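
To make the construction above concrete, here is a minimal sketch in Python. It rests on assumptions that are not part of the argument above: the number of mutations is taken to be Poisson distributed with mean `rate * T`, where `rate` is a hypothetical expected number of mutations per generation across the markers compared, and the tail is defined as "n mutations or fewer". `upper_limit` finds the T at which that tail probability drops to 5%, and `coverage` works out how often the resulting limit would include a given true T.

```python
import math


def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam)."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def upper_limit(n, rate, alpha=0.05):
    """T at which P(N <= n | T) falls to alpha, found by bisection.

    Assumes N ~ Poisson(rate * T); `rate` is a hypothetical parameter.
    """
    lo, hi = 0.0, 1.0
    while poisson_cdf(n, rate * hi) > alpha:  # expand until the tail is small
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if poisson_cdf(n, rate * mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def coverage(true_t, rate, alpha=0.05):
    """Exact probability that upper_limit(N) >= true_t when true_t is the truth."""
    n = 0
    while upper_limit(n, rate, alpha) < true_t:
        n += 1
    # every count >= n produces a limit that includes true_t
    return 1.0 - poisson_cdf(n - 1, rate * true_t)


rate = 0.1  # illustrative value only
for n in range(4):
    print(n, "mutations -> upper limit", round(upper_limit(n, rate), 1))
print("coverage if the true T is 50:", coverage(50, rate))
```

Because the mutation count is discrete, the coverage comes out at or slightly above 95% rather than exactly 95%.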

* * *

So far, so plausible. So when might the Bayesian approach be able to
improve on this?

Suppose we have a reasonably well-calibrated prior probability for the
number of generations. If this includes a long tail, we might expect
most often to find a fairly high number of mutations.

But suppose instead in our particular experiment we find a low number of
mutations. This, according to our model, is a rare event. So we are
then faced with the choice: Is this an unusual event from a typical
(large) number of generations? Or is this a typical event from an
unusual (small) number of generations?

The situation is like the dog-barking case. Most of the time the dog
does not bark, and that is when the confidence approach produces the
right answer. But in the unusual event that the dog does bark, then
that is not part of the main run of cases that the confidence approach
is validated on, so in that case all bets are off as to whether the
confidence approach will or won't be likely to give the right answer.

Similarly here, if there is an unusually low number of mutations, that
is not part of the main run of cases that the confidence approach is
validated on; so again in that case the bets are off as to whether the
confidence approach will give the right answer.
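
The choice between "unusual event from a typical number of generations" and "typical event from an unusual number of generations" can be put into numbers with Bayes' rule. The sketch below is only an illustration: it assumes a Poisson mutation count with mean `rate * T`, and the two candidate ages (10 and 200 generations), their prior weights, and the value of `rate` are all made up for the example.

```python
import math


def poisson_pmf(n, lam):
    """P(N = n) for N ~ Poisson(lam), lam > 0."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def posterior(n, rate, prior):
    """Posterior over candidate generation counts given n observed mutations.

    prior maps each candidate number of generations to its prior probability.
    Assumes N ~ Poisson(rate * T); `rate` is a hypothetical parameter.
    """
    weights = {t: p * poisson_pmf(n, rate * t) for t, p in prior.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


# Made-up prior: small ages are rare (5%), large ages are typical (95%).
prior = {10: 0.05, 200: 0.95}
rate = 0.05  # illustrative value only

for n in (2, 4):
    post = posterior(n, rate, prior)
    print(n, "mutations:", {t: round(p, 3) for t, p in post.items()})
```

With these made-up numbers, two mutations favour the small number of generations even though the prior put 95% on the large one, while four mutations favour the large number. Which way it goes depends entirely on how the prior and the likelihood trade off.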

* * *

The caveat to this is that the Bayesian approach depends quite strongly
on the prior probability being well calibrated to the sort of cases that
it is presented with.

For example, Bruce Walsh's model assumes a particular prior
distribution. But does that prior accurately reflect the cases being
presented to it?

One way to test this is to generate simulated genetic distances, based
on randomly picking ages according to that prior distribution. If the
set of genetic distances created looks similar to the set of genetic
distances we have been running the algorithm on, then all may be well.
But if the set of genetic distances looks very different to the typical
genetic distances we have been interested in, then something may be amiss.
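
A rough sketch of that check, in Python, might look like the following. Everything specific in it is an assumption made for illustration: the exponential prior with a mean of 100 generations is a stand-in and is not the prior used in Bruce Walsh's model, genetic distance is simplified to a Poisson count with mean `rate * T`, and the `observed` list is a made-up placeholder for the distances actually being analysed.

```python
import math
import random
from collections import Counter


def sample_poisson(lam, rng):
    """Draw a Poisson(lam) count by Knuth's method (fine for small means)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_distances(sample_age, rate, size, seed=0):
    """Pick ages from the prior, then simulate a genetic distance for each."""
    rng = random.Random(seed)
    return [sample_poisson(rate * sample_age(rng), rng) for _ in range(size)]


def histogram(distances):
    """Relative frequency of each genetic distance."""
    counts = Counter(distances)
    return {d: counts[d] / len(distances) for d in sorted(counts)}


def total_variation(hist_a, hist_b):
    """0 means identical distributions, 1 means no overlap at all."""
    keys = set(hist_a) | set(hist_b)
    return 0.5 * sum(abs(hist_a.get(k, 0.0) - hist_b.get(k, 0.0)) for k in keys)


def exponential_prior(rng, mean_generations=100.0):
    """Hypothetical stand-in prior for the age in generations."""
    return rng.expovariate(1.0 / mean_generations)


simulated = simulate_distances(exponential_prior, rate=0.05, size=5000)
observed = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]  # placeholder, not real data

print("simulated:", histogram(simulated))
print("observed: ", histogram(observed))
print("distance between them:", total_variation(histogram(simulated), histogram(observed)))
```

A large gap between the two histograms would suggest that the prior is not describing the kind of cases actually being fed to the calculation.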

One key parameter in Bruce Walsh's model is the effective population
size from which the two haplotypes being run against each other are
considered to have been randomly sampled. If that is mis-estimated,
that may strongly alter the conclusions. Similarly, population growth
rate is an important factor.

The bottom line is that the Bayesian approach can sometimes give much
better answers than the confidence approach, because of its potential
for much better performance when presented with 'unusual' situations.
But this can only be true if the model underlying the Bayesian analysis
is correctly calibrated, so that what it thinks is unusual really *is*
unusual.

> Hi Bob,
>
> I've talked a bit more about this in the post I've just made in the new
> "P value" thread.
>
> Basically, if the Frequentist upper confidence limit is 50 generations,
> the null hypothesis is that the true number of generations is more than
> 50. For each of those values, in fewer than 5% of cases such a number
> of generations would give n mutations or fewer; so the promise made is
> that *if* the number of generations falls outside the confidence range,
> *then* the confidence range would be wrongly accepted less than 5% of
> the time.
>
> However the counterpoint to this, as I pointed out in the other thread,
> is that there are a lot more numbers more than 50 than there are less
> than 50, so *given* that n mutations have been observed, it may actually
> be more likely to be a rare event from a (comparatively common) large
> number of generations, than a common event from a (comparatively rare)
> small number of generations.
>
> Hence the Bayesian alternative approach, which is to actually (try to)
> work out the probability distribution for the number of generations
> given the number of mutations, using e.g. Bruce Walsh's TMRCA calculator
> http://nitro.biosci.arizona.edu/ftdna/TMRCA.html
> for pairs of haplotypes, or software like BATWING to estimate the
> probability distribution for the coalescence time given multiple haplotypes.
>
> Then, having estimated the probability distribution, rather than a
> confidence interval, Bayesians will instead quote a "credibility
> interval" -- an interval that contains 95% of the probability
> distribution just calculated.
>
> Unlike confidence intervals therefore, the Bayesians quote an interval
> in which, as a result of their calculations and modelling, they estimate
> there to be a 95% chance of containing the true value.
>
> In the above, I've deliberately glossed over the idea of one-sided and
> two-sided intervals, so whether we should really be talking about 5% or
> 2.5% being the probability at issue in the tails; but if we were to
> pin one end of the Bayesian interval at zero, then the Bayesian
> credibility interval would be from there up until the point where 95% of
> the probability has been accounted for, with only 5% chance of being
> beyond that point.
>
> Arguably, this is usually the notion people are actually looking for,
> when they ask for the range of possibilities.
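
The one-sided credibility interval described in the quoted message can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method used by the TMRCA calculator or by BATWING: the mutation count is taken to be Poisson with mean `rate * T`, the prior is a hypothetical flat prior over 1 to 2000 generations, and `rate` is a made-up value.

```python
import math


def poisson_pmf(n, lam):
    """P(N = n) for N ~ Poisson(lam), lam > 0."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))


def credible_upper_bound(n, rate, prior, mass=0.95):
    """Smallest T such that the posterior puts at least `mass` on values <= T.

    prior maps each candidate number of generations to its prior weight.
    Assumes N ~ Poisson(rate * T); `rate` is a hypothetical parameter.
    """
    weights = {t: w * poisson_pmf(n, rate * t) for t, w in prior.items()}
    total = sum(weights.values())
    running = 0.0
    for t in sorted(weights):
        running += weights[t] / total
        if running >= mass:
            return t
    return max(weights)


# Hypothetical flat prior over 1..2000 generations.
flat_prior = {t: 1.0 for t in range(1, 2001)}

for n in range(4):
    print(n, "mutations -> 95% of the posterior lies at or below",
          credible_upper_bound(n, rate=0.1, prior=flat_prior))
```

Swapping in a different prior changes where the 95% point falls, which is exactly the calibration question raised earlier in this message.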

