**GENEALOGY-DNA-L Archives**

From: James Heald

Subject: Re: [DNA] P value (was chances are, it's wrong)

Date: Tue, 30 Nov 2010 15:10:31 +0000

References: <F9C440A2-FC59-4A9E-AAAC-85DEE9D2FAB0@GMAIL.COM> <COL115-W50D879F102DC3996D9D454A03A0@phx.gbl> <4CE7A3C0.7050702@ucl.ac.uk> <COL115-W1464B78AF0292D6AEFA183A03B0@phx.gbl> <COL115-W5950BB2C58A31B4806036EA03B0@phx.gbl> <4CE8088B.3020500@ucl.ac.uk> <COL115-W45724B549DCDA5DD2EC4A0A03B0@phx.gbl> <COL115-W424C7732D1583F8960685CA03B0@phx.gbl> <4CE851AF.2030304@ucl.ac.uk> <REME20101122153113@alum.mit.edu> <4CEC4709.9080606@ucl.ac.uk> <REME20101129170610@alum.mit.edu> <4CF4485E.10703@ucl.ac.uk> <REME20101130000730@alum.mit.edu>

In-Reply-To: <REME20101130000730@alum.mit.edu>

On 30/11/2010 05:13, John Chandler wrote:

> the statement "P (theta within interval | data) = 95%" is *precisely* the definition of the 95% CI.

Popular myth, and true in some special situations, but in general *false*.

The definition of a 95% CI is this: a procedure for generating an interval estimate of an unknown parameter produces a *confidence interval* if, when you repeat the experiment a large number of times with the same value of the unknown parameter, the interval generated contains that parameter value in 95% of the repeated trials.
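
The repeated-trials definition can be checked by simulation. What follows is only a sketch under assumptions that are not in the discussion above: the data are normally distributed with a known spread `sigma`, the interval is the usual "sample mean plus or minus 1.96 standard errors", and the names `coverage`, `theta`, `n` and `trials` are made up for the illustration. The point is that the true `theta` is held fixed while the interval moves from trial to trial.

```python
import random
from statistics import NormalDist


def coverage(theta=3.0, sigma=1.0, n=10, trials=20000, seed=0):
    """Fraction of repeated experiments whose interval contains theta."""
    rng = random.Random(seed)
    # Half-width of the assumed interval procedure: z * sigma / sqrt(n).
    half_width = NormalDist().inv_cdf(0.975) * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # One "experiment": n noisy measurements of the same fixed theta.
        mean = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        if mean - half_width <= theta <= mean + half_width:
            hits += 1
    return hits / trials


print(coverage())            # fraction of trials that covered theta = 3.0
print(coverage(theta=-7.0))  # the same procedure with a different true value
```

Both printed fractions should land near 0.95 whatever value of `theta` is chosen, which is all that the definition promises. Nothing in this calculation assigns a probability to `theta` itself.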

So, in the simplest case of a single parameter, with data that can be summarised in a single sufficient statistic D, you set the right-hand end theta_max of your CI so that the cumulative probability of obtaining some data d less than D is 2.5%; i.e.

int from 0 to D of Prob (d | theta_max) = 0.025

or equivalently,

Prob (d<D | theta_max) = 0.025
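
As a worked instance of that recipe, here is a minimal sketch that assumes a specific model the post does not mention: a single observation d drawn from an exponential distribution with mean theta, so that Prob (d<D | theta) = 1 - exp(-D/theta). The function names and the observed value `D = 2.0` are hypothetical. The upper end theta_max is found by bisection, using only the fact that the tail probability falls as theta grows.

```python
import math


def prob_less_than(D, theta):
    """Prob(d < D | theta) for an exponential observation with mean theta."""
    return 1.0 - math.exp(-D / theta)


def frequentist_upper_limit(D, tail=0.025, lo=1e-6, hi=1e6, iters=200):
    """Solve Prob(d < D | theta_max) = tail for theta_max by bisection."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint, theta spans many decades
        if prob_less_than(D, mid) > tail:
            lo = mid  # tail still too large, so theta_max lies higher
        else:
            hi = mid
    return math.sqrt(lo * hi)


D = 2.0  # hypothetical observed value
theta_max = frequentist_upper_limit(D)
print(theta_max)
print(-D / math.log(1.0 - 0.025))  # closed form for this particular model
```

For this model the equation can also be inverted by hand, which is what the second printed line does; the bisection is there because most models do not allow that.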

That is in general *not* the same as the interval you get by calculating the conditional probability

Prob (theta | D)

using Bayes' theorem, and then fixing theta_max so that

Prob (theta>theta_max | D) = 0.025

i.e.

int from theta_max to infinity of Prob (theta | D) = 0.025
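
The Bayesian calculation can be sketched numerically in the same spirit. The assumptions here are mine, not the post's: one exponential observation D with mean theta, a prior passed in as a plain function, and a brute-force sum over a log-spaced grid of theta values in place of the integral. `bayes_upper_limit`, `prior_one_over_theta` and `prior_one_over_theta_squared` are illustrative names only.

```python
import math


def bayes_upper_limit(D, prior, tail=0.025, lo=1e-6, hi=1e9, n=20000):
    """Find theta_max with Prob(theta > theta_max | D) = tail on a grid."""
    step = math.log(hi / lo) / n
    thetas = [lo * math.exp(i * step) for i in range(n + 1)]
    weights = []
    for t in thetas:
        likelihood = math.exp(-D / t) / t  # Prob(D | theta), exponential model
        # Extra factor t because the grid is uniform in log(theta).
        weights.append(prior(t) * likelihood * t)
    total = sum(weights)
    running = 0.0
    # Accumulate posterior mass from the largest theta downwards.
    for t, w in zip(reversed(thetas), reversed(weights)):
        running += w
        if running / total >= tail:
            return t
    return lo


def prior_one_over_theta(t):
    return 1.0 / t


def prior_one_over_theta_squared(t):
    return 1.0 / (t * t)


D = 2.0  # hypothetical observed value
print(bayes_upper_limit(D, prior_one_over_theta))
print(bayes_upper_limit(D, prior_one_over_theta_squared))
```

The two priors give very different values of theta_max from the same data, so the Bayesian interval is a statement about the prior as well as the data. The grid limits `lo` and `hi` are chosen so that the neglected tails are negligible for these two priors; a prior that decays more slowly would need more care.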

There *are* situations where the two approaches coincide, for example if theta is a location parameter and the Bayesian prior P(theta|I) is a flat uniform distribution; or if theta is a scale parameter and the Bayesian prior P(theta|I) is a Jeffreys distribution, P ~ 1/theta.

These are of course very important, and frequently modelled, situations.
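
The location-parameter case is easy to verify directly. This sketch assumes a single measurement D that is normally distributed about theta with a known spread `sigma`; the numbers are hypothetical. With a flat prior the posterior for theta is a normal distribution centred on D with the same spread, so both tail conditions can be evaluated with the standard library.

```python
from statistics import NormalDist

sigma = 1.5  # assumed known measurement spread
D = 4.0      # hypothetical observed value


def frequentist_tail(theta_max):
    """Prob(d < D | theta_max), with d ~ Normal(theta_max, sigma)."""
    return NormalDist(theta_max, sigma).cdf(D)


def bayesian_tail(theta_max):
    """Prob(theta > theta_max | D) under a flat prior on theta."""
    return 1.0 - NormalDist(D, sigma).cdf(theta_max)


theta_max = D + NormalDist().inv_cdf(0.975) * sigma
print(frequentist_tail(theta_max))
print(bayesian_tail(theta_max))
```

Both printed tails equal 0.025 at the same theta_max, because the normal density depends on d and theta only through their difference. That symmetry is exactly what a general likelihood lacks.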

But if the dependence Prob (d | theta) is not so simple (and it is not so simple in the estimates of TMRCA), then the two sorts of interval do *not* in general coincide.
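
A small numerical comparison shows the mismatch. It is a sketch only, and it is not the TMRCA calculation: it assumes a Poisson count d with mean theta, chosen because theta is then neither a location nor a scale parameter, and a flat prior for the Bayesian side. For a discrete count the tail probabilities have to include the observed value itself, which is the usual convention but an assumption on top of the recipe given earlier. With a flat prior the posterior tail Prob (theta > t | D) equals Prob (d <= D | t), and the code relies on that identity.

```python
import math


def poisson_cdf(k, theta):
    """Prob(d <= k | theta) for a Poisson count with mean theta."""
    if k < 0:
        return 0.0
    term = math.exp(-theta)
    total = term
    for i in range(1, k + 1):
        term *= theta / i
        total += term
    return total


def solve_decreasing(f, target, lo=1e-9, hi=200.0, iters=200):
    """Bisection for f(theta) = target, where f decreases as theta grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


D = 5  # hypothetical observed count

# Confidence interval: Prob(d >= D | theta_min) = 0.025 and
# Prob(d <= D | theta_max) = 0.025.
freq_lo = solve_decreasing(lambda t: poisson_cdf(D - 1, t), 0.975)
freq_hi = solve_decreasing(lambda t: poisson_cdf(D, t), 0.025)

# Flat-prior interval: Prob(theta < theta_min | D) = 0.025 and
# Prob(theta > theta_max | D) = 0.025.
bayes_lo = solve_decreasing(lambda t: poisson_cdf(D, t), 0.975)
bayes_hi = solve_decreasing(lambda t: poisson_cdf(D, t), 0.025)

print(freq_lo, freq_hi)
print(bayes_lo, bayes_hi)
```

With these conventions the upper ends agree but the lower ends do not. Switching the prior to 1/theta would repair the lower end and break the upper one, so no single choice makes the two intervals coincide for this model.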

**This thread:**

- [DNA] Odds Are, It's Wrong by Wilcox Lisa
- Re: [DNA] Odds Are, It's Wrong by Al Aburto
- Re: [DNA] Odds Are, It's Wrong by "Ken Nordtvedt"
- Re: [DNA] Odds Are, It's Wrong by Al Aburto
- Re: [DNA] Odds Are, It's Wrong by "Alister John Marsh"
- Re: [DNA] Odds Are, It's Wrong by "Ken Nordtvedt"
- Re: [DNA] Odds Are, It's Wrong - 5% of the time by Steven Bird
- Re: [DNA] Odds Are, It's Wrong - 5% of the time by Steven Bird
- Re: [DNA] Odds Are, It's Wrong - 5% of the time by Gareth Henson
- Re: [DNA] Odds Are, It's Wrong - 5% of the time by "Ken Nordtvedt"
- Re: [DNA] Odds Are, It's Wrong - 5% of the time by Wilcox Lisa
- Re: [DNA] Odds Are, It's Wrong - 5% of the time by "Ken Nordtvedt"
- Re: [DNA] Odds Are, It's Wrong - 5% of the time by Wilcox Lisa
- Re: [DNA] Odds Are, It's Wrong - 5% of the time by "Ken Nordtvedt"
- Re: [DNA] Odds Are, It's Wrong - 5% of the time by James Heald
- [DNA] Odds Are, It's Wrong - 5% of the time by "Lancaster-Boon"
- Re: [DNA] Odds Are, It's Wrong by Al Aburto