GENEALOGY-DNA-L Archives
From: James Heald
Subject: Re: [DNA] P value (was chances are, it's wrong)
Date: Tue, 30 Nov 2010 15:10:31 +0000
On 30/11/2010 05:13, John Chandler wrote:
> the
> statement "P (theta within interval | data) = 95%" is *precisely* the
> definition of the 95% CI.
Popular myth, and true in some special situations, but in general *false*.
The definition of a 95% CI is a statement about the procedure that
generates the interval. Suppose you repeat the experiment a large number
of times with the same value of the unknown parameter. A procedure for
generating an interval estimate of that parameter generates a
*confidence interval* if, in 95% of those repeated trials, the interval
it produces contains the true (unknown) parameter value.
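As a rough illustration of that repeated-trials definition, here is a
minimal Python sketch. It assumes a model that is not from this post:
n draws from a normal distribution with unknown mean theta and known
sigma, using the textbook interval mean +/- 1.96 sigma/sqrt(n). The
function names are made up for the example.

```python
import random
import statistics

def interval_for_mean(sample, sigma, z=1.959964):
    """Classical 95% interval for a normal mean when sigma is known."""
    centre = statistics.fmean(sample)
    half_width = z * sigma / len(sample) ** 0.5
    return centre - half_width, centre + half_width

def coverage(theta, sigma=1.0, n=10, trials=20000, seed=1):
    """Fraction of repeated experiments whose interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One "experiment": fresh data drawn with the SAME fixed theta.
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        lo, hi = interval_for_mean(sample, sigma)
        if lo <= theta <= hi:
            hits += 1
    return hits / trials

print(coverage(theta=3.0))  # should be close to 0.95
```

Note that theta is held fixed throughout and only the data (and hence
the interval) vary from trial to trial. The 95% is a property of the
procedure, not a probability statement about theta given one data set.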
So, in the simplest case of a single parameter, with data which can be
summarised in a single sufficient statistic D, you set the right-hand
end theta_max of your CI so that the cumulative probability of
obtaining some data d less than D is 2.5%;
i.e.
int from 0 to D of Prob (d | theta_max) = 0.025
or equivalently,
Prob (d<D | theta_max) = 0.025
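To make the condition Prob (d<D | theta_max) = 0.025 concrete, the
sketch below solves it numerically. The sampling model is an assumption
chosen only for illustration (it is not from this thread): d is
exponentially distributed with mean theta, so that
Prob (d<D | theta) = 1 - exp(-D/theta). The function names are
hypothetical.

```python
import math

def prob_less_than(D, theta):
    """Prob(d < D | theta) for the assumed model: d exponential, mean theta."""
    return 1.0 - math.exp(-D / theta)

def upper_limit(D, tail=0.025, tol=1e-10):
    """Solve Prob(d < D | theta_max) = tail for theta_max by bisection."""
    # Bracket the root: the probability falls steadily as theta grows.
    lo, hi = D * 1e-3, D * 1e6
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if prob_less_than(D, mid) > tail:
            lo = mid   # theta too small: data below D are still too probable
        else:
            hi = mid
    return 0.5 * (lo + hi)

D = 2.0
print(upper_limit(D))
print(-D / math.log(1 - 0.025))  # closed form for this model, for comparison
```

Nothing here involves a prior or a distribution over theta. The
calculation only ever asks how probable the data would be for a given,
fixed theta.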
That is in general *not* the same as the interval you get calculating
the conditional probability
Prob (theta | D)
using Bayes' theorem, and then fixing theta_max so that
Prob (theta>theta_max | D) = 0.025
i.e.
int from theta_max to infinity of Prob (theta | D) = 0.025
There *are* situations where the two approaches coincide, for example if
theta is a location parameter, and the Bayesian prior P(theta|I) is a
flat uniform distribution; or if theta is a scale parameter, and the
Bayesian prior P(theta|I) is a Jeffreys prior, P ~ 1/theta.
These are of course very important, and commonly modelled, situations.
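The scale-parameter case can be checked numerically. The sketch below
again assumes an illustrative model that is not from this thread: a
single observation D from an exponential distribution with mean theta,
so the likelihood is (1/theta) exp(-D/theta). It computes the Bayesian
upper limit for a prior proportional to theta**(-k) and compares it with
the classical limit from Prob (d<D | theta_max) = 0.025. Here k = 1 is
the Jeffreys prior; k = 2 is simply another prior picked to show what
happens when the prior changes. All function names are hypothetical.

```python
import math

def classical_upper_limit(D, tail=0.025):
    """theta_max with Prob(d < D | theta_max) = tail, exponential model."""
    return -D / math.log(1.0 - tail)

def posterior_upper_tail(theta_max, D, k, steps=4000):
    """Prob(theta > theta_max | D) for prior proportional to theta**(-k), k > 0.

    Substituting u = D/theta turns the posterior into
    u**(k-1) * exp(-u) / Gamma(k), and the region theta > theta_max
    into 0 < u < D/theta_max, which is integrated by the midpoint rule.
    """
    x = D / theta_max
    h = x / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += u ** (k - 1) * math.exp(-u)
    return total * h / math.gamma(k)

def bayes_upper_limit(D, k, tail=0.025):
    """Solve Prob(theta > theta_max | D) = tail by bisection on log(theta_max)."""
    lo, hi = math.log(D * 1e-3), math.log(D * 1e6)
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if posterior_upper_tail(math.exp(mid), D, k) > tail:
            lo = mid   # too much posterior mass above the limit: raise it
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

D = 2.0
print("classical          :", classical_upper_limit(D))
print("Bayes, prior 1/t   :", bayes_upper_limit(D, 1))  # Jeffreys prior
print("Bayes, prior 1/t^2 :", bayes_upper_limit(D, 2))  # a different prior
```

Under these assumptions the Jeffreys-prior limit agrees with the
classical one, while the second prior gives a very different upper
limit from the same data.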
But if the dependence Prob (d | theta) is not so simple (and it is not
so simple in the estimates of TMRCA), then the two sorts of interval do
*not* in general coincide.