GENEALOGY-DNA-L Archives


From: James Heald <>
Subject: Re: [DNA] P value (was chances are, it's wrong)
Date: Tue, 23 Nov 2010 22:58:17 +0000
References: <F9C440A2-FC59-4A9E-AAAC-85DEE9D2FAB0@GMAIL.COM>, ,<COL115-W50D879F102DC3996D9D454A03A0@phx.gbl>, ,<4CE7A3C0.7050702@ucl.ac.uk>, ,<COL115-W1464B78AF0292D6AEFA183A03B0@phx.gbl>,<COL115-W5950BB2C58A31B4806036EA03B0@phx.gbl>,<4CE8088B.3020500@ucl.ac.uk>,<COL115-W45724B549DCDA5DD2EC4A0A03B0@phx.gbl> <COL115-W424C7732D1583F8960685CA03B0@phx.gbl> <4CE851AF.2030304@ucl.ac.uk><REME20101122153113@alum.mit.edu>
In-Reply-To: <REME20101122153113@alum.mit.edu>


On 22/11/2010 20:31, John Chandler wrote:
> James wrote:
>> Suppose the upper confidence limit is 50 generations. That means that
>> if the TMRCA actually was 50 generations, it would produce n or fewer
>> mutations 5% of the time.
>
> You are mistaking confidence in the outcome for confidence in the
> estimate. The 95% confidence interval for an estimated parameter
> is the range which includes 95% of the cumulative probability
> distribution for that parameter. This somewhat arbitrary interval
> is popular because it happens to coincide very nearly with the
> "plus or minus two standard deviations" range of a Gaussian. In
> terms of percentiles, the interval runs from 2.5 to 97.5, meaning
> that the likelihood of exceeding the upper limit is only 2.5%, not
> 5%, and the likelihood of falling below the lower limit is the other
> 2.5%.

It's important to keep clear the distinction between (Frequentist)
confidence intervals on the one hand, and (Bayesian) credible intervals
on the other.

There is indeed a cumulative probability involved in the Frequentist
estimate, but it is as I have described.

More specifically, if a 90% Frequentist CI runs from a to b, then b is
fixed by the condition that
P (x < x_obs | b) = 0.05
or, equivalently
Integral from -infinity to x_obs of P(x|b) dx = 0.05

The value of the lower limit a is fixed by the corresponding condition that
P (x < x_obs | a) = 0.95

In both cases, for a Frequentist CI the values of the ends a and b of
the confidence interval are fixed by cumulating the forward probability,
of possible data given the parameter; and not by cumulating the inverse
probability, of possible parameter values given the /actual/ data.
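
As a rough numerical sketch of the two conditions above, assume (purely
for illustration) that the observed quantity x is a mutation count with a
Poisson distribution, and that the parameter is the expected number of
mutations. The function names are hypothetical, and for the discrete
upper limit the sketch uses "x_obs or fewer", as in the quoted wording.

```python
import math

def poisson_cdf(k, mu):
    """Forward probability P(X <= k | mu) for a Poisson count with mean mu."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)          # P(X = 0 | mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i            # P(X = i | mu) from P(X = i-1 | mu)
        total += term
    return total

def solve_decreasing(f, target, lo=0.0, hi=200.0, iters=200):
    """Bisection for f(mu) = target, where f falls as mu grows.
    The bracket [0, 200] is an assumption suited to small counts."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid              # mu still too small
        else:
            hi = mid
    return 0.5 * (lo + hi)

def frequentist_ci_90(x_obs):
    """90% confidence limits (a, b) on the Poisson mean, found by
    cumulating the forward probability of data given the parameter."""
    # Upper limit b: P(X <= x_obs | b) = 0.05
    b = solve_decreasing(lambda mu: poisson_cdf(x_obs, mu), 0.05)
    # Lower limit a: P(X < x_obs | a) = 0.95 (no solution when x_obs = 0)
    if x_obs == 0:
        a = 0.0
    else:
        a = solve_decreasing(lambda mu: poisson_cdf(x_obs - 1, mu), 0.95)
    return a, b

if __name__ == "__main__":
    print(frequentist_ci_90(3))
```

Note that nothing here integrates over the parameter: each limit is found
by asking how probable the data would be if the parameter had that value.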

(Indeed, a sufficiently fundamentalist Frequentist would deny the latter
concept is even meaningful, as they would object to the parameter -- a
thing considered to have a fixed, albeit unknown, actual value -- being
treated as a random variable).

A Bayesian on the other hand would have no problem with that, and they
would proceed much as you have indicated, but they would then be
calculating a Bayesian credible interval, not a Frequentist confidence
interval.

In a number of simple cases the two intervals coincide -- for instance a
Bayesian credible interval for a location parameter with a
translationally invariant likelihood, using a flat uniform prior; or a
Bayesian credible interval for a scale parameter, using a Jeffreys
1/theta prior, for a likelihood that scales directly with the scale
parameter, otherwise preserving its shape.
(See e.g. http://bayes.wustl.edu/etj/articles/confidence.pdf, p. 205 onwards).

The above covers many of the simplest cases -- for instance your example
of estimation based on Gaussian likelihood where only the mean is unknown.
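
A minimal sketch of that coincidence, assuming a single observation x_obs
from a Gaussian with known sigma and unknown mean. The function names and
the numbers are hypothetical. The Frequentist limits come from the
forward CDF of the data; the Bayesian limits come from the flat-prior
posterior for the mean, which is Gaussian, centred on x_obs, with the
same sigma.

```python
from statistics import NormalDist

def frequentist_limit(x_obs, sigma, tail):
    """Find mu such that P(x < x_obs | mu) = tail, by bisection
    on the forward CDF of the data given the parameter."""
    lo, hi = x_obs - 10 * sigma, x_obs + 10 * sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        # P(x < x_obs | mu) falls as mu grows
        if NormalDist(mid, sigma).cdf(x_obs) > tail:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def frequentist_ci_90(x_obs, sigma):
    a = frequentist_limit(x_obs, sigma, 0.95)   # P(x < x_obs | a) = 0.95
    b = frequentist_limit(x_obs, sigma, 0.05)   # P(x < x_obs | b) = 0.05
    return a, b

def bayesian_ci_90(x_obs, sigma):
    """Flat prior on the mean: the posterior is Normal(x_obs, sigma),
    so the limits are its 5th and 95th percentiles."""
    posterior = NormalDist(x_obs, sigma)
    return posterior.inv_cdf(0.05), posterior.inv_cdf(0.95)

if __name__ == "__main__":
    print(frequentist_ci_90(12.0, 3.0))
    print(bayesian_ci_90(12.0, 3.0))
```

The two printed intervals agree to numerical precision, even though one
cumulates over possible data and the other over possible parameter values.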

But in more complicated cases the Frequentist confidence interval and
the Bayesian credible interval are not in general equivalent, so care is
needed to keep straight which of the two one is calculating, because
they can give different answers.
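
One small illustration of such a difference, under assumptions that are
mine and not from the thread: take a Poisson-distributed mutation count n
and, for the Bayesian side, a flat prior on the Poisson mean. The
posterior is then Gamma(n+1, 1), whose CDF at mu equals 1 - P(X <= n | mu),
so both sets of limits can be found from the same Poisson CDF. All
function names are hypothetical.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k | mu) for a Poisson count with mean mu."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total

def solve(k, target, lo=0.0, hi=200.0):
    """Find mu with P(X <= k | mu) = target (the CDF falls as mu grows).
    The bracket [0, 200] is an assumption suited to small counts."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def frequentist_90(n):
    """Limits from the forward probability of the data."""
    lower = solve(n - 1, 0.95) if n > 0 else 0.0   # P(X < n | a) = 0.95
    upper = solve(n, 0.05)                         # P(X <= n | b) = 0.05
    return lower, upper

def flat_prior_credible_90(n):
    """5th and 95th percentiles of the Gamma(n+1, 1) posterior,
    using posterior CDF(mu) = 1 - P(X <= n | mu)."""
    lower = solve(n, 0.95)   # posterior CDF = 0.05
    upper = solve(n, 0.05)   # posterior CDF = 0.95
    return lower, upper

if __name__ == "__main__":
    print(frequentist_90(3))
    print(flat_prior_credible_90(3))
```

For n = 3 the upper limits agree (about 7.75), but the lower limits do
not: about 0.82 for the confidence interval against about 1.37 for the
flat-prior credible interval. A different prior would shift the Bayesian
limits again.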

