From: James Heald <>
Subject: Re: [DNA] P value (was chances are, it's wrong)
Date: Sat, 20 Nov 2010 22:54:39 +0000
In-Reply-To: <COL115-W424C7732D1583F8960685CA03B0@phx.gbl>

On 20/11/2010 18:37, Steven Bird wrote:
> James wrote:
>> P-value has a very precise meaning in Frequentist statistics.
>> It is "the probability of obtaining a test statistic at least as extreme
>> as the one that was actually observed, assuming that the null hypothesis
>> is true".
> I reply:
> It is also defined as the probability of committing a Type I error (rejecting the null when it is in fact true, or a false positive) when using a statistic such as Student's t-test. When p=0.05, it means that the statistician has a 1 in 20 chance of being wrong (falsely rejecting the null) when the null is in fact true. To me, that is identical in meaning with the statement that he or she also has a 19 out of 20 chance of being right. How is it different?


One of the important things when dealing with probabilities is always to
be aware what the probabilities in question are conditioned on.

The P-value gives the probability, *given* that the null hypothesis is
true, and without taking into account the specific data that has come
in, that the null hypothesis will be falsely rejected.

So for instance in the dog barking example, *if* the dog is not hungry
*then* 95% of the time it will not bark, so 95% of the time we will not
conclude the dog is hungry if it isn't.

It is worth emphasising that this is all predicated on what we can say
*before* we know whether the dog has barked or not.

It does *not* give any guarantees as to what proportion of times we will
make a Type I error out of those cases where the dog has barked.
There is no reason, when we look at the proportion of Type I errors in
those particular cases, for it to be limited to 5%. In fact, in the
scenario I gave earlier, we can imagine getting 100% Type I errors,
whenever the dog barks.

This is the shortcoming of the P-value approach: no attempt is being
made to calculate the probability of the dog actually being hungry,
given the data. So there is no reason to expect the test, in those
particular cases where the dog has barked, to be right 95% or any other
particular percentage of the time.
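
To put rough numbers on the barking example, here is a minimal Python
sketch. The 5% bark rate for a dog that is not hungry comes from the
example above; the prior chance that the dog is hungry and the 90% bark
rate when it is hungry are purely illustrative assumptions, and
false_alarm_fraction is just a name chosen for this sketch.

```python
def false_alarm_fraction(p_hungry, p_bark_if_hungry, p_bark_if_not=0.05):
    """P(dog is NOT hungry | dog barked), by Bayes' theorem.

    p_hungry and p_bark_if_hungry are assumed inputs; only the 5% bark
    rate for a dog that is not hungry comes from the example.
    """
    bark_and_not_hungry = (1.0 - p_hungry) * p_bark_if_not
    bark_and_hungry = p_hungry * p_bark_if_hungry
    return bark_and_not_hungry / (bark_and_not_hungry + bark_and_hungry)


# Same test, same 5% Type I rate, different (assumed) prior chances of hunger.
for p_hungry in (0.5, 0.1, 0.01, 0.0):
    share = false_alarm_fraction(p_hungry, p_bark_if_hungry=0.9)
    print("P(hungry) = %.2f -> share of barks that are false alarms = %.3f"
          % (p_hungry, share))
```

As the assumed prior chance of hunger shrinks, the share of barks that
are false alarms climbs well past 5%, and it reaches 100% for a dog that
is never hungry, even though the chance of a bark *given* no hunger
stays at 5% throughout.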

* * *

Turning to TMRCAs, the Bayesian distributions are typically very
long-tailed, for which the P-value/confidence approach tends to produce
values which under-report the full Bayesian range.

Suppose the upper confidence limit is 50 generations, where n mutations
have actually been observed. That means that if the TMRCA actually was
50 generations, it would produce n or fewer mutations 5% of the time.

But it tells us nothing about how often a TMRCA of 60 generations, or
70 generations, would produce n or fewer mutations (other than that it
is below 5% of the time) -- it tells us nothing about how quickly this
percentage falls off as the number of generations increases.

For a particular large number N generations, it might be quite rare that
we see only n mutations. But on the other hand, there are an awful lot
more numbers greater than 50 than there are less than 50. This tends to
mean that, when you calculate the weight of probability, using for
example Bruce Walsh's TMRCA calculator, *given* that n mutations have
been observed, rather more than 5% of the probability weight will be
located beyond 50 generations, even though it is 50 generations that is
the frequentist 0.95 confidence limit.
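
The comparison can be tried numerically. What follows is only a sketch
under assumptions that are not in the discussion above: a symmetric
single-step mutation model, a made-up mutation rate of 0.002 per marker
per generation, 37 markers of which 2 mismatch, the number of
mismatching markers used as the test statistic, and a flat prior on the
TMRCA cut off at 1000 generations. It is not Bruce Walsh's calculator,
and all the function names are invented for the illustration.

```python
import math

MU = 0.002       # assumed mutation rate per marker per generation
MARKERS = 37     # assumed number of markers compared
OBSERVED = 2     # assumed number of mismatching markers (the "n" above)


def p_match(t):
    """P(a marker still matches when the TMRCA is t generations back).

    Symmetric single-step model: exp(-x) * I0(x) with x = 2 * MU * t,
    where the Bessel function I0 is summed from its power series.
    """
    x = 2.0 * MU * t
    term = total = 1.0
    k = 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return math.exp(-x) * total


def likelihood(n, t):
    """P(exactly n of MARKERS mismatch | TMRCA = t)."""
    q = p_match(t)
    return math.comb(MARKERS, n) * (1.0 - q) ** n * q ** (MARKERS - n)


def prob_at_most(n, t):
    """P(n or fewer mismatches | TMRCA = t)."""
    return sum(likelihood(j, t) for j in range(n + 1))


def upper_limit(n, alpha=0.05, t_hi=5000.0):
    """Bisect for the t at which P(n or fewer mismatches | t) = alpha."""
    t_lo = 0.0
    for _ in range(200):
        mid = (t_lo + t_hi) / 2.0
        if prob_at_most(n, mid) > alpha:
            t_lo = mid
        else:
            t_hi = mid
    return (t_lo + t_hi) / 2.0


def integrate(f, a, b, steps=20000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def posterior_beyond(n, t_cut, t_max=1000.0):
    """Share of flat-prior posterior weight lying at TMRCA > t_cut."""
    below = integrate(lambda t: likelihood(n, t), 0.0, t_cut)
    above = integrate(lambda t: likelihood(n, t), t_cut, t_max)
    return above / (below + above)


limit = upper_limit(OBSERVED)
print("upper 0.95 confidence limit: %.1f generations" % limit)
for extra in (10, 20):
    t = limit + extra
    print("P(n or fewer | %.0f gens) = %.4f" % (t, prob_at_most(OBSERVED, t)))
print("posterior weight beyond limit: %.4f" % posterior_beyond(OBSERVED, limit))
```

The script prints the confidence limit, how fast the "n or fewer"
probability falls off 10 and 20 generations later, and the share of
posterior weight beyond the limit, so the two can be compared directly.
How far that share sits above 5% depends on the assumed mutation model
and prior, so the figures are only as good as those assumptions.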
