**GENEALOGY-DNA-L Archives**

From: James Heald

Subject: Re: [DNA] P value (was chances are, it's wrong)

Date: Sat, 20 Nov 2010 22:54:39 +0000

References: <F9C440A2-FC59-4A9E-AAAC-85DEE9D2FAB0@GMAIL.COM>, <COL115-W50D879F102DC3996D9D454A03A0@phx.gbl>, <4CE7A3C0.7050702@ucl.ac.uk>, <COL115-W1464B78AF0292D6AEFA183A03B0@phx.gbl>, <COL115-W5950BB2C58A31B4806036EA03B0@phx.gbl>, <4CE8088B.3020500@ucl.ac.uk>, <COL115-W45724B549DCDA5DD2EC4A0A03B0@phx.gbl>, <COL115-W424C7732D1583F8960685CA03B0@phx.gbl>

In-Reply-To: <COL115-W424C7732D1583F8960685CA03B0@phx.gbl>

On 20/11/2010 18:37, Steven Bird wrote:

>

> James wrote:

>

>> P-value has a very precise meaning in Frequentist statistics.

>> It is "the probability of obtaining a test statistic at least as extreme

>> as the one that was actually observed, assuming that the null hypothesis

>> is true".

>> http://en.wikipedia.org/wiki/P-value

>

> I reply:

>

> It is also defined as the probability of committing a Type I error (rejecting the null when it is in fact true, i.e. a false positive) when using a statistic such as Student's t-test. When p=0.05, it means that the statistician has a 1 in 20 chance of being wrong (falsely rejecting the null) when the null is in fact true. To me, that is identical in meaning with the statement that he or she also has a 19 out of 20 chance of being right. How is it different?

Okay.

One of the important things when dealing with probabilities is always to

be aware what the probabilities in question are conditioned on.

The P-value gives the probability, *given* that the null hypothesis is

true, and without taking into account the specific data that has come

in, that the null hypothesis will be falsely rejected.

So for instance in the dog barking example, *if* the dog is not hungry

*then* 95% of the time it will not bark, so 95% of the time we will not

conclude the dog is hungry if it isn't.

It is worth emphasising that this is all predicated on what we can say

*before* we know whether the dog has barked or not.

It does *not* give any guarantees as to what proportion of times we will

make a Type I error out of those cases where the dog has barked.

There is no reason, when we look at the proportion of Type I errors in

those particular cases, for it to be limited to 5%. In fact, in the

scenario I gave earlier, we can imagine getting 100% Type I errors,

whenever the dog barks.

This is the shortcoming of the P-value approach, that no attempt is

being made to try to calculate the probability of the dog actually being

hungry, given the data; so there is no reason to expect the test, in

cases of those particular circumstances, to be right 95% or any other

particular percentage of the time.
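
To put rough numbers on the distinction, here is a minimal sketch using Bayes' theorem. Only the 5% chance of barking when not hungry comes from the example above; the prior chance that the dog is hungry (`p_hungry`) and the chance that a hungry dog barks (`p_bark_if_hungry`) are made-up values for illustration. It assumes that the extreme case is a dog that is never hungry.

```python
def p_not_hungry_given_bark(p_hungry, p_bark_if_hungry, p_bark_if_not_hungry=0.05):
    """Share of barks that are Type I errors, P(not hungry | bark), via Bayes' theorem."""
    p_bark = p_hungry * p_bark_if_hungry + (1 - p_hungry) * p_bark_if_not_hungry
    return (1 - p_hungry) * p_bark_if_not_hungry / p_bark


# Hypothetical priors: the dog is hungry half the time, rarely, or never.
for p_hungry in (0.5, 0.1, 0.01, 0.0):
    share = p_not_hungry_given_bark(p_hungry, p_bark_if_hungry=1.0)
    print(f"P(hungry) = {p_hungry:<4}  P(not hungry | bark) = {share:.3f}")
```

The 5% figure is the same in every row, yet the share of barks that are false alarms depends entirely on how often the dog is hungry in the first place, which the P-value approach never looks at.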

* * *

Turning to TMRCAs, the Bayesian distributions are typically very

long-tailed, for which the P-value/confidence approach tends to produce

values which under-report the full Bayesian range.

Suppose the upper confidence limit is 50 generations. That means that

if the TMRCA actually was 50 generations, it would produce n or fewer

mutations 5% of the time.

But it tells us nothing about how often, if the TMRCA was actually 60

generations, or 70 generations, that would produce n or fewer

mutations (other than below 5% of the time) -- it tells us nothing about

how quickly this percentage falls off as the number of generations

increases.
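
As a rough illustration of that fall-off, the sketch below assumes the number of mutations is Poisson-distributed with a mean proportional to the number of generations. That model, the choice of n = 2, and the per-generation rate (calibrated here so that 50 generations comes out as the upper limit) are all assumptions for the sake of the example, not the model behind any particular TMRCA calculator.

```python
import math


def prob_at_most(n, mean):
    """Poisson CDF: P(X <= n) when X has the given mean."""
    return sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(n + 1))


def mean_for_tail(n, alpha=0.05, lo=0.0, hi=100.0):
    """Bisect for the mean at which P(X <= n) equals alpha.

    The CDF decreases as the mean grows, so a simple bisection is enough.
    """
    for _ in range(200):
        mid = (lo + hi) / 2
        if prob_at_most(n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 2                          # hypothetical number of observed mutations
rate = mean_for_tail(n) / 50   # hypothetical rate: makes 50 generations the limit

for generations in (50, 60, 70, 100):
    p = prob_at_most(n, rate * generations)
    print(f"{generations:>3} generations: P({n} or fewer mutations) = {p:.4f}")
```

The confidence limit itself only reports the first row of this output; the later rows depend on the mutation model and have to be worked out separately.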

For a particular large number of generations N, it might be quite rare that

we see only n mutations. But on the other hand, there are an awful lot

more numbers greater than 50 than there are less than 50. This tends to

mean that, when you calculate the weight of probability, using for

example Bruce Walsh's TMRCA calculator,

http://nitro.biosci.arizona.edu/ftdna/TMRCA.html

*given* that n mutations have been observed, rather more than 5% of the

probability weight will be located beyond 50 generations, even though it

is 50 generations that is the frequentist 0.95 confidence limit.
