TMG-L Archives
From: "Kevin Sholder" <>
Subject: RE: [TMG] Statistical Report - STD DEV in a nutshell
Date: Thu, 4 Aug 2005 22:10:52 -0400
In-Reply-To: <000d01c597c4$3c03e6c0$fe54fea9@oemcomputer>
John,
Yes, this was a very interesting read. Thanks for the mumbo-jumbo; it did make
sense and helped explain what I can use this for.
Thank you for your time,
Kevin L. Sholder
-----Original Message-----
From: John Davis [mailto:]
Sent: Tuesday, August 02, 2005 8:42 PM
To:
Subject: Re: [TMG] Statistical Report - STD DEV in a nutshell
For anyone who may be interested,
The statistic that will be least understood in the statistical report is
standard deviation. How meaningful and useful it is depends on the context
in which it appears. Standard deviation in relation to the number of tags of
a certain type is probably not as transparently meaningful as it is in the
context of the "age" statistics: age at first marriage, age at death, etc.
Standard deviation measures how widely the data are spread around the mean
(mean = the grand average). If you just remember (or write down) three
numbers, 68.26%, 95.44% and 99.74%, you can put standard deviation to work
to glean some meaningful information from the statistics. (For all practical
purposes, just round them to 68, 95 and 99.7.)
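Where those three numbers come from: they are the share of a normal ("bell")
curve lying within one, two and three standard deviations of the mean. A
minimal Python sketch, assuming normally distributed data, computes them from
the standard library's error function (for a standard normal variable Z,
P(|Z| < k) = erf(k/sqrt(2))). To two decimals the results are 68.27%, 95.45%
and 99.73%, within a few hundredths of a percent of the figures above.

```python
import math

def share_within(k: float) -> float:
    """Fraction of a normal distribution within +/- k standard deviations of the mean."""
    # For a standard normal variable Z, P(|Z| < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within +/-{k} SD: {share_within(k) * 100:.2f}%")
# within +/-1 SD: 68.27%
# within +/-2 SD: 95.45%
# within +/-3 SD: 99.73%
```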
For data that follow the normal, bell-shaped curve, about 68% of the data
being examined will be found within plus or minus one standard deviation of
the mean, 95% within plus or minus two standard deviations, and a full 99.7%
within plus or minus three standard deviations. How closely real data match
these figures improves as the population grows. In other words, the larger
the database, the more closely the above numbers will be found to hold.
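The point about larger populations can be checked with a small simulation.
This is a sketch under stated assumptions: the "ages" are hypothetical values
drawn from a normal curve (mean 65, SD 24, chosen only for illustration), and
it uses the population standard deviation. Because the simulated data really
are normal, the share within one SD settles near 68% as n grows; for skewed or
truncated real data, a bigger sample alone will not force it there.

```python
import random
import statistics

def share_within_one_sd(values):
    """Fraction of values lying within +/- 1 standard deviation of their mean."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values, mean)  # population standard deviation
    return sum(1 for v in values if abs(v - mean) <= sd) / len(values)

rng = random.Random(42)  # fixed seed so every run prints the same numbers
for n in (20, 200, 2000, 20000):
    # Hypothetical "ages at death": normal with mean 65 and SD 24 (illustrative only).
    sample = [rng.gauss(65, 24) for _ in range(n)]
    print(f"n = {n:>5}: {share_within_one_sd(sample) * 100:.1f}% within 1 SD")
```

The small samples wander noticeably around 68%; the large ones stay close to it.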
If the data are skewed (heaped up at the low end with a few values trailing
toward the high end = positively skewed), truncated (cut off in one
direction: no one can die before birth, so no data exist for deaths before
age zero), too evenly spread out or too bunched in the "middle" (kurtosis),
bunched up around two or more "centers" (bimodal or multimodal), or
otherwise just not "standard," the numbers will be less meaningful, but will
still hold some meaning.
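Some of these shape problems can be put into numbers. A minimal sketch,
assuming the simple moment-based definitions: skewness is positive when the
data heap up at the low end and trail off toward the high end, and excess
kurtosis is near zero for a normal curve, negative for data spread out too
evenly, and positive for data bunched in the middle with long tails. The
function name is hypothetical, and statistics packages often apply
small-sample corrections, so their figures may differ slightly.

```python
def shape_stats(values):
    """Return (skewness, excess kurtosis) using population moment formulas."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    if m2 == 0:
        raise ValueError("all values are identical; shape is undefined")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5            # > 0: tail toward the high end
    excess_kurtosis = m4 / m2 ** 2 - 3   # about 0 for a normal curve
    return skewness, excess_kurtosis

print(shape_stats([1, 2, 3, 4, 5]))   # symmetric and flat: skewness 0, kurtosis negative
print(shape_stats([1, 1, 1, 2, 10]))  # heaped low with one high straggler: skewness positive
```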
Example:
Age at death
POP = 126 (very small population for calculating meaningful STD DEV)
AVG (mean) = 65
STD DEV = 24.2
68.26% of this population died within 24.2 years, either way, of age 65
95.44% of this population died within 48.4 years, either way, of age 65
99.74% of this population died within 72.6 years, either way, of age 65
As you can see, the most meaningful figure from this small population is the
68% or so who died within 24.2 years of age 65 (between age 40.8 and age
89.2). This is something you can get your teeth into.
95.44% died between age 16.6 and 113.4 years of age. Since the oldest age at
death in my database (MAX) is 99 (rounded), we know that we're really
talking about "between 16.6 and MAX."
99.74% died between age minus 7.6 and age 137.6. Since MIN is zero and MAX
is 99, we know that this range accounts for ALL of the remaining deaths,
i.e., the 4.56% (100% minus 95.44%) lying outside the two-standard-deviation
range.
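The arithmetic in this worked example (mean plus or minus one, two and three
standard deviations, then trimmed to the actual MIN and MAX) can be sketched
as follows. The function name is hypothetical; the inputs are the figures
quoted above (AVG 65, STD DEV 24.2, MIN 0, MAX 99).

```python
def sd_ranges(mean, sd, lowest, highest, max_k=3):
    """For k = 1..max_k, return (k, low, high, low_in_data, high_in_data).

    low/high are mean -/+ k*sd; the *_in_data values are clipped to the
    observed MIN and MAX, since no record can fall outside them.
    """
    rows = []
    for k in range(1, max_k + 1):
        low, high = mean - k * sd, mean + k * sd
        rows.append((k, round(low, 1), round(high, 1),
                     round(max(low, lowest), 1), round(min(high, highest), 1)))
    return rows

for k, low, high, in_low, in_high in sd_ranges(65, 24.2, 0, 99):
    print(f"+/-{k} SD: {low} to {high} (within the data: {in_low} to {in_high})")
```

The clipping step is what turns "16.6 to 113.4" into "16.6 to MAX" and the
impossible "minus 7.6" into MIN.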
As I said, the numbers become more meaningful as the size of the population
under consideration increases. When it becomes large enough, the
"nonsensical" results (such as a negative age) begin to diminish and
disappear, and the 99.74% range *should* then fall within the actual data.
Anything outside that range (beyond three standard deviations) would be
considered an "outlier" and could be treated, with reasonable statistical
confidence, as an exception to the rule or a mistake.
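Flagging everything beyond three standard deviations is easy to automate. A
minimal sketch, assuming the population standard deviation (the exact formula
TMG uses is not stated here) and using made-up record IDs and ages at first
marriage, with two impossible entries similar to the -1707 and 1733 in the
original question below. In a very small set, one wild value inflates the
standard deviation so much that it may not land beyond three SDs, which is
another reason the rule works better with larger populations.

```python
import statistics

def three_sd_outliers(values_by_id, k=3.0):
    """Return {id: value} for values more than k standard deviations from the mean."""
    values = list(values_by_id.values())
    mean = statistics.mean(values)
    # Population SD (assumption: TMG's own formula may differ slightly).
    sd = statistics.pstdev(values, mean)
    return {rid: v for rid, v in values_by_id.items() if abs(v - mean) > k * sd}

# Hypothetical record IDs 1-40 with plausible ages 20-29, plus two typo-style entries.
ages = {rid: 20 + rid % 10 for rid in range(1, 41)}
ages[41] = -1707
ages[42] = 1733
print(three_sd_outliers(ages))  # {41: -1707, 42: 1733}
```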
A lot of this mumbo-jumbo will only be of interest to the geekiest among us,
or the most curious, but I thought I'd post it just in case it might be
helpful to at least a few. I assume that the TMG folks include the STD DEV
statistic for the same reason.
(And, for all the purists, I DO tend to mix singular and plural when using
the word "data," whichever suits my whim :)
John Davis
A retired statistical process control/total quality management guy, dabbling
in genealogy, in Elgin, Oregon
----- Original Message -----
From: "Sholder, Kevin L" <>
To: <>
Sent: Monday, August 01, 2005 10:54 AM
Subject: [TMG] Statistical Report
> All,
>
> I've not used this before and created one today and have some questions
> about what it means. The following columns are pretty obvious as to what
> they mean:
>
> POP - population
> AVG - average
>
> These are not so obvious as to what they mean or how they are used:
>
> STD DEV - Standard Deviation??
> MIN - Minimum (minimum what?)
> MAX - Maximum (maximum what?)
> ID MIN - ????
> ID MAX - ????
>
> Here is an example line from this report:
>
> Age at first marriage
> POP - 14,222
> AVG - 24.3
> STD DEV - 32.4
> MIN - (-1707)
> MAX - 1733
> ID MIN - 22080
> ID MAX - 46047
>
> Can someone help explain what all these mean please?
>
> Thank you for your time,
> Kevin L. Sholder
>
>
>
> ==== TMG Mailing List ====
> To un-subscribe from TMG-L (in MAIL mode), send a message to
> <> [to <> in Digest mode] with just the word "unsubscribe" (no quotes) in
> the text and turn off your signature.
>
>