GENEALOGY-DNA-L Archives
From: argiedude <>
Subject: [DNA] Illumina (HapMap) versus Affymetrix (Xing, 2010); same samples tested twice
Date: Fri, 5 Nov 2010 19:49:21 -0300
I downloaded the publicly available genotype data from HapMap3 and from the Xing (2010) study earlier this year and compared the two. It appears that Xing retested some of the HapMap3 samples. Those samples had originally been genotyped on the Illumina chip, while Xing had genotyped all his other samples on the Affymetrix chip, which has very few SNPs in common with the Illumina chip, so he presumably decided to simply retest the HapMap3 samples on the Affymetrix chip. Because the same individuals were genotyped twice on two different platforms, I think this may shed some light on the accuracy of these full genome scans (a big "may", of course).
The two chips have roughly 30,000 to 40,000 SNPs in common, and I used 13,597 of them for this comparison. Strangely, the no-call rate depended on the population the sample came from. Here a SNP counts as a no-call for a given sample if it was missing from the Illumina scan, the Affymetrix scan, or both; it does not have to be missing in both. I don't know whether the no-calls were spread evenly across the two scans or concentrated in one of them. Kenya and Tuscany had a no-call rate of 1.5%, while CEU, Han, Japan and Yoruba all had similarly low rates of around 0.3%. Why would the no-call rate depend on the population of origin? Could the Kenya and Tuscany samples have been stored separately from the other four populations and suffered a different degree of degradation?
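The combined no-call rule above can be written as a short Python sketch. It assumes each scan has already been reduced to the shared SNPs in the same order, with one genotype string such as "AG" per SNP and None for a no-call; that encoding and the function names are hypothetical, not taken from the HapMap3 or Xing files.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical encoding: one genotype string per shared SNP ("AG", "CC", ...),
# or None where the chip made no call. Both lists use the same SNP order.
Genotype = Optional[str]


def count_no_calls(illumina: Sequence[Genotype], affy: Sequence[Genotype]) -> int:
    """Number of shared SNPs missing in EITHER scan of one sample."""
    if len(illumina) != len(affy):
        raise ValueError("both scans must cover the same shared SNPs")
    # A SNP counts as a no-call if either platform failed to call it.
    return sum(1 for a, b in zip(illumina, affy) if a is None or b is None)


def combined_no_call_rate(illumina: Sequence[Genotype], affy: Sequence[Genotype]) -> float:
    """Per-sample no-call rate over the shared SNPs."""
    if not illumina:
        raise ValueError("no SNPs to compare")
    return count_no_calls(illumina, affy) / len(illumina)


def population_no_call_rate(
    samples: List[Tuple[Sequence[Genotype], Sequence[Genotype]]]
) -> float:
    """Pooled no-call rate over all (Illumina, Affymetrix) scan pairs of one population."""
    total = sum(len(ill) for ill, _ in samples)
    if total == 0:
        raise ValueError("no SNPs to compare")
    # Pool the raw counts so samples with more SNPs weigh proportionally more.
    return sum(count_no_calls(ill, aff) for ill, aff in samples) / total
```

For example, `combined_no_call_rate(["AG", None, "CC", "TT"], ["AG", "AA", None, "TT"])` returns 0.5, because the second SNP is missing on Illumina and the third on Affymetrix.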
The interesting part is how well the remaining (called) SNPs matched up between the two scans. I dropped the no-calls and, for each sample, counted how many of the remaining SNPs had identical alleles in the Illumina and Affymetrix scans of that same sample. Among its called SNPs, CEU had a 99.97% rate of identical alleles in both scans, Han 99.95%, Japan 99.95% and Yoruba 99.96%. Strangely, though, Kenya dropped noticeably to 99.61%, and so did Tuscany, to 99.58%: the same two populations that had a high no-call rate. Remember that this second comparison excluded the no-calls and used only valid calls, so in theory there is no reason for high no-call rates to coincide with high rates of mismatched alleles among the called SNPs. But it could be (?) that the high no-call rate indicates these samples are more damaged than the rest, in which case we should expect (?) a higher error rate among their successful calls as well (SNPs called wrongly, instead of not called at all).
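The concordance check can be sketched the same way. Under the same hypothetical encoding, the sketch drops any SNP that is a no-call in either scan and compares the remaining genotypes as unordered allele pairs, so "AG" and "GA" count as identical. It assumes both platforms already report alleles on the same DNA strand; Illumina and Affymetrix exports can use different strand conventions, which would have to be reconciled first.

```python
from typing import Optional, Sequence

Genotype = Optional[str]  # hypothetical encoding: "AG", "CC", ... or None for a no-call


def _unordered(genotype: str) -> str:
    # Unphased genotypes: "AG" and "GA" describe the same call.
    return "".join(sorted(genotype))


def concordance_rate(illumina: Sequence[Genotype], affy: Sequence[Genotype]) -> float:
    """Share of SNPs called in BOTH scans whose genotypes agree."""
    if len(illumina) != len(affy):
        raise ValueError("both scans must cover the same shared SNPs")
    # Exclude no-calls from either platform before comparing.
    called = [(a, b) for a, b in zip(illumina, affy) if a is not None and b is not None]
    if not called:
        raise ValueError("no SNP was called in both scans")
    same = sum(1 for a, b in called if _unordered(a) == _unordered(b))
    return same / len(called)
```

For instance, `concordance_rate(["AG", "CC", "TT", None], ["GA", "CT", "TT", "AA"])` gives 2/3: the last SNP is excluded as a no-call, and only "CC" versus "CT" disagrees. One minus this rate is the mismatch rate, so 99.61% concordance corresponds to 0.39% mismatched calls.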
Any thoughts? Is Illumina better than Affymetrix, or vice versa?
I've been computing genetic distances (FST) for all these newly available samples, and it occurred to me that an error rate on the order of 0.4% among the called SNPs could introduce a margin of error in the FST calculation that would actually be significant when comparing closely related populations. For example, typical distances between European populations are around 0.0020 FST, so if a set of samples has an error rate of 0.4%, I suppose this would inflate FST by about that same amount.
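One way to get a feel for how genotype errors could move FST is to compare a population's allele frequencies with an error-affected copy of themselves. The sketch below uses a Hudson-style estimator (summed squared frequency differences divided by the summed probability that two alleles, one drawn from each population, differ), with the small-sample correction terms left out; it is one common estimator and not necessarily the one behind the spreadsheet. The error model is hypothetical: every allele call is flipped with probability e, so errors are spread at random. Systematic errors, for example a SNP miscalled in every sample of one batch, shift frequencies far more at the affected SNPs and can be explored by editing the frequency list directly.

```python
from typing import List, Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float]) -> float:
    """Hudson-style FST from per-SNP allele frequencies, as a ratio of sums.

    Sample-size correction terms are omitted, i.e. large samples are assumed.
    """
    if len(p1) != len(p2):
        raise ValueError("frequency lists must cover the same SNPs")
    # Numerator: squared allele-frequency difference at each SNP.
    num = sum((a - b) ** 2 for a, b in zip(p1, p2))
    # Denominator: chance that two alleles, one from each population, differ.
    den = sum(a * (1 - b) + b * (1 - a) for a, b in zip(p1, p2))
    if den == 0:
        raise ValueError("every SNP is fixed for the same allele; FST is undefined")
    return num / den


def with_flip_error(p: Sequence[float], e: float) -> List[float]:
    """Expected allele frequencies if each allele call is flipped with probability e.

    Hypothetical, purely random error model; real genotyping errors may be systematic.
    """
    return [x * (1 - e) + (1 - x) * e for x in p]


if __name__ == "__main__":
    true_freqs = [0.1, 0.3, 0.5, 0.7]  # illustrative frequencies, not real data
    observed = with_flip_error(true_freqs, 0.004)  # 0.4% error rate
    print(hudson_fst(true_freqs, observed))
```

Comparing the printed value with the ~0.0020 typical European distance is one way to test the supposition, but only under this particular random-error model.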
If anyone's interested, here's a link to the FST Excel file:
[FST genetic distances (HGDP, HapMap3, Behar, Xing).xls]