GENEALOGY-DNA-L Archives


From: Kathy Johnston
Subject: Re: [DNA] Provenance of a DNA segment (importance of phased haplotypes)
Date: Sun, 28 Nov 2010 14:55:57 -0500


According to Ann:
> Family Finder tolerates occasional mismatches,
> which could be due to genotyping error or microdeletions.


Do you think that Family Finder is tolerating too many mismatches? Are genotyping errors and microdeletions major sources of false positives? If these were not tolerated, then would there be way too many false negatives?


According to Bruce Walsh, the odds of a 5 cM run being shared between two individuals (when using 500,000 markers) are 1 in 10 million, but FF does not use 5 cM as the threshold. It looks like 7.7 cM is the initial threshold FTDNA is using. When they report a match of 5 cM, are you saying that they are ignoring important mismatches? It seems to me that genotyping errors and microdeletions may be a significant source of kinship prediction errors. Should the technology be improved before it is really ready for prime time? Are our own natural deletions getting in the way of accuracy?
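
To make the false positive / false negative trade-off concrete, here is a toy sketch in Python. It is NOT FTDNA's algorithm, which I have not seen; the function names and the sample genotypes are made up for illustration, and run length is counted in SNPs rather than cM. It treats two unphased genotypes as compatible at a site if they share at least one allele, so the only hard mismatch is an opposite-homozygote site (e.g. AA vs GG), and it finds the longest run that tolerates a chosen number of such mismatches.

```python
def shares_allele(a, b):
    """True if two unphased genotypes (e.g. "AG", "GG") have an allele in common."""
    return bool(set(a) & set(b))


def longest_run(person1, person2, max_mismatches=0):
    """Length in SNPs of the longest stretch containing at most
    max_mismatches sites where the two genotypes share no allele.

    Hypothetical helper for illustration only; real matching tools
    work in cM and apply their own thresholds.
    """
    best = 0
    start = 0
    mismatches = 0
    for end, (a, b) in enumerate(zip(person1, person2)):
        if not shares_allele(a, b):
            mismatches += 1
        # Shrink the window from the left until it is within tolerance.
        while mismatches > max_mismatches:
            if not shares_allele(person1[start], person2[start]):
                mismatches -= 1
            start += 1
        best = max(best, end - start + 1)
    return best


# Made-up genotypes: site 2 (GG vs AA) is the only opposite-homozygote site.
p1 = ["AA", "AG", "GG", "AA", "AG", "AA"]
p2 = ["AG", "AA", "AA", "AG", "GG", "AA"]

strict = longest_run(p1, p2, max_mismatches=0)   # run is broken at site 2
lenient = longest_run(p1, p2, max_mismatches=1)  # site 2 is tolerated
print(strict, lenient)
```

With zero tolerance, a single genotyping error or microdeletion in the middle of a real shared segment splits it in two (a false negative if both halves fall below the threshold). With tolerance, unrelated short stretches can be stitched into one longer apparent match (a false positive).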


> Those small segments may be even smaller and less
> significant than we realize.


I believe I heard Bruce Walsh say that if you have a lot of small segments, e.g. 50 of these, then that is a signal. When he talks about signals, I think he means there is significance. He also stated that the only "good signal" was a long block.


Bennett Greenspan stated that FTDNA really only predicts 3 generations with confidence. For most of us, we already know 3 generations with confidence, so why do the test? I think that both FF and RF have come up with matches that can be confirmed quite well at the 4th cousin and below level.


> Phased haplotypes are not feasible for many people,
> at least at this stage of the game where father/mother/child
> trios are the easiest approach. Genotype data is a decent
> fall-back, but it can be quite noisy.


Phasing by using lots of siblings is better than just having the mother-father-child trios because you are likely to find more mismatches between siblings. Therefore, you get a more accurate sequence with fewer unknowns. But I don't think people necessarily want to use the phased data to find their relatives. They could instead use the phased data for much more invasive purposes. They might use the data to decode their distant relatives (along with the common ancestor) after they were found. As you pointed out, the genotype data is best at detecting provenance because there are fewer false positives, especially when you set the thresholds high enough. Phasing is easier in selected smaller segments. My fear is that, AFTER they have already matched a distant cousin's larger segment and pedigree through conventional means, people will use phased data to decode that cousin's genes (very small segments within the larger matching area) without the cousin knowing it. Granted, you are only decoding one half of the genetic information (a haplotype, not a genotype), but it could hold important dominant genetic trait information, such as the colon cancer trait traced back to the Fry family in 1630. This is one of those areas in which we need better informed consent through education about risks. But I am all for education rather than regulation at this point in time because currently the benefits outweigh the risks.
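
For anyone who wants to see where the "unknowns" in trio phasing come from, here is a minimal Python sketch. It is my own illustration, not any company's method; the function names and genotypes are hypothetical, and it looks at each SNP independently with no error handling beyond returning None. A child's two alleles can be assigned to mother and father only when exactly one assignment is consistent with the parents' genotypes; when all three people are heterozygous at a site, both assignments fit and the site stays unphased.

```python
def phase_site(child, mother, father):
    """Return (maternal_allele, paternal_allele) for one SNP, or None
    if the site is ambiguous or inconsistent with Mendelian inheritance.

    Genotypes are unphased two-letter strings such as "AG".
    """
    x, y = child
    # Try both possible assignments of the child's alleles to the parents.
    candidates = {
        (m, p)
        for m, p in ((x, y), (y, x))
        if m in mother and p in father
    }
    if len(candidates) == 1:
        return candidates.pop()
    return None  # two candidates: ambiguous; zero: Mendelian error


def phase_trio(child, mother, father):
    """Phase every site of a mother-father-child trio independently."""
    return [phase_site(c, m, f) for c, m, f in zip(child, mother, father)]


# Made-up genotypes for four SNPs.
child = ["AG", "AA", "AG", "CT"]
mother = ["AA", "AG", "AG", "CT"]
father = ["AG", "AA", "AG", "TT"]

phased = phase_trio(child, mother, father)
print(phased)
```

The third site, where child, mother, and father are all AG, comes back as None. Those are the gaps that additional relatives such as siblings might help fill.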


Kathy


