GENEALOGY-DNA-L Archives


From: Bonnie Schrack
Subject: [DNA] X-Y "recombination" (was Re: WTY Update slides)
Date: Sun, 07 Nov 2010 13:25:47 -0500


I just now realized that you all had been discussing this on the list,
sorry! Thanks for your very interesting contribution, Kathy.

Kathy wrote:
> Thomas Krahn really does need to publish his material. There are
> plenty of genealogists who could add information to this discovery.

It would be wonderful if Thomas would actually publish a paper on this!
I had somehow assumed he wouldn't, just because to my knowledge, he
hasn't yet published any papers, despite all his amazing discoveries.
I'm sure it has to do with the fact that he has tremendous
responsibilities and research projects that keep him so very busy at the
lab.

It's such a shame that his work is so unknown in academic circles. At
the ASHG meeting, I was a bit shocked to see how many of the scientists
were still unaware of ISOGG, or the outpouring of new L-SNPs developed
by Thomas that are filling our tree. They are the ones being left
behind, but they still don't realize it!

If he can be persuaded to publish this, I would be extremely happy. I
should have asked him before whether he's thinking about it. . .so I am
now. . . how about it, Thomas?

If Thomas doesn't feel he has the time or desire to publish something on
this, I would propose that a few of us get together and write something
for JOGG, so we can get the information out there, and scientists can
pick up the ball from there. We know that at least some of them read
JOGG.

> When he first talked about the X to Y transfer, he said: "The C
> allele at rs12859783 is coming from a different X-haploblock than the
> G observed in the ChrX HUGO sequence." He was talking about the L88
> region.

Yes. Rs12859783, found on the Y at position 16105255, is now called
L198 as a Y-SNP. On the Y, it has only been found in our branch of
J2a4, which has recently received the ISOGG and FTDNA clade name of
J2a4i. There is a considerable amount of genetic distance between my
father and the Middle Eastern person who was also found to be derived at
this, and the rest of the L88-region SNPs.

> That was a clue to me that there could be a phased pattern that we
> should be looking for that is different from HUGO. Whit Athey taught
> us some phasing techniques at the FTDNA conference for those who don't
> know what I am talking about. Maybe now people will also understand
> what I was talking about several months ago.


I guess I'm one of those people who still isn't sure what you're talking
about. ;-) But I'll work my way through it. . .

I attended Whit's presentation, and (for the benefit of those that
couldn't be there) I understand that phasing is separating raw allele
data into the two specific chromosome sequences, i.e., if you have
results like this:
AT
AT
AA
AG
CT
CT
TT
AT
GG
AG

by phasing, you can separate it into, for instance:
A T
T A
A A
G A
C T
T C
T T
A T
G G
G A
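
As a rough illustration (not something from Whit's presentation), here is
a minimal Python sketch that checks whether a proposed phasing is
consistent with the raw results above, and then reads off the two
chromosome sequences. The function names are hypothetical, and the data
are just the ten example rows shown above.

```python
# Unphased genotype calls: two alleles per SNP, order carries no meaning.
unphased = ["AT", "AT", "AA", "AG", "CT", "CT", "TT", "AT", "GG", "AG"]

# One possible phasing: (allele on chromosome 1, allele on chromosome 2).
phased = [("A", "T"), ("T", "A"), ("A", "A"), ("G", "A"), ("C", "T"),
          ("T", "C"), ("T", "T"), ("A", "T"), ("G", "G"), ("G", "A")]


def is_consistent(unphased, phased):
    """Return True if each phased pair uses exactly the alleles of its genotype."""
    if len(unphased) != len(phased):
        return False
    return all(sorted(g) == sorted(pair) for g, pair in zip(unphased, phased))


def haplotypes(phased):
    """Join the per-SNP alleles into the two chromosome sequences."""
    first = "".join(a for a, _ in phased)
    second = "".join(b for _, b in phased)
    return first, second


print(is_consistent(unphased, phased))
print(haplotypes(phased))
```

Note that the check only confirms the phasing is *possible*; many other
phasings fit the same raw data, which is why reference haplotypes or
family data are needed to pick the right one.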

So let's see. . . you say we should be looking for "a phased pattern
that is different from HUGO". . . In a man, there will be only one X,
so it doesn't need to be phased, that's why this statement confused me.
The original person in whom this X-Y recombination event occurred, a
man, had a particular X-SNP copied over from his one X chromosome onto
his Y.

> The X chromosome has very ancient blocks that seem to hold together
> over thousands of years and there are only rare crossovers in some of
> the regions. So, it is easy to phase these.

OK, I see, you must mean, when you have data from two X's, then you need
to phase them. But in a man's data, they would be much easier to pick
out without having to phase them.

> Neither 23andMe nor FTDNA shows the results at rs12859783. The
> question I was asking way back then was, can we predict our own SNPs
> at rs12859783 (position 7133305 at HapMap) based on our own phased
> pattern?

Aha, you're talking about finding one's rs12859783 result on the *X
chromosome*. That's another kettle of fish -- at first I thought you
were talking about rs12859783 on the Y.

> Yes I think we can. If so, then we know which "haplogroup" on the X
> chromosome carried the SNP that got passed from the X to the Y.

> We can collect contributions from FTDNA and 23andMe data then get
> redacted SNPs from HapMap to add to the predicted sequences.

So you're saying the HapMap data allow us to find actual rs12859783
results from the X? What do you mean by "redacted" HapMap data? Are
parts of it removed?

> When Thomas found the X to Y SNP in Bonnie Shrack's father, I remember
> I was playing around with HapMap and 23andMe X raw data over a year
> ago, phasing different haplotypes and there were three phased
> haplotypes on the X chromosome that emerged in the mostly European
> populations for this tiny haploblock. There was only one of these
> three haplotypes (which I would call a haplogroup, but I have no
> academic authority to do so) that could have made that crossover or
> maybe I should call it a contribution.

So you're saying that just one of the three haploblocks had the C allele
at rs12859783. . .
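
If that is the idea, the prediction step might look something like the
Python sketch below. This is only a guess at the general approach, not
Kathy's actual method: the haplotype labels, the flanking-SNP patterns,
and the alleles attached to them are all invented toy values, and
`predict_allele` is a hypothetical helper. It matches a person's typed
SNPs around the untested position to the closest reference haplotype and
reports the allele that haplotype carries.

```python
# Toy reference panel (INVENTED values, for illustration only):
# label -> (pattern at typed flanking SNPs, allele at the untyped SNP).
panel = {
    "hap1": ("AGTCA", "G"),
    "hap2": ("AGCCA", "G"),
    "hap3": ("GATTG", "C"),
}


def hamming(a, b):
    """Count positions where two equal-length patterns differ."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))


def predict_allele(flanking, panel):
    """Return (label, allele) of the panel haplotype closest to `flanking`."""
    label, (_, allele) = min(
        panel.items(), key=lambda kv: hamming(flanking, kv[1][0])
    )
    return label, allele


# A man's single X needs no phasing, so his typed SNPs can be used directly.
print(predict_allele("GATCG", panel))
```

With real data the panel would have to come from phased reference
chromosomes, and a tie or a poor match should be treated as "no
prediction" rather than trusted.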

> I am learning that not all exchanges are crossovers.

Or, not all "recombination" is an exchange where two chromosomes swap
segments with one another. Whit rejected the idea of X-Y recombination
when I first presented it to him, because he understood me to be talking
about a crossover exchange where a block from the Y went onto the X, and
vice versa. But this kind of "recombination" seems like it may be
one-directional, like a RecLOH on the Y, where one side of a palindrome
is copied over onto the other. Don't ask me about the mechanism between
X and Y! ;-)

> ... I assume Bonnie's father would have been heterozygous in the L88
> Region because one SNP was on the Y and the other was on the X but may
> have looked like it was a Y SNP.

This paragraph threw me. How would he be heterozygous -- in which
chromosome's data would that show up? The very reason we know that
there was an X-Y event in our lineage is that my father's X chromosome
does NOT have the C at rs12859783, but rather, the normal G. This shows
us that the C he has at L198 (rs12859783 on the Y), was inherited, and
not the product of an event in his own generation. If his own X had the
derived C allele, it could indeed have been a mix-up between X and Y in
reading the lab results.

Thomas carefully sequenced the entire region and it's clear that the C
allele at rs12859783 was on his Y chromosome, and the G on his X.

> Anyway, the data that a group of us genealogists collected is shown
> here in this Google Doc sheet if anybody is interested in adding to
> it. Its kind of a free-for-all approach to scientific discovery. I
> hope this web site does not suddenly self destruct since too many
> cooks tend to spoil the broth.
>
> https://spreadsheets.google.com/ccc?key=0Ah3qUyFYAhKudDhWVXJyMkN3eXJoMUlGNi1IS08zZVE&hl=en#gid=0
>
> I have not kept it updated (last update in April) but the darker blue
> is the predicted haplotype that probably transfered the X to the Y
> chromosome based on HapMap combined with 23andMe and FTDNA data. If
> you phase a larger region from HapMap, you will see that the two blue
> groups are more closely related to each other on the (still to be
> constructed) X phylogenetic tree than the red group.

OK -- I see that you've attributed the C allele to "Haplotype C." But
you don't really explain the process by which you came to that
conclusion. Are you saying that you looked at HapMap X chromosomes and
found that all those with the "Haplotype C" pattern also had C at
rs12859783? How many of them were you able to find with that "C"
pattern? What ethnicities were they?

> I called the Y SNP "transgametic" and I hope that is the appropriate
> terminology. I hope a geneticist will critique the methodology and
> tell me if I am on the right track here. Are these conclusions
> reasonable? In addition, why is there a transfer from one end of the X
> to the middle of the Y? Is that to be expected? I hope we got the
> correct SNP ID numbers here.

Actually, if you BLAST it, there are all kinds of places that have the
X-block sequence containing rs12859783 (or its equivalent), with the G
allele. I found that in one version of the human X sequence, it's found
at positions 5005061-5005082. There are various "assemblies" of the
human reference sequence. In other assemblies, the same X-block is
found, this time with the A allele at the location of rs12859783, in
positions 3099237-3099253. Also, it's found on Chr. 13 in both humans
and chimps. There are lots of other interesting things like this that I
found with BLAST.

Thomas had not been sure why the HUGO sequence has the A at rs12859783,
thinking that it was a de novo Y mutation within the "X-degenerate"
sequence, but since it shows up in multiple places on the X and in other
chromosomes, it may not be such a mystery.

We should also be thinking about the fact that there are six SNPs in the
L88 region that seem to have been copied over from the X to Y at least
once, and possibly twice, in other haplogroups. If you'll look at
Adriano Squecco's spreadsheet for L88 (rs3966071), you'll see it with
the C allele in E1b1a, and apparently, A3b2.

An E1b1a person was sequenced in the Houston lab, and found to have the
same six SNPs in a row that my father has, within a space of 13 bases,
starting with L88 and ending with rs4048733 (unnamed SNP). These six
SNPs reflect a transfer of the regular, HUGO X sequence onto the Y. In
my dad's case, Thomas additionally found L198, seven bases away. I haven't
heard that a Haplogroup A person has had the region sequenced, though,
so we don't know whether that 23andMe L88+ result for them goes together
with the whole series of SNPs.

I found with BLAST that there are two X-blocks we're looking at here,
that are found separately in other locations -- the first block
containing the first six SNPs, and the second one containing rs12859783.
And now, using the same L88 primers, Thomas has found two additional
SNPs in this second block, which are 13 and 14 bases further on from
L198. He has named them L273 (C to T) and L274 (A to G). They were
found in a member of haplogroup Q-M378.

These new SNPs exactly mirror the HUGO X sequence. The derived T and G
at these positions in the second block are also found in all the other
alternate positions on the human X sequence (mentioned above) that are
shown in BLAST with the A at rs12859783, and on other chromosomes.
Since, as far as I know, the Q person wasn't L198+, we can suppose that
in that case, a slightly different part was copied. This whole
region's high similarity on X and Y must somehow encourage these events.

I should also mention that I'm aware of at least three other places on
the Y chromosome where similar SNP patterns can be found, which also
look like the product of the same kind of process. These involve SNPs
that are turning up in the tree with .2 or .3 notations.

Sorry I got pretty technical there towards the end, but hopefully there
are a few people who will be interested.

Bonnie



