Friday, December 30, 2016

Natural Selection Did It!

In my previous post I hypothesized that natural selection changed mHG frequencies in Europe after 3000 BC. Since then I’ve tested that hypothesis by analysing modern and ancient European mtDNA. I’m presenting my findings in this post.

The data I used is presented below in links. Only this handful of ancient European populations has enough published mtDNA to compare with modern populations.

Mesolithic Western Europeans. 11000-7000 BC. N=36
Early Neolithic Hungary, Germany. 7500-6500 BC. N=450
Neolithic Iberia, France. 7000-4500 BC. N=171
Middle Neolithic (mostly) Germany. 6000-4000 BC. N=151
Late Neolithic/Chalcolithic Iberia, France. 3500-2800 BC. N=143
Chalcolithic Pontic Caspian Steppe. 4000-3000 BC. N=98

Haplogroup Frequencies: Modern and Ancient.
H Subclade Frequencies: Modern and Ancient
JT, U5, N1, Subclade Frequencies: Modern and Ancient.

The ancient European populations I included in that spreadsheet can be broken up into three groups defined by mHGs mostly specific to them.

Mesolithic Western Europeans: U5b.
Neolithic Europeans: K, H1, H3, N1a1a, T2, J1c, HV0.
Pontic Caspian Steppe Folk: U5a, U4, T1a, H6, I

I easily distinguished European and West Asian mtDNA in this post. I cannot, however, distinguish the mtDNA of different European populations to any significant degree using higher coverage mtDNA data. Uniformity is the word which best describes European mtDNA. Why is it so uniform? Common ancestry is one reason. Natural selection is another, and I give my reasons for thinking so in this post.

Autosomally, modern Europeans can be successfully fitted as a mixture of the three ancient populations listed above, with some extra stuff added for some. If mtDNA from these ancestors was passed down to modern Europeans with no natural selection affecting mHG frequencies, then (1) modern European mtDNA diversity would follow the same trends as autosomal diversity, and (2) modern European mtDNA could fit as a mixture of the mtDNA of those ancient populations.

So is that the case? Mostly no, and a little yes.

1. Do Autosomal and mtDNA Correlate?

Correlation mtDNA/Autosomal

The first method I used to learn whether natural selection has affected European mtDNA was to see how well autosomal DNA and mtDNA correlate. I did this by comparing the frequencies of typical Neolithic, Steppe, and Mesolithic mHGs with Neolithic, Steppe, and Mesolithic ancestry according to autosomal DNA.
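One way to put a number on "how well they correlate" is a Pearson correlation coefficient across populations. The sketch below is only an illustration of that idea, not necessarily the calculation behind the spreadsheet; the function name and the four-population input values are hypothetical placeholders, not real data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Hypothetical placeholder values, one entry per population (NOT real data).
steppe_ancestry = [0.50, 0.45, 0.30, 0.20]  # autosomal Steppe share
u5a_frequency = [0.10, 0.09, 0.05, 0.03]    # share of mtDNA that is U5a

r = pearson_r(steppe_ancestry, u5a_frequency)
print(f"r = {r:.2f}")
```

A value near 1 would mean the mHG tracks the ancestry component closely; a value near 0 would mean it does not.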

U5b shows little correlation with Mesolithic ancestry.

The most typical Neolithic mHGs correlate terribly with Neolithic ancestry. T2b and J1c might peak in Northern Europe, including Lithuania for T2b, where Neolithic ancestry is lowest. I didn’t include H1 in the above spreadsheet but it too has no correlation with Neolithic ancestry. There is some correlation for mHGs HV0 and K. K is lowest in NorthEast Europe where Neolithic ancestry is lowest (but is high in Ireland and Scandinavia). Both HV0 and K peak in Iberia and SouthWest France where Neolithic ancestry peaks.

Typical Steppe mHGs U4 and U5a do correlate well with Steppe ancestry. Both peak in the NorthEast where Steppe ancestry peaks. The next strongest presence for both might be in Yugoslavia and Scandinavia. T1a and I(N1a1b1) don’t correlate well with Steppe ancestry. H6 however might.

So there’s some correlation between mHG frequency and ancestry but not a lot. The frequencies of U5b, K, T2, J1c, HV0, H1, H3, U5a, U4, T1a in Europe overall aren’t consistent with being passed down from ancient populations with no natural selection affecting frequencies.

2. Steppe+Neolithic+Mesolithic mtDNA=Modern European mtDNA?

The second method I used to test the hypothesis that natural selection has affected European mHG frequencies was to check whether European mtDNA can fit as a mixture of Neolithic, Steppe, and Mesolithic mtDNA.

Here’s an explanation of how I did this. Let's say Poles have 40% Steppe mtDNA. Using simple math I can calculate what effect a 40% contribution of mtDNA from published ancient Steppe people would have on Polish mtDNA. Then I can create a zombie: the mHG frequencies of the other 60% of the Polish mtDNA gene pool. If the zombie’s mHG frequencies are similar to those of Neolithic Europeans, natural selection hasn’t affected Polish mtDNA. If the zombie’s mHG frequencies are radically different, then natural selection has probably affected Polish mtDNA.
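The "simple math" can be sketched as follows. If a modern pool is a mix of a known source and an unknown remainder, then modern = share x source + (1 - share) x zombie, which can be solved for the zombie. The function name is hypothetical. The inputs are the three H, K, and T2 percentages quoted for Modern Poland and Middle Neo/Chal Germany in the 23 December 2016 post further down, not the fuller spreadsheet data, so the output will not match the spreadsheet exactly and will not sum to 100%.

```python
def residual_frequencies(modern, source, source_share):
    """Haplogroup frequencies of the leftover ("zombie") gene pool.

    Assumes modern = source_share * source + (1 - source_share) * zombie,
    so zombie = (modern - source_share * source) / (1 - source_share).
    """
    if not 0 <= source_share < 1:
        raise ValueError("source_share must be in [0, 1)")
    return {
        hg: (modern[hg] - source_share * source.get(hg, 0.0)) / (1 - source_share)
        for hg in modern
    }

# Percentages quoted in the 23 December 2016 post (H, K, T2 only).
modern_poland = {"H": 45.2, "K": 3.4, "T2": 9.4}
middle_neolithic_germany = {"H": 26.5, "K": 16.5, "T2": 14.0}

# Model Poles as 60% Middle Neolithic and see what the other 40% must be.
zombie = residual_frequencies(modern_poland, middle_neolithic_germany, 0.60)
for hg, freq in zombie.items():
    flag = "  <- impossible (negative)" if freq < 0 else ""
    print(f"{hg}: {freq:.1f}%{flag}")
```

A negative frequency in the output is the signal that the assumed mixture cannot produce the modern pool on its own.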

I did similar calculations for four European populations: Sweden, South Poland, SouthWest France, and North Spain. For each population, the mtDNA of their non-Steppe and non-Neolithic zombies was pretty different from the mtDNA of actual Steppe and Neolithic peoples. Check out the results in this spreadsheet.


If Swedes and Poles have 60% Middle Neolithic mtDNA then their non-Neolithic mtDNA would have to be 63 to 73% H, 0 to -2% J, and -4 to -10% K. That’s literally impossible, let alone very different from Steppe mHG frequencies. When they are given 70% or 80% or more Middle Neolithic mtDNA, the results for their non-Neolithic zombie’s mtDNA get crazier and crazier (80% H, -20% K, etc.).

If SouthWest French and North Spanish have 65% Middle Neolithic mtDNA then their non-Neolithic mtDNA would have to be 80% H, -6% U5b, -5% J. If their Middle Neolithic mtDNA is higher, then their non-Neolithic side’s mtDNA gets crazier and crazier.

What other explanation is there for this but natural selection?

One explanation is that there was a population in Europe with high frequencies of H, low frequencies of K, etc. who swept across the continent. I tested this hypothesis. It doesn’t work. As far as I can see it’s impossible.

Refer back to the results from the spreadsheet (mtDNA=/=Steppe+MN) I just discussed. If the mtDNA of the Europeans I tested in that spreadsheet is modelled as anything over 50% Middle Neolithic or Steppe, the other 50% (“mtDNA zombie”) comes out with mHG frequencies in the negatives.

I know from results I didn’t present in the spreadsheet that negative frequency results only disappear when Steppe or Middle Neolithic make small contributions. The smaller their contribution, the more reasonable the zombie results become, because the zombie becomes more and more like the modern European population. For example, if I modelled Poles as having 1% Steppe mtDNA, their 99% other mtDNA zombie would come out looking like Polish mtDNA. The only way to make the non-Neo or non-Steppe zombie of European mtDNA have reasonable results is if Steppe and Middle Neolithic make small contributions to European mtDNA while an unknown zombie population with mtDNA like modern Europeans makes a huge contribution.
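The point about small contributions can be made concrete: under the same mixture assumption, the zombie frequency for a haplogroup stays non-negative only while the source share is at most modern/source for that haplogroup, so the tightest haplogroup sets the ceiling. This is a minimal sketch with a hypothetical function name and made-up toy percentages (not values from the spreadsheet).

```python
def max_source_share(modern, source):
    """Largest source share that keeps every zombie frequency non-negative.

    For each haplogroup, (modern - p * source) >= 0 requires
    p <= modern / source, so the smallest such ratio is the limit.
    """
    limits = [modern.get(hg, 0.0) / freq for hg, freq in source.items() if freq > 0]
    return min(1.0, min(limits)) if limits else 1.0

# Toy percentages for illustration only (NOT real data).
toy_modern = {"H": 40.0, "K": 5.0, "other": 55.0}
toy_source = {"H": 25.0, "K": 20.0, "other": 55.0}

limit = max_source_share(toy_modern, toy_source)
print(f"source can contribute at most {limit:.0%} before a frequency goes negative")
```

In this toy case K is the binding haplogroup: it is four times rarer in the modern pool than in the source, so the source cannot account for more than a quarter of the modern pool without selection or some other force changing frequencies.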

In other words, the only possible scenario in which natural selection didn’t affect European mtDNA is if there was an ancient population with mHG frequencies like modern Europeans that swept across Europe and replaced 60%+ of the preexisting mtDNA, making everyone in Europe have similar mHG frequencies. Sounds crazy, right?

Friday, December 23, 2016

What the heck happened to European mtDNA?

The number of mtDNA samples from Neolithic/Chalcolithic Europe (roughly 5500-3000 BC) has grown to over 500. The principal locations the samples come from are Germany, Spain, and Hungary.

I've been studying European mtDNA from this era for two years. What has been blatantly obvious to me is that since the Neolithic/Chalcolithic the frequencies of mHGs in Europe have changed pretty dramatically. mHG K and T2 are about half as popular as they once were. In contrast mHG H is about twice as popular as it once was. 

Data from Germany, Spain, and Hungary all tell the same story. Here are a few stats to demonstrate this change:

Frequency of Haplogroup H, K, T2, N1a1a
Early Neo Germany/Hungary: 20%, 18.7%, 25.4%, 9.4%
Early Neo Spain: 17%, 30%, 12.7%, 0.8%
Middle Neo/Chal Germany: 26.5%, 16.5%, 14%, 4.6%
Chalcolithic Spain: 23%, 22%, 4.3%, 0%
Modern Spain: 35%, 6.5%, 4%, 0%
Modern Poland: 45.2%, 3.4%, 9.4%, 0%

mHG frequencies in Spain and Germany/Hungary weren't identical back then and mHG frequencies among Europeans aren't identical today, but consistent mHG frequency trends exist in each era no matter the location in Europe. 
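The size of the change can be read off the list above by dividing a later frequency by an earlier one. The snippet below is a small sketch of that arithmetic using the percentages exactly as listed; the function name is hypothetical, and a ratio is left undefined when the earlier frequency is zero.

```python
HAPLOGROUPS = ("H", "K", "T2", "N1a1a")

# Percentages as listed above, in the order H, K, T2, N1a1a.
FREQUENCIES = {
    "Early Neo Germany/Hungary": (20.0, 18.7, 25.4, 9.4),
    "Early Neo Spain": (17.0, 30.0, 12.7, 0.8),
    "Middle Neo/Chal Germany": (26.5, 16.5, 14.0, 4.6),
    "Chalcolithic Spain": (23.0, 22.0, 4.3, 0.0),
    "Modern Spain": (35.0, 6.5, 4.0, 0.0),
    "Modern Poland": (45.2, 3.4, 9.4, 0.0),
}

def fold_change(earlier, later):
    """Ratio later/earlier per haplogroup; None where the earlier value is 0."""
    old, new = FREQUENCIES[earlier], FREQUENCIES[later]
    return {
        hg: (n / o if o > 0 else None)
        for hg, o, n in zip(HAPLOGROUPS, old, new)
    }

change = fold_change("Early Neo Spain", "Modern Spain")
for hg, ratio in change.items():
    print(hg, "n/a" if ratio is None else f"x{ratio:.2f}")
```

A ratio above 1 means the haplogroup became more common, below 1 less common.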

What caused mHG frequencies in Europe to change?

If my life depended on it I would guess natural selection is the answer. I would guess that certain mHGs affected post-Neolithic European women's mitochondrial DNA in ways which helped them have more daughters than women of other mHGs. I don't believe migration from a population with scary high frequencies of H and scary low frequencies of K and T2 is the answer. Genome-wide DNA from Pre-Historic Europe tells us migration into Europe after the Neolithic came primarily from the Pontic Caspian Steppe, which had pretty low frequencies of H. I discussed how migration from the Pontic Caspian Steppe affected European mtDNA here.

Neolithic and Chalcolithic European mtDNA belongs almost exclusively (95%+) to what today are European-specific haplogroups. So they are important ancestors of modern European mtDNA without a doubt. Genome-wide DNA is consistent with the idea they're an ancestor of modern Europeans; actually it suggests they're the most important ancestor. Haplogroup H was infrequent in Neolithic/Chalcolithic Iberia but about 90% of it was European-specific H1 and H3. Any theories that European-specific mHGs, like H1, arrived from an unknown source are invalidated by ancient mtDNA. Ancient Europeans belonged to the same mHGs but at different frequencies.

Thursday, November 24, 2016

Insights from 600 Southern African Mito-Genomes

The definitions of new terms I used in this post
mHG=mtDNA haplogroup
Macro-mHG=mtDNA haplogroup which is popular.

I just analysed over 600 mtDNA genomes of four populations from Southern Africa: Khoisans, Angolans, Zambians, and Pygmies. I have studied mtDNA from most parts of Eurasia but had never studied African mtDNA before.

In this post I’ll….

1. Explain how Southern African mtDNA is related to Eurasian mtDNA and how its structure is similar to the structure of Eurasian mtDNA.
2. List the Macro-mHGs in each Southern African population.
3. Discuss how related and unrelated the mtDNA of these Southern African populations are to each other.


The image below explains well how African (including Southern African) mtDNA is related to Eurasian mtDNA.

Eurasian mtDNA descends from two subclades of L3. There are many L3 subclades in Africa in addition to countless mHGs which are 1st, 2nd, 3rd, etc. cousins of L3. Africa has more variety in mtDNA because humanity probably originated somewhere in Africa. Humans had probably been living in Africa for tens of thousands of years before closely related groups of humans, whose mtDNA descended from two subclades of mHG L3, gradually inhabited Eurasia.

The mtDNA of Southern African populations is structured similarly to the mtDNA in Eurasian populations. What I mean by this is that both mostly belong to a small number of Macro-mHGs, not a large number of Micro-mHGs.


There are several layers of Macro-mHGs in Southern Africa. This is because a first layer formed when Macro-mHGs were born in early human history, then those Macro-mHGs gave birth to a new layer of Macro-mHGs, then those Macro-mHGs gave birth to a new layer of Macro-mHGs, and so on. Below are a few of the layers of Macro-mHGs of Southern Africa. Note that not all of the Macro-mHGs are listed below. Only the most popular are listed.

1st Layer: L0, L1, L2, L3
2nd Layer: L0a, L1c, L2a1, L3e, L3d, L3f
3rd Layer(L0a): L0a1b, L0a2a
3rd Layer(L1c): L1c2b, L1c1
3rd Layer(L2a1): L2a1f, L2a1d2(a), L2a5(a)
3rd Layer(L3): L3f1b4a, L3d3a1, L3e1a3a, L3e2b

Below is a link with mtDNA trees displaying the Macro-mHGs. All haplogroups I labelled are Macro-mHGs. These trees visually present the age-dictated layering of Macro-mHGs and the relationship between different Macro-mHGs.

Trees of Southern African Macro-mHGs. Macro-mHGs are colored according to the clusters they belong to which I describe in the next section.


Despite the fact all four of the Southern African populations I analysed live relatively near each other there is considerable mtDNA diversity among them. Don’t get me wrong, there’s also lots of uniformity. I classified four Macro-mHG clusters and discovered almost 200 new subclades.

Here are the four Macro-mHG clusters:
Angola, Zambia: L0a2a1, L0a2a2a, L0a1b2, L1c2b, L2a5(a), L3e2b
Pygmy: L1c1a1a, L1c1a2, L2a2b
Khoisan, Zambia: L2a1b1a, L0a1b1, L2a1f, L3f1b4a, L3d3a1
Zambia: L1c3a, L2a1d2(a), L3e1a3a

The frequencies of each cluster are presented in this spreadsheet: mHG Clusters of Southern Africa. The four clusters reveal lots of sharing between Angola and Zambia. Angola’s closest mtDNA relative is Zambia and Zambia’s closest mtDNA relative is Angola. Zambia also shares a significant amount of mtDNA with Khoisan, which no other population does.

The Pygmy are the only group that makes up a cluster all on its own. Three Macro-mHGs take up about 50% of their mtDNA. I’ve seen a few Macro-mHGs dominate small closed-off populations in Eurasia (e.g. Saami, Kalash, Basque), so my guess is that Pygmy are a similarly small closed-off population. As I’ve learned from Eurasia, small isolated populations share little mtDNA and Y DNA with their neighbors but are still closely related to their neighbors. So Pygmy are probably closely related to other Southern Africans.

As I said above, I discovered almost 200 new subclades. This spreadsheet contains an analysis I did of those roughly 200 subclades: New Subclades of Southern Africa.

As you can see from the spreadsheet, a shocking 20%+ of the mtDNA in 4 of 5 of the populations belongs to New Subclades which are exclusive to their population. Also, 30%+ of the mtDNA in 5 of 5 of the populations belongs to New Subclades in which their population makes up at least half of the members of the New Subclade. This means there are a lot of branches and a lot of regional variation in the African L-family that is unknown to geneticists.
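The two percentages described above can be computed from a list of (population, subclade) assignments. The sketch below is one possible way to do it, not the spreadsheet's actual method; the function name, population labels, and subclade labels are hypothetical toy values, and `None` marks a sample that is not in any new subclade.

```python
from collections import Counter, defaultdict

def subclade_shares(samples):
    """Return {population: (exclusive_share, majority_share)}.

    exclusive_share: fraction of a population's samples in subclades found
    only in that population.
    majority_share: fraction in subclades where that population supplies at
    least half of all members (exclusive subclades count here too).
    """
    pop_totals = Counter(pop for pop, _ in samples)
    members = defaultdict(Counter)  # subclade -> Counter of populations
    for pop, subclade in samples:
        if subclade is not None:
            members[subclade][pop] += 1

    exclusive = Counter()
    majority = Counter()
    for subclade, by_pop in members.items():
        size = sum(by_pop.values())
        for pop, n in by_pop.items():
            if len(by_pop) == 1:
                exclusive[pop] += n
            if n * 2 >= size:
                majority[pop] += n
    return {
        pop: (exclusive[pop] / total, majority[pop] / total)
        for pop, total in pop_totals.items()
    }

# Toy assignments for illustration only (NOT real data).
toy_samples = [
    ("A", "s1"), ("A", "s1"), ("A", "s2"), ("A", None),
    ("B", "s2"), ("B", "s2"), ("B", "s3"), ("B", None),
]
shares = subclade_shares(toy_samples)
for pop, (excl, maj) in sorted(shares.items()):
    print(f"{pop}: exclusive {excl:.0%}, majority {maj:.0%}")
```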

Saturday, June 18, 2016

Insights from New Pre-Historic Middle Eastern mtDNA

Lazaridis et al. 2016 collected genome-wide data, along with mtDNA, from 44 ancient Middle Easterners. 27 of 44 had enough mtDNA coverage to get haplogroup results. I added the new ancient mtDNA results to this spreadsheet, Ancient Middle East, which includes all of the ancient Middle Eastern mtDNA results I've collected that are at least 3,000 years old.

By the way I have upcoming posts on analysis of mito-genomes from specific haplogroups and regions, just like the one I did of JT recently. Analysis of mitogenomes of Southern Africa and haplogroups K1 and H1 are coming up next.

Preview: The new mtDNA results show that modern regional mtDNA diversity in West Eurasia had already begun to form by the Neolithic and support the conclusions of Lazaridis et al. 2016 that Levant Neolithic and Anatolian Neolithic did not have a lot of recent common ancestry.

I broke up my post into two sections.

1. What "Middle Eastern mtDNA" is. 
2. The mtDNA relationship between Paleo/Neo/Chalcolithic Middle Easterners and modern West Eurasians.

Lazaridis et al. 2016 was an absolute masterpiece, like all of the other ancient DNA papers created by the same team of researchers, who appear to have dedicated their careers to unlocking the origins of humanity as a whole and the diversity within humanity by taking DNA from old bones.

Listed below are the regions and time periods the new mtDNA results come from.

Levant(Israel and Jordan).
Paleolithic 12,000-14,000 years old. N1b
Neolithic 9,000-10,000 years old. K1a4b, T1a, T1a2, R0a, R0a2
Bronze age 4,000-4,500 years old. H14a, X2m

SouthWestern Iran
Mesolithic 11,000 years old
Neolithic 10,000 years old X2, J1c10
Chalcolithic 6,000-8,000 years old K1a12a, K1a12a, U7a, U3a'c, I1c, H29, X2
Bronze age 3,500 years old U1a1

Armenia
Chalcolithic 5,500-6,500 years old K1a8, K1a8, H, H2a1, U4a
Early Bronze age 4,500 years old H1u, X2f
Late Bronze age 3,500 years old T1a1'3

Western Turkey
Chalcolithic 6,000 years old K1a17

1. "Middle Eastern mtDNA"

I consider the below mtDNA haplogroups "Middle Eastern mtDNA".
R0 (incl. HV, H, V), U1, U3, U7, K (aka U8b2), U9, JT, N1, N2, X.

This is why I consider them Middle Eastern: We have over 100 mtDNA samples from European hunter gatherers ranging from 38,000 to 8,000 years old. All but two belonged to haplogroup U(xK, U1, U3, U8, U9). Just about all of them belonged to European-specific forms of U. They contributed 10-25% (varies by region) of the mtDNA of modern Europeans. The rest of European mtDNA falls under haplogroups that first appear in Europe when people from the Middle East migrated there starting 9,000 years ago. These newcomers to Europe also carried with them a new type of ancestry that didn't exist in Europe prior. It's a Middle Eastern-specific type of ancestry that all our new ancient Middle Eastern genomes share. Everywhere this Middle Eastern-specific ancestry is found today, so are the above mtDNA haplogroups.

All of these Middle Eastern haplogroups are definitely over 30,000 years old. By 10,000-12,000 years ago they had gained many subclades and regional diversity. There was already a high amount of sharing between distant regions in the Middle East among distantly related populations and many "Expansion Point Lineages" (definition of term is here). Most "Expansion Point Lineages" that exist today certainly already existed 10,000 years ago. This prediction of mine is supported by the new ancient Middle Eastern mtDNA results. By the time Middle Eastern mtDNA migrated into Europe, South Asia, and Africa there was a lot of mtDNA sharing within the Middle East, and so today you'll find subclades of haplogroups (like T2a1a) that show no differentiation between Ireland, Yemen, Ethiopia, and India. You can't tell apart Ethiopian T2a1a from Irish T2a1a.

Today a great majority of European mtDNA is "Middle Eastern". Don't let the term "Middle Eastern" confuse you. These haplogroups were only in the Middle East 10,000 years ago but since then they have spread. Europeans, East Africans, and South Asians are to a large extent of Middle Eastern descent. This is especially true of Europeans: most are over 50%, and the impact is much greater on mtDNA than on Y DNA.

2. The mtDNA relationship between Paleo/Neo/Chalcolithic Middle Easterners and modern West Eurasians.

Below is a comparison of the new ancient Middle Eastern mtDNA and already published mtDNA results from Neolithic Anatolia and Europe to modern mtDNA. In the end I give a concluding remark as to what this tells us about the origins of regional mtDNA diversity in West Eurasia today.

Natufian and Neolithic Levant mtDNA: N1b, K1a4b, T1a, T1a2, R0a, R0a2.

N1b and R0a peak in SouthWest Asia today and are rarely found outside of the Middle East. T1a2 and K1a4b are Middle Eastern-specific haplogroups today (however T1a2 has a strong presence in Italy) and peak in frequency in the Levant. Most of my SouthWest Asian K1a mito-genomes so far are K1a4b.

So in conclusion Natufian and Neolithic Levant mtDNA so far looks similar to mtDNA in the Levant today.

Neolithic Anatolia/Europe:

We have over 300 samples from them so there's no point in listing all those results. I will explain though how Neolithic Anatolian/European mtDNA was similar to modern European mtDNA.

>J1c, T2b, J2b1a: Today these are all European-specific clades of JT(see here). A similar ratio of Neolithic Anatolia/Europe's JT belonged to these subclades as in modern Europeans, and in fact they had a higher frequency of them than any modern Europeans. From the few high coverage J1c, T2b, and J2b1a genomes from Neolithic Anatolia/Europe we can see they had the same subclade composition as modern Europeans. This includes popular subclades that are at under 1% outside of Europe.

>K1a4a, K1a1b1, K1a2a, K1a3a: These are all K1a subclades that are rare or non-existent in my Middle Eastern mito-genomes and very popular in my European mito-genomes. They've all been found in the few K1a mito-genomes we have from Neolithic Europe. K1a4a is the most important one. It's the main K1a subclade in at least Denmark and White Americans, and was found in a 7,000 year old woman from Neolithic Spain. Her ancestors arrived in Spain only a few hundred years earlier from Anatolia.

>HV6-17, HV0(mostly V): These are the only popular forms of HV(xH) in Europe. They both are very rare outside of Europe. West Asia, especially SouthWest Asia, has a more diverse array of popular HV(xH). No HV(xH) sample has been found in Neolithic Anatolia/Europe except European-specific HV6-17 and HV0(mostly V) and both were fairly popular.

>H1, H3: H1 is very popular in all of Europe, reaching 15% or higher outside of Italy and the Balkans. It is under 5% in all of the Middle East. H3 is at just over 5% in the British Isles, Scandinavia, France, and Iberia but is only at a few percent in the rest of Europe. H3 is at under 1% frequency in the Middle East.

Neolithic Central Europeans had a low frequency of H1(~5%) and even lower for H3(less than 1%). However Neolithic French and Spanish had high frequencies of both and most importantly H3. Most of their H was either H1 or H3. In some Neolithic grave sites in Spain H3 was more popular than H1. Notably typical Basque H3c was found in Neolithic Spain and typical Danish H1c was found in Neolithic Sweden.

Neolithic and Chalcolithic Iranian mtDNA: X2, J1c10, K1a12a, K1a12a, U7a, U3a'c, I1c, H29, X2

J1c10 is a very rare subclade of J1c. My J1c10s are from Bedouin, Italy, Sardinia, and Morocco. Nothing can be concluded about J1c10 from that. However I can say that J1c is the primary form of J in Europe and rare outside of Europe. It arrived in Europe 9,000 years ago from Neolithic Anatolia. It's very surprising to see J1c10, instead of J1b1b, in Iran 10,000 years ago. Very surprising.

K1a12 is almost completely absent in close to 1,000 K1 mitogenomes from Denmark, Finland, and White Americans. That covers small parts of Europe. My only K1a12(all but two are K1a12a) samples are from Armenia, Kuwait, Iran, Turkey, Druze(Levant), and Italy. It isn't particularly popular but my data so far supports the idea it's mostly West Asian and even Northern West Asia(where Iran is).

U7 doesn't reach 1% in frequency in any region of Europe. It's nonexistent in close to 500 mtDNA samples from pre-historic Europe. U7 is at 1.3% in the Levant, 3-5% in Turkey and Iraq, and a whopping 10% in Iran. U7 is also one of the most popular West Eurasian haplogroups in India. The U7a result from Chalcolithic Iran directly connects them maternally to modern Iranians and South Asians.

U3 varies from 1-3% in Europe, is at 3% in Iran, and 5% in the Levant. It peaks in SouthWest Asia but has a decent presence in all of West Eurasia. I do have a large collection of U3 mito-genomes but haven't looked at them in detail yet. Most are U3b not U3a'c in every region, including Iran.

Chalcolithic, Bronze age Armenia and Anatolia mtDNA: K1a8, K1a8, K1a17, H, H2a1, U4a, H1u, X2f, T1a1'3

K1a8 and K1a17: Both are rare. My only examples of K1a8 are from the Levant (several countries). My only examples of K1a17 are from the Levant, Arabia, Italy, Egypt, and Kuwait.

U4a: U4a is of "EHG" (Mesolithic East Europe) origin, no doubt about it. It's consistent with the EHG ancestry that Lazaridis et al. 2016 modeled Chalcolithic Armenia as having. Looking at the evidence presented by Lazaridis et al. 2016 I'm very confident Chalcolithic Armenians had EHG ancestry.

H2a1: It is found at 1% or more in all of my West Eurasian regional sample sets except Iran, Spain, and the Balkans. It peaks in my samples from the NorthEastern corner of Europe (Baltic, Belorussia, Karelia and Russia) at about 3%. Frequencies of H2a1 can't tell you anything about its history. Only a large number of H2a1 mito-genomes can.

Something significant I can say, though, from looking at ancient mtDNA is that H2a1 is absent in over 300 samples from Neolithic Europe but existed in Eneolithic and Bronze age Russia and then later in Bronze age Central Europe after people from Russia migrated there. Its existence in Chalcolithic Armenia suggests it originated in the Caucasus region and then migrated to Russia and from there to Central Europe.

H1u: Very rare subclade of H1. I have over 1,000 H1 mito genomes from Basque, Danish, Finnish, Italy, and White Americans. Two are H1u, one from Denmark and one from a White American. I have a much smaller amount of H1s(maybe 20) from the Middle East and have found two examples of H1u and both are from Druze.

X2f: It is absent in my European samples. It exists in just about every Northern West Asian population I have examples from; Georgia, Armenia, Iran. I also have an example from Druze.

T1a1'3: T1a1'3 is the common ancestor of T1a1 and T1a3. Chances are this sample had T1a1 or T1a3, but because of low coverage it couldn't be discerned which one he had. T1a3 is almost unheard of today; my only examples are from three White Americans and one Israeli. T1a1 on the other hand is the most popular form of T1a in Europe and Iran. In Northern Europe over 90% of T1a is T1a1, while in Iran and Italy T1a1 is most popular but you'll commonly find other forms of T1a. Many examples of T1a1 have been found in Bronze age Europe and Central Asia (immigrants from Europe). It was popular in Bronze age Steppe populations and was likely brought to Europe from the Steppe, and ultimately from ancient Northern West Asia.

Nothing can be deciphered from an X2 result (instead of X2b, X2f, etc). I don't have any I1c mito-genomes so far. I1 is the most popular I subclade in most of West Eurasia and first appears in Europe with Steppe admixture (ultimately from populations similar to Chalcolithic Iran?).

Concluding Remark: These new results and already published results suggest that by the Early Neolithic much of the mtDNA diversity in West Eurasia had formed. We see Levant-specific mtDNA in Neolithic Levant, Northern West Asian-specific mtDNA in Neolithic/Chalcolithic Iran, and European-specific mtDNA in Neolithic Anatolia.

The mtDNA differentiation between Anatolia Neolithic and Levant Neolithic is consistent with the conclusion of Lazaridis et al. 2016 that farming spread to Anatolia without much gene flow from the Levant.

Friday, June 10, 2016

In-depth Look at Haplogroup JT

mtDNA haplogroup JT is the daughter of Pan-Eurasian haplogroup R and one of the primary haplogroups in West Eurasia, but it can also be found in many parts of Africa and Asia. It's a very old lineage and covers a lot of land, and therefore has a lot of phylogenetic diversity and regional diversity. In a previous post(LINK) I used HVR1 data from JT samples to display JT regional diversity. To learn more about JT diversity I collected and analysed over 1,000 JT mito-genomes from Ian Logan's site.

I broke this post into four sections to give a good understanding of what I've learned about JT.
1: My Strategy for Analyzing mito-genomes
2: Description of Samples and Spreadsheets used for Analysis
3: Results
4: Comparison to JT mito-genomes from Ancient DNA

My Strategy for Analyzing mito-genomes: My strategy when analyzing mito-genomes is to look for what I call "Expansion Point" haplogroups. An "Expansion Point" haplogroup is a haplogroup which is popular but doesn't have any popular subclades. The reason this is my strategy is that "Expansion Points" are the youngest popular haplogroups and can tell us the most about the origins of mtDNA in modern populations. They tell us the most because most humans belong to "Expansion Point" haplogroups, and very rarely do two humans from the same population have maternal lineages that are related beyond being a part of a popular "Expansion Point" lineage. There are dozens or hundreds of "Expansion Point" haplogroups which take up some 90% of the mtDNA in human populations. Most are fairly young, less than 20,000 years old. For those of you who are familiar with European Y DNA, an example of a Y DNA "Expansion Point" haplogroup is R1b-L151.
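The "Expansion Point" definition can be turned into a small search over a haplogroup tree: count every sample toward its own haplogroup and all of its ancestors, call a haplogroup popular if it reaches some share of the samples, and keep the popular ones that have no popular child. This is a sketch only. The function name, the sample counts, and the numeric threshold are assumptions made for illustration, since no cut-off for "popular" is stated here.

```python
def expansion_points(parent, sample_counts, min_share):
    """Popular haplogroups that have no popular subclade.

    parent: maps each haplogroup to its parent (None for the root).
    sample_counts: samples assigned to exactly that haplogroup.
    min_share: assumed cut-off for "popular", as a fraction of all samples.
    """
    total = sum(sample_counts.values())
    clade_counts = {}
    # Count each sample toward its own haplogroup and every ancestor.
    for hg, n in sample_counts.items():
        node = hg
        while node is not None:
            clade_counts[node] = clade_counts.get(node, 0) + n
            node = parent.get(node)
    popular = {hg for hg, n in clade_counts.items() if n / total >= min_share}
    # A popular descendant always implies a popular direct child, because
    # counts accumulate upward, so checking direct parents is enough.
    has_popular_child = {parent.get(hg) for hg in popular} - {None}
    return sorted(popular - has_popular_child)

# Small slice of the J tree; the counts are hypothetical (NOT real data).
parent = {"J": None, "J1": "J", "J1b": "J1", "J1c": "J1",
          "J1c1": "J1c", "J1c2": "J1c"}
counts = {"J1c1": 40, "J1c2": 30, "J1c": 10, "J1b": 15, "J1": 5}

print(expansion_points(parent, counts, min_share=0.20))
print(expansion_points(parent, counts, min_share=0.10))
```

Lowering the threshold lets smaller branches such as J1b qualify, which is why the choice of cut-off matters when comparing regions.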

Description of Samples and Spreadsheets used for Analysis: Most of the samples were used by Maria Pala et al. 2012. They came from various locations in West Eurasia, Siberia, and South Asia. I also added about 500 JT mito-genomes provided by Ian Logan from Denmark, Italy, Arabia, and Iran.

I fully analysed the relationships among all of the roughly 1,000 mito-genomes used by Maria Pala et al. 2012. I found over 60 new haplogroups but none of them are popular (aka "Expansion Point" haplogroups) outside of single populations, so they are not very helpful to know about.

Here's a spreadsheet of that Analysis: Analysis of 1000 JTs.

With the roughly 500 JT mito-genomes I added to the ones used by Maria Pala et al. 2012, I was able to gather enough samples to create five regional populations and compare their frequencies of JT subclades. The five populations are: Denmark, Italy, USA, Near East, and Northern West Asia. The USA population consists of Americans whose maternal line is from Europe and probably specifically from Britain or Germany.

Here's the spreadsheet of a comparison of those five regions: Regional Frequencies of JT

Results: As I said before, my strategy when analyzing mito-genomes is to find "Expansion Point" haplogroups, so here are the "Expansion Point" JT haplogroups I found. I color them according to which of these regions they are most popular in: Generic West Eurasian; Generic Europe, NorthWest Europe, Italy; Generic West Asia, Iran, Near East.

T2a1a, T2a1b1a(T2a1b1a1b), T2b(T2b4, T2b23, T2b31), T2c1(T2c1c, T2c1d1, T2c1d1a, T2c1e, T2c1a), T2e, T2f1a1, T2g1a, T2i, T1a1, T1a2, T1a7, T1a11, T1b(T1b2, T1b3).

J1c1(J1c1b, J1c1b1a, J1c1d), J1c2(J1c2b, J1c2o), J1c3, J1c15, J1b1a1, J1b1b(J1b1b1), J1b2, J1b3, J1d(J1d1a), J2a1a1a(J2a1a1a2), J2a2a, J2b1a, J2b1(xJ2b1a)

And that's it. Those are all the "Expansion Point" haplogroups of JT. Most JT falls under these clades. There isn't any more I can do. All subclades of these haplogroups are so rare, and so widespread across West Eurasia, that they tell us nothing about regional diversity. It's amazing there are still pretty basal JT subclades which show no regional variation. T2c1 and T2a1a in Italy and Yemen aren't distinguishable from each other.

Here are two very important lessons I learned about West Eurasian JT.
Recent contact across all of West Eurasia
>Rare and young non-"Expansion Point" subclades are found in every part of West Eurasia.
>Regional specific and young "Expansion Point" subclades are found in every part of West Eurasia, it's just they're more popular in some parts.
West Asia vs Europe, with Italy and Turkey/Cyprus as intermediates
>NorthWest Asia/Near East and Italy/NorthWest Europe aren't perfect subpopulations, but the members of each subgroup are by far most similar to each other in JT. Italy however has a significant amount of typical West Asian JT haplogroups, e.g. T1a2, T1b, J1b2, J1d1a, and Turkey/Cyprus have a significant amount of typical European JT haplogroups, e.g. J1c, J1b1a1, T1a1.
>The majority of West Asian and European JT split well over 8,000 years ago.

Comparison to JT mito-genomes from Ancient DNA

Below is a list of all of the ancient JT mitogenomes I know of. All of the results are from Europe or European ancestors/immigrants who lived in Asia, except for the single Armenian sample. All of them have typical JT clades for modern Europeans, except the Armenian who had T1a2, which is more typical of Western Asia, including Northern West Asia (where Armenia is), today.

6500-6200 BC, Barcın Turkey J1c11
6500-6200 BC, Barcın Turkey J1c
5311-5218 BC Spain J1c3
5000 BC, LBK culture Germany J1c17
2500-2050 BC Corded Ware Germany J1c2e
2880-2630 BC Spain J1c1
3000 BC Spain J1c1b1
2625-2291 BC Corded Ware Germany J1c1b1a
2298-2045 BC Sintashta Russia J1c1b1a
2000 BC? Germany J1c1b
2500-2050 BC Corded Ware Germany J1c5
2290-2130 BC Bell Beaker Germany J1c5
2851-2492 BC Denmark J1c4
2128-1909 BC Hungary J1c9
2863-2498 BC Corded Ware Germany J1b1a1

2880-2630 BC Spain J2b1a3
2126-1896 BC Sintashta Russia J2b1a2a
1850-1200 BC Timber Grave Russia J2b1a2a
1000 BC Siberia J2b1a
3900-3600 BC Spain J2a1a1
2880-2630 BC Spain J2a1a1
2500-2050 BC Germany J2a2a

6500-6200 BC, Barcın Turkey T2b
6500-6200 BC, Barcın Turkey T2b
5000 BC? LBK culture Germany T2b
5000 BC? LBK culture Germany T2b
5000 BC? LBK culture Germany T2b
5000 BC? LBK culture Germany T2b
5000 BC? LBK culture Germany T2b
3360-3086 BC Germany T2b
2500 BC? Czech Republic T2b
2034-1784 BC Hungary T2b3
2000 BC? Hungary T2b
1866-1619 BC Hungary T2b
1800-1600 BC Timber Grave Russia T2b4
794-547 BC Denmark T2b
5178-5066 BC Spain pre-T2c1d2
5000 BC? LBK culture Germany T2c1d'e'f
5100-4800 BC LBK culture Germany T2c1d1
4000 BC? Germany T2c1d1
3000 BC? Yamnaya T2c1a2
5500-4800 BC LBK culture Germany T2e
5000 BC? LBK culture Germany T2e
3640-3510 BC Germany T2e1
2887-2634 BC Yamnaya T2a1a
1432-1292 BC Sweden T2a1a
2454-2291 BC Corded Ware Germany T2a1b1
2464-2210 BC Germany T2a1b1
1800-1400 BC Andronovo Siberia T2a1b

2570-2471 BC Germany T1a1
1395-1132 BC Sweden T1a1
2000 BC? Hungary T1a1
1850-1200 BC Timber Grave Russia T1a1
1048-855 BC Armenia T1a2
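
A list like the one above can be tallied by top-level clade. Below is a minimal sketch in Python, not the method used for this post. It uses only a handful of rows copied from the list, it assumes the haplogroup is the last whitespace-separated token on each row, and the helper name `top_clade` is made up for the illustration.

```python
from collections import Counter

# A few rows copied from the list above (not the full list).
rows = [
    "6500-6200 BC, Barcın Turkey J1c11",
    "5311-5218 BC Spain J1c3",
    "2880-2630 BC Spain J2b1a3",
    "6500-6200 BC, Barcın Turkey T2b",
    "5178-5066 BC Spain pre-T2c1d2",
    "2887-2634 BC Yamnaya T2a1a",
    "2570-2471 BC Germany T1a1",
    "1048-855 BC Armenia T1a2",
]

def top_clade(row):
    """Hypothetical helper: return J1, J2, T1 or T2 for one row."""
    label = row.split()[-1]           # assumption: haplogroup is the last token
    if label.startswith("pre-"):      # "pre-T2c1d2" is counted under T2
        label = label[len("pre-"):]
    return label[:2]

counts = Counter(top_clade(r) for r in rows)
for clade, n in sorted(counts.items()):
    print(clade, n)
```

Running the same function over the full list would give the per-clade totals for all of the ancient samples.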

Tuesday, February 16, 2016

Asia Has Five mtDNA Gene Pools

I have added over 2,000 mtDNA samples from Asia and 1,000 from Europe to my mtDNA database in the last few days. Thanks to HaploGrep I've been able to analyze the data more than twice as fast.

With the new data from Asia, I've learned that there are at least five mtDNA gene pools in Asia. I've also found geographic diversity in East Asian mtDNA and origins of Western mtDNA in Asia.

In the next few weeks I will 1: Research the mtDNA and genome-wide relationship between ancient and modern Siberians, 2: Do more thorough work on European mtDNA with 1,000s of new samples, 3: Collect 1,000s of mtDNA samples from Africa, and 4: Collect many 1,000s of Mito-genomes from Ian Logan which I can now do very quickly thanks to HaploGrep. So, there's a lot to look forward to on this blog if you're interested in mtDNA.

This spreadsheet shows the five mtDNA gene pools of Asia and regional haplogroups of East Asian mtDNA.

Regional Asian mtDNA

In total there are at least six mtDNA gene pools in Eurasia. Below is a link to a map of the six mtDNA gene pools of Eurasia along with the list of haplogroups in each gene pool.

Eurasian mtDNA Gene Pools

Here are mtDNA haplogroup frequencies in Asia: Asia mtDNA Frequencies. I included frequencies of South Asian-specific haplogroups in South Asians and the frequencies of West Eurasian haplogroups in Asia.

Geographic Diversity in East Asian mtDNA

The East Asian mtDNA gene pool is geographically the largest in the world. Obviously they can't all have the exact same mtDNA. So, I gathered the frequencies of R9 and C subclades in this spreadsheet to find differences: C, R9. East Asia.

There are noticeable differences in subclade frequencies. The biggest is between Siberia and other East Asians. Siberians have a very high frequency of C4a1, C4a2'3'4, and C5, and all three are nearly non-existent in other East Asians. Tibet/Nepal also have a decent amount of those C subclades, probably because they live near Siberia.

There are also regional trends in the frequencies of R9 subclades. The most popular R9 subclade in NorthEast and SouthEast Asia is F1a, in Siberia F1b, and in Tibet/Nepal F1c1a. As with C, Tibet/Nepal have a connection with Siberia because of a high frequency of F1b. There's also a high frequency of F4b, R9c1, and F3b in Taiwan and R9b1a in Burma, which are nearly non-existent in other East Asians.

For all other East Asian haplogroups, subclades can't easily be identified with low-coverage testing, as with H in West Eurasia. So, it was hard to find differences in subclade frequencies in other haplogroups, but I did find some. I wrote those differences down here: Regional Asian mtDNA

M7 has a consistently strong presence in NorthEast Asia, a weaker presence in SouthEast Asia, and is pretty much unheard of in all other East Asians. M8a consistently pops up in Siberia and NorthEast Asia, and rarely anywhere else. M9a is absent in SouthEast Asia and consistently pops up in other East Asians. E is non-existent in all East Asians except Taiwan, where it is pretty popular. I also found many D subclades that are exclusive to certain countries or regions.

NorthEast Asians (Japan, Korea, China) are fairly similar. Taiwan and Siberia are different from NorthEast Asia in many ways: they have Paleolithic splits in mtDNA from NorthEast Asia as well as more recent links with it. Nepal and Tibet have a significant amount of South Asian mtDNA, connections with Siberia, and their own unique bottleneck lineages. Despite Tibet being in the country of China, it's important to remember that Tibetans aren't Chinese at all; they were just conquered by the Chinese.

Origins of Western mtDNA in Siberia and South Asia

West Eurasian mtDNA in Asia peaks in West Siberia and South Asia at about 30%. Everywhere else in Asia, West Eurasian mtDNA is pretty much non-existent. South Asia and Siberia received their Western mtDNA from very different sources. Siberia's Western mtDNA is almost entirely from Eastern Europe, while South Asia's Western mtDNA has unknown sources.

Siberians' Western mtDNA specifically looks like it comes from prehistoric Russia (U4, U5a, U2e, T1a, J1b1a1). They probably have a mixture of Mesolithic Russian and Bronze Age Andronovo mtDNA (J1b1a1, T1a, J1c, I, H2a1, H6). The composite of Siberian Western mtDNA is very similar to Catacomb and Andronovo, especially because of their strangely high frequencies of U4.

Siberians also have a string of typical West Asian subclades of U: U7, U1, and U3. It is strange that they don't have a lot of non-U West Asian haplogroups. Maybe there were ancient West Asians who were mostly U7, U1, and U3, like there were ancient Europeans who were mostly U5, U4, and U2e. I doubt it, but it's possible.

South Asian Western mtDNA is dominated by U2(xU2e) and U7. The U2(xU2e) subclades are rarely found outside of South Asia, so they have probably been in South Asia for 10,000s of years and are not recent arrivals from West Eurasia. The source of South Asia's U2 is likely a population closely related to Paleolithic North Eurasians, like Mal'ta boy and Kostenki man. The high amount of U7, like in some Siberians, is very strange. U7 is popular in neighbors of South Asia, like Iranians, but relative to other Western lineages it isn't nearly as popular there as it is in South Asia. Maybe South Asia's U7 and U2 are from the same source.
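
The notation U2(xU2e) means "U2, excluding U2e". A minimal sketch of that filter in Python follows. It assumes haplogroup names are nested by prefix, which works for simple cases like U2e1 under U2e but not for every name in the mtDNA tree (K, for example, falls under U8). The function name and the sample labels are made up for the illustration and are not data from the spreadsheets.

```python
def in_clade_excluding(label, clade, excluded):
    """True if label falls under clade but not under excluded.

    Assumption: membership is decided by plain prefix matching on the name.
    """
    return label.startswith(clade) and not label.startswith(excluded)

# Illustrative haplogroup labels only, not real sample data.
labels = ["U2a", "U2b2", "U2c1", "U2e1", "U2e", "U7", "U5a"]

u2_x_u2e = [h for h in labels if in_clade_excluding(h, "U2", "U2e")]
print(u2_x_u2e)
```

A real analysis would need the full haplogroup tree rather than name prefixes to decide membership.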

Non-U7 and non-U2 South Asian Western mtDNA is a mixture of West Asian-specific and European-specific lineages. West Asian-specific mtDNA in my South Asian data, besides U7, includes HV and R0a. Both are more typical of SouthWest Asia than Iran, but still fairly popular in Iran. It's hard to explain the consistent presence of U5a, U4, and J1b1a1, all typical of Bronze Age East Europe, in South Asia if all their Western mtDNA is from neighbors in West Asia.

Thursday, February 4, 2016

Loads of New mtDNA from Paleo-Europe

Face from a Forgotten World
Ivory carving of a human face, about 30,000 years old, from an archaeological site in the Czech Republic. mtDNA samples from this site had U5* and U8c.

Posth 2016 published several dozen new fully-sequenced mtDNA results from Stone Age hunter-gatherers of Europe (mostly Germany, Belgium, France, Italy). I've added all the results to my mtDNA database. There are now well over 100 mtDNA samples from Stone Age European hunter-gatherers.

The results from Posth 2016 are nearly 100% under haplogroup U(xK), like the results we've been getting from European hunter-gatherers for years. The only surprise is M*, dating to about 30,000 years ago, in Belgium and France.


Short summary of the history of European mtDNA, up until 6,000 years ago.

30,000+ years ago Europe was settled by humans carrying mostly mtDNA U (U5, U2, U8, etc.), but also M*, N*, and R*. Sometime between 30,000 and 15,000 years ago, U5a'b replaced other forms of U in Europe. Western Europe was dominated by U5b and Eastern Europe by U5a, U4, and U2e. No dramatic changes occurred in European mtDNA until 8,000-6,000 years ago, when humans from Turkey and the Caucasus mountains brought JT, R0 (incl. H), U, N1, X, and W.


The story of European genetics is fascinating and surprising. Modern Europeans are the result of at least two massive migrations from Asia/Far East Europe and a mixture of at least 4 distinct ancestors. They have very close ties to Middle Easterners and to a lesser extent Native Americans and North Asians. In this narrative U-dominated Paleo-Europeans are the aboriginal Europeans. They were some of the first humans to settle Europe and they lived undisturbed for 10,000s of years.