Thursday, November 24, 2016

Insights from 600 Southern African Mito-Genomes

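Definitions of the terms I use in this post:
mHG = mtDNA haplogroup
Macro-mHG = an mtDNA haplogroup that is popular, i.e. carried by a large share of a population. By contrast, I call a rare haplogroup a Micro-mHG.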

I just analysed over 600 mtDNA genomes from four Southern African populations: Khoisan, Angolans, Zambians, and Pygmies. I have studied mtDNA from most parts of Eurasia but had never studied African mtDNA before.

In this post I’ll:

1. Explain how Southern African mtDNA is related to Eurasian mtDNA and how its structure is similar to the structure of Eurasian mtDNA.
2. List the Macro-mHGs in each Southern African population.
3. Discuss how related and unrelated the mtDNA of these Southern African populations are to each other.


The image below shows how African (including Southern African) mtDNA is related to Eurasian mtDNA.

Eurasian mtDNA descends from two subclades of L3. There are many L3 subclades in Africa, in addition to countless mHGs which are 1st, 2nd, 3rd, etc. cousins of L3. Africa has more mtDNA variety because humanity probably originated somewhere in Africa. Humans had probably been living in Africa for tens of thousands of years before closely related groups of humans, whose mtDNA descended from two subclades of mHG L3, gradually inhabited Eurasia.

The mtDNA of Southern African populations is structured similarly to the mtDNA of Eurasian populations. By this I mean that both mostly belong to a small number of Macro-mHGs rather than a large number of Micro-mHGs.


There are several layers of Macro-mHGs in Southern Africa. A first layer formed when Macro-mHGs were born in early human history, then those Macro-mHGs gave birth to a new layer of Macro-mHGs, then those gave birth to another layer, and so on. Below are a few of the layers of Macro-mHGs in Southern Africa. Note that not all of the Macro-mHGs are listed, only the most popular ones.

1st Layer: L0, L1, L2, L3
2nd Layer: L0a, L1c, L2a1, L3e, L3d, L3f
3rd Layer (L0a): L0a1b, L0a2a
3rd Layer (L1c): L1c2b, L1c1
3rd Layer (L2a1): L2a1f, L2a1d2(a), L2a5(a)
3rd Layer (L3): L3f1b4a, L3d3a1, L3e1a3a, L3e2b
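
The layering above can be read straight off the haplogroup names. Here is a minimal Python sketch, assuming that a subclade's name simply extends its parent's name (true for the names in this post, but not for every mtDNA haplogroup). The helper names `tokens`, `is_subclade`, and `parent_in` are my own, made up for illustration.

```python
import re

def tokens(name):
    # Drop optional suffixes such as "(a)", then split the name into
    # alternating letter/number pieces: "L3e1a3a" -> L, 3, e, 1, a, 3, a.
    name = re.sub(r"\(.*?\)", "", name)
    return re.findall(r"[A-Za-z]+|\d+", name)

def is_subclade(child, parent):
    # Assumption: a subclade's name extends its parent's name piece by piece.
    c, p = tokens(child), tokens(parent)
    return len(c) > len(p) and c[:len(p)] == p

def parent_in(name, candidates):
    # Deepest candidate that is an ancestor of `name`, or None if there is none.
    hits = [c for c in candidates if is_subclade(name, c)]
    return max(hits, key=lambda c: len(tokens(c)), default=None)

first_layer = ["L0", "L1", "L2", "L3"]
second_layer = ["L0a", "L1c", "L2a1", "L3e", "L3d", "L3f"]
third_layer = ["L0a1b", "L0a2a", "L1c2b", "L1c1", "L2a1f", "L2a1d2(a)",
               "L2a5(a)", "L3f1b4a", "L3d3a1", "L3e1a3a", "L3e2b"]

# Print each 3rd-layer haplogroup with its 2nd- and 1st-layer ancestors.
for hg in third_layer:
    print(hg, "<-", parent_in(hg, second_layer), "<-", parent_in(hg, first_layer))
```

By this naming rule L2a5(a) finds no parent in the 2nd layer, because its name extends L2a rather than L2a1. It still traces back to L2 in the 1st layer.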

Below is a link with mtDNA trees displaying the Macro-mHGs. All haplogroups I labelled are Macro-mHGs. These trees visually present the age-dictated layering of Macro-mHGs and the relationship between different Macro-mHGs.

Trees of Southern African Macro-mHGs. Macro-mHGs are colored according to the clusters they belong to, which I describe in the next section.


Although all four of the Southern African populations I analysed live relatively near each other, there is considerable mtDNA diversity among them. Don’t get me wrong, there’s also lots of uniformity. I classified four Macro-mHG clusters and discovered almost 200 new subclades.

Here are the four clusters:
Angola, Zambia: L0a2a1, L0a2a2a, L0a1b2, L1c2b, L2a5(a), L3e2b
Pygmy: L1c1a1a, L1c1a2, L2a2b
Khoisan, Zambia: L2a1b1a, L0a1b1, L2a1f, L3f1b4a, L3d3a1
Zambia: L1c3a, L2a1d2(a), L3e1a3a
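
To make the sharing pattern explicit, the four clusters can be written down as a small Python table and queried. This is only a sketch of the list above. The function names `shared_pairs` and `clusters_per_population` are made up for illustration, and it says nothing about frequencies.

```python
from itertools import combinations

# Each cluster: the populations it is found in -> its Macro-mHGs (from the list above).
clusters = {
    ("Angola", "Zambia"): ["L0a2a1", "L0a2a2a", "L0a1b2", "L1c2b", "L2a5(a)", "L3e2b"],
    ("Pygmy",): ["L1c1a1a", "L1c1a2", "L2a2b"],
    ("Khoisan", "Zambia"): ["L2a1b1a", "L0a1b1", "L2a1f", "L3f1b4a", "L3d3a1"],
    ("Zambia",): ["L1c3a", "L2a1d2(a)", "L3e1a3a"],
}

def shared_pairs(clusters):
    # Map each pair of populations to the Macro-mHGs of the clusters they share.
    pairs = {}
    for pops, hgs in clusters.items():
        for a, b in combinations(sorted(pops), 2):
            pairs.setdefault((a, b), []).extend(hgs)
    return pairs

def clusters_per_population(clusters):
    # Count how many clusters each population appears in.
    counts = {}
    for pops in clusters:
        for p in pops:
            counts[p] = counts.get(p, 0) + 1
    return counts

print(shared_pairs(clusters).keys())
print(clusters_per_population(clusters))
```

Only two pairs share a cluster, Angola with Zambia and Khoisan with Zambia, and Zambia is the one population that shows up in three clusters.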

The frequency of each cluster is presented in this spreadsheet: mHG Clusters of Southern Africa. The four clusters reveal lots of sharing between Angola and Zambia. Angola’s closest mtDNA relative is Zambia and Zambia’s closest mtDNA relative is Angola. Zambia also shares a significant amount of mtDNA with Khoisan, which no other population does.

The Pygmy are the only group that makes up a cluster all on its own. Three Macro-mHGs take up about 50% of their mtDNA. I’ve seen a few Macro-mHGs dominate small closed-off populations in Eurasia (e.g., Saami, Kalash, Basque), so my guess is that the Pygmy are a similarly small closed-off population. As I’ve learned from Eurasia, small isolated populations share little mtDNA and Y DNA with their neighbors but are still closely related to them. So the Pygmy are probably closely related to other Southern Africans.

As I said above, I discovered almost 200 new subclades. This spreadsheet contains an analysis I did of those roughly 200 subclades: New Subclades of Southern Africa.

As you can see from the spreadsheet, shockingly, 20%+ of the mHGs in 4 of the 5 populations belong to New Subclades which are exclusive to their population. Also, in all 5 populations, 30%+ of the mHGs belong to New Subclades in which their population makes up at least half of the members. This means there are a lot of branches and a lot of regional variation in the African L-family that are unknown to geneticists.
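
For anyone who wants to reproduce these two percentages, here is a rough Python sketch of one way to compute them. The sample list is toy data I made up to show the arithmetic, not the real spreadsheet, and the function name `new_subclade_shares` is hypothetical.

```python
from collections import Counter, defaultdict

def new_subclade_shares(samples):
    # samples: list of (population, new_subclade) pairs, one per mtDNA genome.
    # new_subclade is None when the genome does not fall in a New Subclade.
    totals = Counter(pop for pop, _ in samples)
    members = defaultdict(Counter)  # new subclade -> population -> count
    for pop, clade in samples:
        if clade is not None:
            members[clade][pop] += 1

    exclusive = Counter()  # genomes in New Subclades found in one population only
    majority = Counter()   # genomes in New Subclades where the population is >= half
    for by_pop in members.values():
        size = sum(by_pop.values())
        for pop, n in by_pop.items():
            if n == size:
                exclusive[pop] += n
            if 2 * n >= size:
                majority[pop] += n

    # Per population: (exclusive share, at-least-half share) of all its genomes.
    return {pop: (exclusive[pop] / totals[pop], majority[pop] / totals[pop])
            for pop in totals}

# Toy data (invented): "new1".."new3" stand in for New Subclades.
toy = [("Zambia", "new1"), ("Zambia", "new1"),
       ("Angola", "new2"), ("Zambia", "new2"), ("Zambia", "new2"),
       ("Angola", "new3"), ("Angola", None), ("Zambia", None)]
print(new_subclade_shares(toy))
```

The first number for each population corresponds to the "exclusive to their population" figure, and the second to the "at least half of the members" figure. A subclade that is exclusive to a population always counts toward both.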