Notes: Eastern genes that made it to the West – mtDNA haplogroups Z and D and Y-DNA haplogroup N

A major project goal is bringing in members who have maternal line origins representative of the various regions within Asia and Europe where Z originated and/or is currently found. Some common locations include — but are by no means limited to — Scandinavia, Russia, Korea, China and Japan (source: Family Tree DNA – mtDNA Haplogroup Z project)

According to the Genographic Project’s Atlas of the Human Journey,

“Haplogroup Z arose on the high plains of Central Asia between the Caspian Sea and Lake Baikal. It is considered a characteristic Siberian lineage, and today accounts for around three percent of the entire mitochondrial gene pool found there. Because of its old age and frequency throughout northern Eurasia, it is widely accepted that this lineage was carried by the first humans to settle these remote areas. Radiating out from the Siberian homeland, haplogroup Z-bearing individuals began migrating into the surrounding areas and quickly headed south, making their way into northern and Central Asia. A frequency gradient of haplogroup Z is observed the further from Siberia one looks: it now comprises around two percent of the people living in East Asia. Heading west out of Siberia, however, this gradual reduction in frequency comes to an abrupt end around the Ural Mountains and Volga River. This provides a clear example of the impact geographic barriers have on human migration, and thus on gene flow and mixture. To the west of the Urals, this haplogroup is observed at frequencies less than one to two percent, both in northern and northeastern Europe.” In contrast, Ingman and Gyllensten’s 2006 study found Z in percentages up to 7.2 among Finnish Saami, and 4.3 among southern Swedish Saami. Surprisingly, although their total sample size for Norwegian Saami numbered 278, none of those were haplogroup Z.

For the most part, however, haplogroup Z did not travel far west, meeting a barrier around the Ural Mountains and the Volga River. It probably reached Scandinavia with the Saami because they were also a maritime whaling culture and likely circumvented the mountains, perhaps by taking the Arctic route westward.

Haplogroup Z in the Saami (Source: Dienekes Anthropology Blog)
From the article:

The presence of haplogroup Z implies a contribution, albeit limited, to the Sami gene pool from Asia. The close relationship of Z1a lineages from Finns and Sami with those of the Volga-Ural again implicates that region as a probable source for Sami mitochondrial diversity. There is, however, a difference in the apparent ages of the different Sami haplogroups. The nucleotide diversity among Sami sequences for the three haplogroups studied here is very low. The ages of the variation for U5b1b1 and V among Swedish Sami are similar (5500 and 7600 YBP, respectively) but considerably older than for Z (2700 YBP). The surprisingly close link between haplogroup Z1a among Sami and the Volga-Ural sequences suggest that this haplogroup was brought in during the last 2–3000 YBP. Our data supports that a migration from Eastern Europe, in the vicinity of the Volga-Ural region, is the likely source for much of the Sami mtDNA diversity [14] but indicates multiple migrations, the first being 6–7000 YBP and at least one additional migration 2–3000 YBP. Considering the similarity observed between Sami and Finnish mitochondrial lineages, this observation of multiple migration events would also support previous population genetic studies that have indicated dual origins of the Finnish people [37].

Ingman, Max and Gyllensten, Ulf, “A recent genetic link between Sami and the Volga-Ural region of Russia,” European Journal of Human Genetics, 20 September 2006; doi: 10.1038/sj.ejhg.5201712


The genetic origin of the Sami is enigmatic and contributions from Continental Europe, Eastern Europe and Asia have been proposed. To address the evolutionary history of northern and southern Swedish Sami, we have studied their mtDNA haplogroup frequencies and complete mtDNA genome sequences. While the majority of mtDNA diversity in the northern Swedish, Norwegian and Finnish Sami is accounted for by haplogroups V and U5b1b1, the southern Swedish Sami have other haplogroups and a frequency distribution similar to that of the Continental European population. Stratification of the southern Sami on the basis of occupation indicates that this is the result of recent admixture with the Swedish population. The divergence time for the Sami haplogroup V sequences is 7600 YBP (years before present), and for U5b1b1, 5500 YBP amongst Sami and 6600 YBP amongst Sami and Finns. This suggests an arrival in the region soon after the retreat of the glacial ice, either by way of Continental Europe and/or the Volga-Ural region. Haplogroup Z is found at low frequency in the Sami and Northern Asian populations but is virtually absent in Europe. Several conserved substitutions group the Sami Z lineages strongly with those from Finland and the Volga-Ural region of Russia, but distinguish them from Northeast Asian representatives. This suggests that some Sami lineages shared a common ancestor with lineages from the Volga-Ural region as recently as 2700 years ago, indicative of a more recent contribution of people from the Volga-Ural region to the Sami population.


From an earlier study, Tambets, Kristiina et al., “The Western and Eastern Roots of the Saami—the Story of Genetic ‘Outliers’ Told by Mitochondrial DNA and Y Chromosomes,” Am J Hum Genet. Apr 2004; 74(4): 661–682. Mar 11, 2004. doi: 10.1086/38320:

“In contrast to the predominance of European mtDNA haplogroups observed among the Saami, nearly half of their Y chromosomes share a TatC allele (haplogroup N3, according to the nomenclature of the Y Chromosome Consortium [YCC 2002]) with most Finno-Ugric and Siberian populations. This variant is found at high frequencies among Siberian populations, such as the Yakuts and the Buryats, but is virtually absent in western and Mediterranean Europe; even among the Norwegians and the Swedes, populations that have historically lived in close proximity to the Saami, it is found at frequencies of only 4%–8% (Zerjal et al. 2001; Passarino et al. 2002). High frequencies of the TatC allele have also been observed in Baltic (30%–40%) and Volga-Finnic–speaking populations (20%–50%) (Zerjal et al. 1997; Rootsi et al. 2000; Rosser et al. 2000; Semino et al. 2000; Laitinen et al. 2002). These findings have been interpreted according to the classic view that a substantial element of the Saami (and other European Finno-Ugric–speaking populations) genetic lineages originated in a recent migration from Asia (Zerjal et al. 1997, 2001).

Haplogroup N3, the most frequent haplogroup in the Saami population, is distributed in eastern European and northern Asian populations but it is rare or absent in western Europe (table 3). All analyzed Swedish Saami N3 lineages fall into subcluster N3a, defined by M178 (YCC 2002). Although N3a is widespread in Siberia, other haplogroups, characteristic of Samoyedic-speaking and other Siberian populations (such as C and Q), are either almost absent in Baltic-Finnic populations, including the Saami, or are only sporadic, as for haplogroup N2, which is found only among Volga region Finnic speakers (table 3).

In this respect, the Saami do not differ markedly from Finnic-speaking Karelians, Maris, Komis, Udmurts, or northern Russians, all of whom possess haplogroups of eastern Eurasian origin at similar frequencies (table 1). This minor part of the Saami mtDNA pool consists of two branches of the eastern Eurasian mtDNA tree—D5 and Z1. According to published data, the frequency of haplogroup D5 is relatively high in China (Yao et al. 2002). D5 is also present among Mongols and Siberians (Kolman et al. 1996; Derbeneva et al. 2002b). However, the Saami haplogroup D5 lineages, with the HVS-I motif 16126-16136-16360 and its derivatives (defined as “D5b” by Derenko et al. 2003), have been identified only in some northern and eastern European populations (among Karelians, Finns, Estonians, North-Russians, and Komis) and in some Siberian populations but not in Samoyeds (table 1). This suggests, again, the lack of gene flow from Samoyeds to eastern Europe.

Haplogroup Z, a subcluster of the M8 clade within the haplogroup M family of mtDNA (Kivisild et al. 2002), is found at highest frequencies in the northeastern Asians: the Itelmens and Koryaks (Schurr et al. 1999). It is also present in several Siberian populations, including the Altaic people (table 1). Though not identified in a large data set of the Yakuts (Fedorova et al. 2003; Pakendorf et al. 2003), haplogroup Z has been observed among several Finnic- and Turkic-speaking populations of the Volga-Ural region (Bermisheva et al. 2002). It is curious that it is more frequent there in Finnic- than in Turkic-speaking populations. The absence of haplogroup Z from most of the Siberian Uralic-speaking populations (Samoyedic-speaking Nenets and Selkups, as well as Siberian Ob-Ugric-speaking Khants and Mansis) (table 1) is therefore striking. We note that all haplogroup Z lineages that are found in eastern Europe belong to a subhaplogroup Z1, characterized by transitions at nps 151, 10325, and 16129 within the Z phylogeny (Kong et al. 2003a, 2003b). A matching HVS-I founder haplotype has been observed in the Koryak and the Itelmen populations (Schurr et al. 1999). The limited diversity of haplogroup Z in Europe suggests its relatively recent spread west of the Urals. …

The predominant Saami Y-chromosomal haplogroup N3 has a nearly uniform circumarctic distribution in Eurasia (table 3). The closely related N2 lineages are frequent in Siberian and Volga-Uralic populations. Thus, it is likely that haplogroup N variation represents a prehistoric link between the Siberian and eastern European/proto-Finnic populations via their paternal heritage. The improved resolution of the Y-chromosomal phylogenetic tree (Jobling and Tyler-Smith 2003) reveals an ancestral node shared by haplogroups N and O, with the latter restricted largely to eastern Asia. This connection is intriguing, but it is still unclear when and where this common ancestor first appeared. Nevertheless, one does not need to postulate a recent Siberian flow of Y chromosomes into the Saami gene pool to explain their high N3 frequency. First, such a flow from Samoyedic-speaking aboriginal Siberians to the Saami Y-chromosomal pool would predict the presence there of haplogroup N2 and/or haplogroup Q, widely spread in Samoyeds (Karafet et al. 2002). Second, the much higher diversity of N3 in eastern Europe than in Siberia (Villems et al. 1998; Rootsi et al. 2000) suggests that eastern Europe, rather than Siberia, is a possible origin of the earliest expansion of this haplogroup in northern Eurasia.”


The article “Sami Prehistory Revisited: transactions, admixture and assimilation in the phylogeographic picture of Scandinavia” by John Weinstock fleshes out details of the migratory history of the Saami with a more plausible hypothesis:

“Haplogroup Z stems from Central Asia between the Caspian Sea and Lake Baikal. It has its highest frequency in Russia and among some Sami groups. The Sami Z lineage shares a common ancestor with groups in Finland and the Volga-Ural area of Russia and must be quite recent (2,700 BP) because it differs from Northeast Asian Z representatives (Ingman and Gyllensten: 115, 119).

“Haplogroup D also arose in the Lake Baikal area and is the predominant maternal haplogroup in East Asia. It is an old lineage, some 60,000 [years old], and was one of the maternal haplogroups that found its way to the New World.” [“a small fraction of the Saami ‘pie’ that represents haplogroup D, and about 5% of the Saami haplogroup diversity is haplogroups D (D5) and Z, both of which are seen in Asian populations. I wouldn’t take the presence of the D5 haplogroup in the Saami as a sign of a recent connection between the Saami and Inuit populations though, as you can see that haplogroup D is quite common in central and eastern Asia, so that it seems most likely that a small fraction of the ancestors of the Saami came from central Asia, just as most of the ancestors of the Inuit likely did. In addition, the particular haplogroup D5 variants seen in the Saami are particular to Europe, which makes the connection between Europe and Asia via haplogroup D quite ancient.” – Are Lappland Saami and Alaskan Inuit of the same group?]

“Haplogroup Z originated in Siberia and spread from there in several directions. The spread westward was originally thought to have come to a halt around the Ural Mountains, but Ingman and Gyllensten found substantial frequencies of Z among the Finnish Sami and Southern Swedish traditional Sami suggesting that some Sami lineages shared a common ancestor with lineages from the Volga-Ural region as recently as 2,700 years ago (Op cit. 115, 119). Yet, the distribution of Z throughout Scandinavia seems to have implications beyond its presence in the Sami. Although the percentages of Z in the majority populations are quite small ranging from 0.3–0.4% for Sweden (Lappalainen et al. 2008 and Tambets et al. 2004) to 2.5% for Finland (Lappalainen et al. 2008), this compares to no Z at all in Germany, Poland, the Balkans and all of Western Europe. Hence, the Finns, Norwegians and Swedes likely acquired Z through assimilation of the Sami and subsequent admixture. Looking at the numbers, a very crude estimate of the number of Swedes carrying Z mtDNA is 275,000, which is several orders of magnitude greater than the number of Sami in Sweden.”


Haplogroup N first appeared in Southeast Asia 15,000-20,000 BP and today it is “mainly found in Northern Eurasia and is absent or only marginally present in other regions of the globe (Ibid. under Clade N).” The subclade N1c1 is especially frequent among Finns and Lithuanians. Siiri Rootsi et al. give a convergent time estimate for N (with data combined from the old designations N1-N3) of 19.4±4.8 (evolutionary time) and 5.8±1.4 (pedigree-based time) (2004: 135). Map 6 depicts the expansion of N. …


Haplogroup N seems to have arisen in Northern China/Mongolia from where it spread into Siberia and the Baltic. Its descendant N1c is widespread in the Baltic region and was brought by small groups of males speaking an early Uralic language. There is a clear distinction between the Eastern Finns and the Sami vs. those living further to the west with the former having high values of N1c. Österbotten in Western Finland, though, has a very high frequency too; this could be related to contacts with Norrland as above. North Norway has a higher value than most other majority groups, with the exception of Finland, and this may be due to admixture with the Sami. Three areas of Sweden show fairly high values with an east-west cline, 3.6% for Götaland to the south and 7.9% for Svealand just north of Götaland and Gotland off the east coast of Sweden with 10%. Surprisingly, perhaps, the figure for Norrland is only 6.5% whereas Västerbotten has 19%. Lappalainen et al. suggest historical ties to Finland where N1c is very common (2009: 70). Berit Myhre Dupuy et al. mention that the Y-chromosome N3 in Norway “is observed at 4% in the overall population and at 11% in the northern region corresponding to 150,000 and 50,000 inhabitants, respectively. These numbers exceed the total number of Saami inhabitants” (2005: 6). Dupuy et al. continue: “There is thus a considerable pool of Saami and/or Finnish [Kven] Y-chromosomes in the Norwegian population and particularly in the north (ibid. 6,8).” 4% of Norway’s population of ca. 5 million would be 190,000, the number of Norwegians carrying N1c.
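The head counts quoted above are a haplogroup frequency multiplied by a population size. A minimal sketch of that arithmetic follows, using the 4% frequency and the ca. 5 million population given in the text; the function names `estimated_carriers` and `implied_population` are hypothetical and introduced only for illustration.

```python
def estimated_carriers(frequency_percent: float, population: int) -> int:
    """Return frequency x population, rounded to the nearest person."""
    return round(frequency_percent / 100 * population)


def implied_population(carriers: int, frequency_percent: float) -> int:
    """Return the population size that a carrier count implies at a given frequency."""
    return round(carriers / (frequency_percent / 100))


# Figures quoted in the text: N3/N1c at 4% in Norway, population ca. 5 million.
print(estimated_carriers(4, 5_000_000))   # 200000
# Population implied by the quoted 190,000 carriers at 4%.
print(implied_population(190_000, 4))     # 4750000
```

On these inputs the product is 200,000; the quoted 190,000 corresponds to a population of about 4.75 million, so the two figures agree only to within the rounding implied by "ca. 5 million".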

Origin and spread of Haplogroup N ABR, Vol 1(1) June 2010

Another theory proposes that Haplogroup N arose in Africa, made it to Western Eurasia and spread from there.

“In Europe only 0.2% of the population belong to haplogroup N. The carriers of haplogroup N are mainly situated in Central Europe. In Africa haplogroup N is found throughout the continent. The African populations carrying haplogroup N belong to almost all the language families spoken in Africa, including Cushitic, Nilo-Saharan, Khoisan, Niger-Congo, and Semitic.

The craniometric and molecular evidence fails to support the hypothesis that haplogroup N entered Africa as a result of back migration. The presence of the N haplogroup among Sub-Saharan populations from the Nile basin into West Africa, Northeast Africa and East Africa corresponds to Ehret’s [18] hypothesis that the major contemporary African language families probably originated in one composite region extending from the Nile Basin to the Ethiopian highlands.
It appears that the Khoisan speakers took haplogroup N to western Eurasia. The molecular and craniofacial evidence makes it clear that the Aurignacians and many early farmers in the region were direct migrants to the Levant and Europe from Africa [3,8]. Moreover, the identification of Sub-Saharan craniometric features [3,4,11] and the N haplogroup among ancient skeletons [5,11] suggests that the ancient Sub-Saharans in the Levant and Europe already possessed haplogroup N when they arrived in these areas from Africa. This is supported by the fact that anatomically modern humans did not replace Neanderthal people in the Levant until after Cro-Magnon people had established the Aurignacian culture in Spain and France [13].
The craniometric evidence that the Aurignacians [3], Natufians [4,13] and other groups who inhabited the Levant and Europe belonged to Sub-Saharan populations at this time suggests that these farmers carried haplogroup N into western Eurasia between 40,000 and 7,500 ybp and confirms Quintana-Murci et al.’s [19] hypothesis that haplogroup N originated in Africa.

An analysis of Cro-Magnon DNA indicates that they belonged to haplogroup N [5].
The Cro-Magnon mtDNA is associated with the Paglicci-12 [5]. Paglicci-12 shows the motifs 00073G, 10873C, 10238T and AACC between nucleotide positions 10397 and 10400 [5]. This classifies the sequence into the macrohaplogroup N. According to Caramelli et al. [5], a mutation at 16223 within HVR1 suggests a classification of Paglicci-12 in the haplogroup N*.
Caramelli et al.’s [5] discovery of Cro-Magnon mtDNA confirms the research of Boule and Vallois [3], because the Khoisan carry haplogroup N [12], the L3(N) haplogroup that was also found among the Cro-Magnon people.
Early farmers of the Levant and Europe show Sub-Saharan craniofacial features [4,13]. There is other evidence of a predominantly Sub-Saharan population formerly existing in the Levant. Trenton W. Holliday [13] tested the hypothesis that if modern Africans had dispersed into the Levant from Africa, “tropically adapted hominids” would be represented in the archaeological history of the Levant, especially in relation to the Qafzeh-Skhul hominids. This researcher found that the Qafzeh-Skhul hominids (20,000-10,000 BC) were assigned to the Sub-Saharan population, along with the Natufian samples [13]. Holliday [13] also found African fauna in the area. Holliday [13] confirmed his hypothesis that the people who replaced the Neanderthals in the Levant were Sub-Saharan Africans.
This finding was similar to Brace et al.’s [4] findings for the Levant and Europe.
The founders of civilization in the Levant were the people archaeologists call Natufians. Clark [6] claims that the Natufians originated in Africa.
By 13,000 BC, according to J.D. Clark [6], the Natufians were collecting grasses which later became domesticated crops in the Levant and Europe. In Palestine the Natufians established intensive grass collection. The Natufians used the Ibero-Maurusian tool industry [14]. These Natufians, according to Christopher Ehret, were small stature folk who spread agriculture throughout Nubia into the Red Sea (1979) and thence into North Africa and Europe. … These early European farmers fail to share haplogroups found among the contemporary Europeans. Ancient DNA found in the skeletons belonging to this period belongs to the N haplogroup (Haak et al., 2005).
Haak et al. (2005) discussed the identification of N1a in ancient Europe. These researchers report that between 8 and 42% of the early farmers belonged to the N1a lineage. Today the percentage of Central Europeans who belong to the N1a lineage is only 0.2%.
Haak found that the first Neolithic farmers did not have a strong genetic influence upon modern European female lineages. As noted above, the researchers found the early farmers carried haplogroup N1a. This is interesting because Brace et al. found that the craniofacial features of these early farmers and those of the Natufians plotted with sub-Saharan groups, just like the Aurignacians. The presence of haplogroup N among the Cro-Magnon population and Neolithic farmers shows genetic continuity.
The findings of Brace et al. [4] and Holliday [13] suggest that the Old Europeans may be related to African cattle raising farming groups. This supports the idea that ancient Eurasian farmers originally from Africa and the Middle East may have planted the seeds of agriculture in ancient Europe, since Boule and Vallois [3], and Brace et al. [4] have shown that the Aurignacians and the Natufians have a clear link to Sub-Sahara Africa.
Finally, the Aurignacians did not come from the Levant. The archaeological evidence makes it clear that the Aurignacian culture appears fully developed in France and Spain [5]. The archaeological evidence also makes it clear that the Aurignacian culture moved from west to east [7,8,10]. As a result, the dates for the Near Eastern Aurignacian are later than the Aurignacian dates for Europe [20]. The spread of Aurignacian culture from France and Spain to Central Europe suggests that there were two out-of-Africa exits, one from the East and another, later out-of-Africa event across the Straits of Gibraltar 40,000 ybp.”

How do we reconcile the above theory, that haplogroup N originated in Africa and diversified in Central Europe, with the study below, which finds that hg N originated in East Asia? The following study also casts some doubt on the antiquity of N1a in Central Asia:

“By comparison, the age of N1a-M128 is strikingly young (3.75 kya), consistent with the observed star-like STR network suggesting a recent expansion of this lineage (Figure 3). Because the reported Central Asian population (Kazakhs) possessing relatively high frequency of N1a-M128 did not have enough STR data to calculate diversity, we were unable to infer the time of N1a-M128’s migration from East Asia into Central Asia.” Contrasted with N1b and N1c: “In order to date the major prehistoric population events along the northward and westward migration routes of the Hg N lineages, we used the STR data to calculate the STR variation ages of the 5 Hg N sub-haplogroups (Table 4). As expected, the ancestral lineage under LLY22g (N1*-LLY22g), the oldest among all N-M231 sub-haplogroups, was dated to 21.66 kya, falling in the Upper Paleolithic. The age of N1b-P43 was also very old (18.90 kya), indicating a relatively rapid northward migration during the Paleolithic period from southern China northward into Siberia. N1c-M46 was relatively young (11.70 kya). The age of N*-M231 (13.69 kya), presumably the ancestral lineage of Hg N, is younger than expected, likely as a result of yet-to-be-identified individuals having derived N-M231 sub-haplogroup when new Y SNP markers are uncovered in the future.”
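The five STR-variation ages quoted in the passage above can be collected and compared directly. The sketch below uses only the values stated in the quotation; the helper names `oldest_first` and `gap_from` are hypothetical, and the study's Table 4 may carry further columns that are not reproduced here.

```python
# STR-variation ages in thousands of years (kya), exactly as quoted above.
ages_kya = {
    "N1*-LLY22g": 21.66,
    "N1b-P43": 18.90,
    "N*-M231": 13.69,
    "N1c-M46": 11.70,
    "N1a-M128": 3.75,
}


def oldest_first(ages):
    """Return (lineage, age) pairs sorted from oldest to youngest."""
    return sorted(ages.items(), key=lambda item: item[1], reverse=True)


def gap_from(reference, ages):
    """Return the age difference (ky) between a reference lineage and each other lineage."""
    ref_age = ages[reference]
    return {
        name: round(ref_age - age, 2)
        for name, age in ages.items()
        if name != reference
    }


for name, age in oldest_first(ages_kya):
    print(f"{name:<12}{age:>6.2f} kya")

print(gap_from("N1*-LLY22g", ages_kya))
```

Measured against N1*-LLY22g, the quoted values put N1b-P43 about 2.76 ky younger and N1c-M46 about 9.96 ky younger.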

Ages of hg N haplotypes

Table 4. Estimated ages of Hg N and its sub-haplogroups.

Shi H, Qi X, Zhong H, Peng Y, Zhang X, et al. (2013) Genetic Evidence of an East Asian Origin and Paleolithic Northward Migration of Y-chromosome Haplogroup N. PLoS ONE 8(6): e66102. doi:10.1371/journal.pone.0066102

Abstract

The Y-chromosome haplogroup N-M231 (Hg N) is distributed widely in eastern and central Asia, Siberia, as well as in eastern and northern Europe. Previous studies suggested a counterclockwise prehistoric migration of Hg N from eastern Asia to eastern and northern Europe. However, the root of this Y chromosome lineage and its detailed dispersal pattern across eastern Asia are still unclear. We analyzed haplogroup profiles and phylogeographic patterns of 1,570 Hg N individuals from 20,826 males in 359 populations across Eurasia. We first genotyped 6,371 males from 169 populations in China and Cambodia, and generated data of 360 Hg N individuals, and then combined published data on 1,210 Hg N individuals from Japanese, Southeast Asian, Siberian, European and Central Asian populations. The results showed that the sub-haplogroups of Hg N have a distinct geographical distribution. The highest Y-STR diversity of the ancestral Hg N sub-haplogroups was observed in the southern part of mainland East Asia, and further phylogeographic analyses supports an origin of Hg N in southern China. Combined with previous data, we propose that the early northward dispersal of Hg N started from southern China about 21 thousand years ago (kya), expanding into northern China 12–18 kya, and reaching further north to Siberia about 12–14 kya before a population expansion and westward migration into Central Asia and eastern/northern Europe around 8.0–10.0 kya. This northward migration of Hg N likewise coincides with retreating ice sheets after the Last Glacial Maximum (22–18 kya) in mainland East Asia.
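The sample counts in the abstract can be cross-checked with simple arithmetic. The sketch below uses only the numbers stated above; the variable names are hypothetical, and the percentages are pooled sample proportions rather than population frequency estimates, since sampling differs between the contributing studies.

```python
# Counts stated in the abstract.
genotyped_males, genotyped_hg_n = 6_371, 360   # newly genotyped, China and Cambodia
published_hg_n = 1_210                          # Hg N individuals from published data
total_males, total_hg_n = 20_826, 1_570         # combined dataset


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# The new and published Hg N counts should add up to the stated total.
counts_consistent = genotyped_hg_n + published_hg_n == total_hg_n
print(counts_consistent)                             # True

print(percent(genotyped_hg_n, genotyped_males))      # 5.65
print(percent(total_hg_n, total_males))              # 7.54
```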

In recent years, extensive studies of the Y-chromosome lineages in East Asian populations have been conducted and found that the dominant haplogroups O-M175, D-M174, C-M130, and N-M231 in East Asian populations all have a southern origin [1]–[8]. Among these East Asian Y-chromosome lineages, D-M174 represents the earliest northward migration, beginning from the southern part of East Asia of what is now mainland Southeast Asia and southern China about 50–60 kya [5]. The northward migration of C-M130 occurred about 40 kya, following coastal route up mainland China, then reaching further north to Siberia around 15 kya and finally making its way to northern America [8]–[11]. The northward expansion of O-M175 within the Asian continent (about 25–30 kya) made the greatest impact on current East Asian Y chromosomal profiles, reflected by the dominance of O-M175 lineages (ranging anywhere from 18–75%) in East Asia, and both mainland and island Southeast Asia [4].

By contrast, N-M231, as a sister-clade of O-M175, is relatively less prevalent in East Asian populations (averaging around 6%) (Table 1), but has a much wider geographic distribution across Eurasia as compared with the other Y-chromosome haplogroups [3], [7], [12]–[29]. Rootsi et al. (2007) proposed that the Hg N lineage dispersed from East Asia to northwestern Europe following a counter-clock-wise migratory route and speculated that the original homeland of Hg N likely traced to Southeast Asia, and had split with O-M175 about 34 kya. However, due to the limited populations studied for N-M231 from East Asia and Southeast Asia, Hg N’s putative center of origin and the chronology of dispersals remain inconclusive.

Hg N is prevalent (>5%) in East Asia (e.g., among Han Chinese, Tibeto-Burman and Austro-Asiatic speaking populations), as well as in northern/central Asia and eastern/northern Europe with on average the highest frequency in Siberia (38.27%). Meanwhile, Hg N is relatively rare in southeastern, southern and western Asia, and completely absent in southern/western Europe. Within the Hg N lineage, there are 5 sub-haplogroups with distinctive geographic distributions. N*-M231 is presumably the ancestral haplogroup in Hg N, mostly present in southern East Asian populations including Daic, southern Han Chinese, Tibeto-Burman and Hmong-Mien in southern China (Figure 2A); however, it is totally absent in Siberia, Central Asia and eastern/northern Europe, consistent with the previously proposed southern origin of Hg N in East Asia [3], [12], [16], [28], [29]. The other 4 sub-haplogroups share a common mutation at the LLY22g locus (Figure 3). Under LLY22g, N1*-LLY22g is both the ancestral and most dominant sub-haplogroup, with distribution extending from southern to northern East Asia and the highest frequency observed in Tibeto-Burman populations. The distribution pattern of N1a-M128 is similar to N1*-LLY22g, but much less prevalent (Figure 2B and 2C). By contrast, the distributions of N1b-P43 and N1c-M46 are restricted to North Asia and East/North Europe, rare in East Asia and Central Asia, and absent in Southeast and South Asia (Figure 2D and 2E). Collectively, this geographic distribution pattern suggests a clear divergence between regional populations with the ancestral lineages occurring in multiple ethnic populations throughout southern China.

We constructed contour maps of the five N-M231 sub-haplogroups based on the geographic distributions of these lineages in Eurasian populations (Table S3). The two presumably ancestral haplogroups (N*-M231 and N1*-LLY22g) likely originated in southern China, as there is a clear south-to-north decline of these frequencies (Figure 2A and 2B). Conversely, N1b-P43 and N1c-M46 are both enriched in Siberia with N1b-P43 having a north-to-south decline and N1c-M46 having an east-to-west decline (Figure 2D and 2E). The contour map of N1a-M128 is different from the others with the highest frequency observed in Central Asia due to the relatively high frequency of N1a-M128 among Kazakhs (8.1%) in Central Asia (Figure 2C).

To examine the detailed diversity of each N-M231 sub-haplogroup, we constructed STR networks for the 5 sub-haplogroups based on data of 7 Y-chromosome STR loci (Figure 3). Among the two ancestral lineages of Hg N, we observed relatively diverged STR haplotypes, and the core STR haplotypes are mostly from southern populations in China, suggesting a likely origin in southern China. Comparatively, the core STR haplotypes of N1b-P43 are mostly from the northern populations of China and Siberia, suggesting its origin may be in northern East Asia. Moreover, the STR networks of N1b-P43 reflect that the STR haplotypes in Europeans were derived from Siberia and Central Asia, consistent with the proposed counter-clock-wise prehistoric migration of the Hg N lineages into East/North Europe [3]. Interestingly, N1a-M128 displayed a star-like STR network, implying a recent expansion of this Hg N lineage. Although N1a-M128 has the highest frequency in Central Asia [3], considering its presence (though low frequency) in multiple ethnic populations throughout southern China, N1a-M128 is unlikely to have a Central Asia origin. Instead, N1a-M128 may similarly have its origin in East Asia, reflected by the STR network showing an East Asia core haplotype (Figure 3). The high frequency of N1a-M128 in Central Asia is likely then due to a recent local expansion of this sub-haplogroup.

Further comparison of the STR variation levels among the different populations also supports an East Asia origin of the Hg N. For the two ancestral lineages, N*-M231 and N1*-LLY22g, the STR diversity of southern populations is higher than northern populations in East Asia (Table 3). We observed similar patterns for the other three sub-haplogroups, which expanded outside of East Asia and into Siberia, Central Asia and East/North Europe (Table 3).


Hg N is the most widely distributed Y chromosome haplogroup in Eurasia (Table 1). By extending the population coverage into East Asia, we showed that Hg N is present in most East Asian populations, though the frequencies are low (Table 1 and Table S1). Previously, Hg N was speculated to have originated in Southeast Asia, and consequently split with its sister haplogroup O-M122 about 34 kya and then migrated northward to mainland East Asia during late Pleistocene-Holocene [3]. However, we demonstrated that Hg N is in fact extremely rare in Southeast Asia populations. For example, in our analysis of 293 multi-ethnic Cambodian males, we only detected one Hg N individual (0.34%), contrasting the previous report of a much higher frequency of one in six males (16.67%) in Cambodia, which was likely caused by a small sample size. Hg N is also rare in other Southeast Asia populations (<1.5%), including those in Laos, Vietnam, Thailand, Indonesia, Malaysia and the Philippines (Table 1), thereby suggesting that Southeast Asia may not be the homeland of Hg N. Instead, the southern part of mainland East Asia (presumably southern China) is more likely the putative origin for Hg N, as reflected by the distribution of ancestral Hg N lineages (N*-M231 and N1*-LLY22g) and the observed higher STR diversity of multiple southern ethnic populations in China (Table 3). The STR network analysis and contour map further support a southern East Asia origin of Hg N.
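The remark about small sample size can be made concrete with a binomial confidence interval. The sketch below computes a Wilson score interval for the two Cambodian figures mentioned above. It assumes the earlier report was literally one carrier among six sampled males, and the choice of the Wilson method and of z = 1.96 (roughly 95% coverage) is an illustration, not something taken from the study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% when z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# (label, carriers, sample size); the 1-in-6 reading of the earlier report is an assumption.
samples = [("earlier report", 1, 6), ("this study", 1, 293)]

for label, carriers, n in samples:
    low, high = wilson_interval(carriers, n)
    print(f"{label}: {carriers}/{n} = {carriers / n:.2%}, "
          f"interval {low:.2%} to {high:.2%}")
```

With six males the interval spans tens of percentage points, whereas with 293 it is confined to a few percent at most, which is the sense in which a 16.67% estimate from a handful of samples carries little weight.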

As proposed previously, the initial prehistoric migration of Hg N ran from south to north, starting in southern China. We are now able to draw a relatively more detailed migratory picture for the Hg N lineage by estimating the ages of the Hg N haplotypes using STR variations. The initial northward migration probably started around 21 kya, reflected by the age of N1*-LLY22g (21.66 kya), the most prevalent N-M231 sub-haplogroup in East Asia. Along the path of northward migration in mainland China, two other N-M231 sub-haplogroups occurred at about 12–18 kya, later becoming the dominant Y-chromosome lineages in Siberian populations as a result of local population expansion. Previously N1b-P43 and N1c-M46 were proposed to have experienced serial bottleneck events in northern East Asia and then dispersed into Siberia, Central Asia and Europe [3]. As the age difference between N1b-P43/N1c-M46 and N1*-LLY22g is comparatively small (3–5 kya), we can infer that the prehistoric migration of Hg N was relatively quick, coinciding with the end of the Last Glacial Maximum (LGM) in East Asia (22–18 kya). The postglacial migration of modern humans in East Asia can likewise be reflected by the northward migration of the C-M130 haplogroup along the coastline of mainland China, before moving further north to Siberia around 15 kya [8]–[11].
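The passage above dates lineages "using STR variations" without spelling out the calculation. A minimal sketch of one widely used approach follows: take the average squared distance (ASD) in repeat counts between sampled haplotypes and an assumed founder haplotype, then divide by a mutation rate. The function name, the toy haplotypes, the mutation rate and the generation length are all assumptions for illustration; the paper's own method and parameters may differ.

```python
def str_age_years(haplotypes, founder, mu_per_generation, years_per_generation):
    """Age estimate from the average squared distance (ASD) to a founder haplotype.

    ASD is averaged over samples and then over loci; under a simple stepwise
    mutation model it grows roughly as mutation rate times generations elapsed.
    """
    n_loci = len(founder)
    total = 0.0
    for locus in range(n_loci):
        sq = [(h[locus] - founder[locus]) ** 2 for h in haplotypes]
        total += sum(sq) / len(sq)
    asd = total / n_loci
    return asd / mu_per_generation * years_per_generation

# Hypothetical two-locus data; the founder is assumed to be the modal haplotype.
founder = (14, 12)
toy = [(14, 12), (15, 12), (14, 13), (13, 12)]

# Assumed parameters: 0.001 mutations per locus per generation, 25-year generations.
print(str_age_years(toy, founder, 1e-3, 25))
```

The estimate scales inversely with the mutation rate, which is why published ages for the same lineage can differ several-fold depending on the rate chosen.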

With the application of next generation sequencing on the Y chromosome, more Y-SNPs will be discovered, which can help increase the resolution of the Hg N tree and provide more detailed phylogeographic information about the origin and prehistoric migration of this important Eurasian Y chromosome lineage.


Based on the dating of the Hg N haplotypes and their geographic distributions, paired with the suggested counter-clock-wise migratory route across Eurasia [3], we proposed a migratory map (Figure 4) of the Hg N lineages beginning in southern China about 21 kya, expanding into northern China 12–18 kya, reaching further north to Siberia about 12–14 kya [3], and followed by a population expansion and westward migration into Central Asia and East/North Europe around 8.0–10.0 kya [16].

Figure 4. Proposed prehistoric migration routes for Hg N lineage. The shaded areas represent the haplogroup N distributions

For an understanding of how hg N expanded, see Figure 2: Contour maps of Hg N sub-haplogroups.
A, N*-M231, B, N1*-LLY22g, C, N1a-M128, D, N1b-P43, E, N1c-M46 (Tat). (The regional populations used are listed in Table S3.)

The puzzling question I have here is this: if the N1a of the first/early European Neolithic farmers was the oldest lineage of N, then when, how and by which route did hg N reach Southeast Asia before taking the anti-clockwise route back through Central Asia to Europe?