Gene Expression, Razib Khan on June 4, 2009
“In the 19th & 20th centuries with the emergence of nationalism and its various scholarly subsidiaries in archaeology, philology and ethnography quite a few pan-ethnic movements rooted in language were born. Pan-Slavism, the Greater German idea (Grossdeutsch) and Pan-Arabism come to mind. As evident in their names these ideas shadowed relationships of language, but they often veered into racialist territory. In The History and Geography of Human Genes L. L. Cavalli-Sforza reported a substantial overlap between phylogenies generated from classical autosomal markers, and those of linguistic family trees. But obviously there were deviations on the margins, sometimes substantial ones.
Pan-Turkism is an idea which came to the fore after the collapse of the Ottoman Empire, though its roots go back to the 19th century. The role of a Turkish nationality was essential in the creation of the modern Turkish state by Kemal Ataturk. Though the Ottoman Empire had a Turkish-speaking ruling class at its heart, it was not fundamentally an ethnic polity. The Ottoman Empire was far less Turkish than the Hapsburg Empire was German. The Ottoman bureaucracy and military were open to many ethnicities, though often conditional on conversion to Islam. Albanians and Slavs played an important role in the Ottoman military, while the Janissaries were famously recruited from Christian subject peoples.
Ataturk aimed to replace this ethnically cosmopolitan but religiously Islamic Ottoman identity with a Turkish secular one. To a great extent he was successful, though not fully. Because of the vogue for racial theories in the early 20th century the new Turkish government naturally did fund research which purported to illustrate the distinctions between the Turkish peoples who had settled Anatolia and Southeastern Europe after the year 1000, and the native Greek, Slavic and Armenian populations. There is of course a natural problem with this: the basic origin of the Turkic peoples in Western Mongolia and the trans-Siberian steppe is well known, and Turkic speaking peoples still reside in that region, and physically they do not look much like Anatolian Turks at all. In fact there is a clear cline of Mongolian to European appearance from Central Asia to Anatolia. Of course common sense is often too easy to brush aside, and Pan-Turkish theories still seem to presuppose ideas of common ancestry.
This is where genetics comes in. There have been several recent papers attempting to adduce the relationship of Anatolian Turks to peoples of the Balkans and Central Asia, but generally they have utilized uniparental markers, mtDNA and the Y chromosome. The paper “Alu insertion polymorphisms and an assessment of the genetic contribution of Central Asia to Anatolia with respect to the Balkans” uses 10 Alu elements to estimate admixture between these putative parental populations:
In the evolutionary history of modern humans, Anatolia acted as a bridge between the Caucasus, the Near East, and Europe. Because of its geographical location, Anatolia was subject to migrations from multiple different regions throughout time. The last, well-known migration was the movement of Turkic speaking, nomadic groups from Central Asia. They invaded Anatolia and then the language of the region was gradually replaced by the Turkic language. In the present study, insertion frequencies of 10 Alu loci…have been determined in the Anatolian population. Together with the data compiled from other databases, the similarity of the Anatolian population to that of the Balkans and Central Asia has been visualized by multidimensional scaling method. Analysis suggested that, genetically, Anatolia is more closely related with the Balkan populations than to the Central Asian populations. Central Asian contribution to Anatolia with respect to the Balkans was quantified with an admixture analysis. Furthermore, the association between the Central Asian contribution and the language replacement episode was examined by comparative analysis of the Central Asian contribution to Anatolia, Azerbaijan (another Turkic speaking country) and their neighbors. In the present study, the Central Asian contribution to Anatolia was estimated as 13%. This was the lowest value among the populations analyzed. This observation may be explained by Anatolia having the lowest migrant/resident ratio at the time of migrations.
Figure 2 illustrates the conclusion starkly:
As noted in the abstract it is important to remember that Anatolia has among the longest histories of settled agriculturalists in the world. Population estimates suggest nearly 12 million residents during the late Roman Empire. Though I am skeptical that the population was nearly so high during the early medieval period, even if it was 1 million that would be substantial. There is an asymmetry between the two source populations as farmers tend to greatly outnumber nomads.
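To make the 13% figure from the abstract concrete, here is a minimal sketch of how an admixture proportion can be estimated from allele (here, Alu insertion) frequencies. It assumes a simple two-parent linear mixing model and a least-squares estimator; the abstract does not say which estimator the authors used, and every frequency below is hypothetical, chosen only so that the arithmetic can be followed.

```python
def admixture_proportion(hybrid, parent_a, parent_b):
    """Least-squares estimate of m in: hybrid = m * parent_a + (1 - m) * parent_b.

    Each argument is a list of insertion frequencies, one per locus.
    This is a generic textbook estimator, not necessarily the paper's method.
    """
    numerator = sum((h - b) * (a - b) for h, a, b in zip(hybrid, parent_a, parent_b))
    denominator = sum((a - b) ** 2 for a, b in zip(parent_a, parent_b))
    if denominator == 0:
        raise ValueError("parental frequencies are identical at every locus")
    return numerator / denominator


# Hypothetical insertion frequencies at five loci (not data from the study).
central_asia = [0.60, 0.30, 0.80, 0.45, 0.20]
balkans = [0.40, 0.50, 0.60, 0.25, 0.40]

# Build a toy "Anatolian" population that is 13% Central Asian by construction.
anatolia = [0.13 * a + 0.87 * b for a, b in zip(central_asia, balkans)]

m = admixture_proportion(anatolia, central_asia, balkans)
print(round(m, 2))
```

With real data the hybrid frequencies would not fall exactly on the line between the two parental populations, so the estimate would carry sampling error that a sketch like this does not quantify.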
Also remember that the positions of the Central Asian groups are likely closer to the Anatolian Turks than would be from Turkic populations closest to the ancestral homeland. The Turkish expansion occurred late in history, after the fall of the Roman Empire, but before the rise of Islam. Groups like the Huns and Avars who ravaged Central Europe during Late Antiquity were likely Turks, or had Turkic speaking peoples as part of their hordes. The famous Khazar Jews were also Turks. Turks took refuge among the Magyars after the expansion of the Mongol hordes. It is the last event which obscures the Mongolian origin of the Turks. The Mongols were a minor tribe on the northeastern fringes of what is today Mongolia before the rise of Genghis Khan. Western Mongolia was dominated by Turkic groups, and this was the likely point of departure for many of the earlier expansions. The Uyghurs, East Turks south of the original homeland are themselves highly admixed with a “Western” element which was indigenous to the region prior to the Turkish migrations.”
Turks with 10 – 25% Mongoloid admixture
The Turks in World History by Carter Vaughn Findley
1,200 year old Scythian warrior in Ulan Bator, Altai Mongolia
Jordana,X.,et al.,The warriors of the steppes: osteological evidence of warfare and violence from Pazyryk…, J.Archaeol. Sci. (2009), doi:10.1016/j.jas.2009.01.008
PLOS Genetics | DOI:10.1371/journal.pgen.1005068 April 21, 2015
Citation: Yunusbayev B, Metspalu M, Metspalu E, Valeev A, Litvinov S, Valiev R, et al. (2015) The Genetic Legacy of the Expansion of Turkic-Speaking Nomads across Eurasia. PLoS Genet 11(4): e1005068. doi:10.1371/journal.pgen.1005068
Received: November 28, 2013. Accepted: February 11, 2015. Published: April 21, 2015.
Copyright: © 2015 Yunusbayev et al.
Abstract
The Turkic peoples represent a diverse collection of ethnic groups defined by the Turkic languages. These groups have dispersed across a vast area, including Siberia, Northwest China, Central Asia, East Europe, the Caucasus, Anatolia, the Middle East, and Afghanistan. The origin and early dispersal history of the Turkic peoples is disputed, with candidates for their ancient homeland ranging from the Transcaspian steppe to Manchuria in Northeast Asia. Previous genetic studies have not identified a clear-cut unifying genetic signal for the Turkic peoples, which lends support for language replacement rather than demic diffusion as the model for the Turkic language’s expansion. We addressed the genetic origin of 373 individuals from 22 Turkic-speaking populations, representing their current geographic range, by analyzing genome-wide high-density genotype data. In agreement with the elite dominance model of language expansion most of the Turkic peoples studied genetically resemble their geographic neighbors. However, western Turkic peoples sampled across West Eurasia shared an excess of long chromosomal tracts that are identical by descent (IBD) with populations from present-day South Siberia and Mongolia (SSM), an area
This is an open access article distributed under the terms of the Creative Commons Attribution License
While SSM matching IBD tracts (> 1 cM) are also observed in non-Turkic populations, Turkic peoples demonstrate a higher percentage of such tracts (p-values ≤ 0.01) compared to their non-Turkic neighbors. Finally, we used the ALDER method and inferred admixture dates (~9th–17th centuries) that overlap with the Turkic migrations of the 5th–16th centuries. Thus, our results indicate historical admixture among Turkic peoples, and the recent shared ancestry with modern populations in SSM supports one of the hypothesized homelands for their nomadic Turkic and related Mongolic ancestors.
Author Summary
Centuries of nomadic migrations have ultimately resulted in the distribution of Turkic languages over a large area ranging from Siberia, across Central Asia to Eastern Europe and the Middle East. Despite the profound cultural impact left by these nomadic peoples, little is known about their prehistoric origins. Moreover, because contemporary Turkic speakers tend to genetically resemble their geographic neighbors, it is not clear whether their nomadic ancestors left an identifiable genetic trace. In this study, we show that Turkic-speaking peoples sampled across the Middle East, Caucasus, East Europe, and Central Asia share varying proportions of Asian ancestry that originate in a single area, southern Siberia and Mongolia. Mongolic- and Turkic-speaking populations from this area bear an unusually high number of long chromosomal tracts that are identical by descent with Turkic peoples from across west Eurasia. Admixture induced linkage disequilibrium decay across chromosomes in these populations indicates that admixture occurred during the 9th–17th centuries, in agreement with the historically recorded Turkic nomadic migrations and later Mongol expansion. Thus, our findings reveal genetic traces of recent large-scale nomadic migrations and map their source to a previously hypothesized area of Mongolia and southern Siberia.
Introduction
Linguistic relatedness is frequently used to inform genetic studies, and here we take this path to reconstruct aspects of a major and relatively recent demographic event, the expansion of nomadic Turkic-speaking peoples, who reshaped much of the West Eurasian ethno-linguistic landscape in the last two millennia. Modern Turkic-speaking populations are a largely settled people; they number over 170 million across Eurasia and, following a period of migrations spanning the ~5th–16th centuries, have a wide geographic dispersal, encompassing Eastern Europe, Middle East, Northern Caucasus, Central Asia, Southern Siberia, Northern China, and Northeastern Siberia [2–4]. The extant variety of Turkic languages spoken over this vast geographic span reflects only the recent (2100–2300 years) history of divergence, which includes a major split into Oghur (or Bolgar) and Common Turkic [5, 6]. This period was preceded by early Ancient Turkic, for which there is no historical data, and a long-lasting proto-Turkic stage, provided there was a Turkic-Mongolian linguistic unity (protolanguage) around 4500–4000 BCE [7, 8]. The earliest Turkic ruled polities (between the 6th and 9th centuries) were centered in what is now Mongolia, northern China, and southern Siberia. Accordingly, this region has been put forward as the point of origin for the dispersal of Turkic-speaking pastoral nomads [3, 4]. We
The Genetic Traces of Turkic Nomadic Expansion PLOS Genetics | DOI:10.1371/journal.pgen.1005068 April 21, 2015
The authors designate it here as an “Inner Asian Homeland” (IAH) and note at least two issues with this working hypothesis. First, the same approximate area was earlier dominated by the Xiongnu Empire (Hsiung-nu) (200 BCE–100 CE) and later by the short-lived Xianbei (Hsien-pi) Confederation (100–200 CE) and Rouran State (aka Juan-juan or Asian Avar) (400–500 CE). These steppe polities were likely established by non-Turkic-speaking peoples and presumably united ethnically diverse tribes. It is only in the second half of the 6th century that Turkic-speaking peoples gained control of the region and formed the rapidly expanding Göktürk Khaganate, succeeded soon by numerous khanates and khaganates extending from northeastern China to the Pontic-Caspian steppes in Europe [2–4]. Secondly, Göktürks represent the earliest known ethnic unit whereby Turkic peoples appear under the name Turk. Yet, Turkic-speaking peoples appear in written historical sources before that time, namely when Oghuric Turkic-speaking tribes appear in the Northern Pontic steppes in the 5th century, much earlier than the rise of Göktürk Khaganate in the IAH. Thus, the early stages of Turkic dispersal remain poorly understood and our knowledge about their ancient habitat remains a working hypothesis. Previous studies based on Y chromosome, mitochondrial DNA (mtDNA), and autosomal markers show that while the Turkic peoples from West Asia (Anatolian Turks and Azeris) and Eastern Europe (Gagauzes, Tatars, Chuvashes, and Bashkirs) are generally genetically similar to their geographic neighbors, they do display a minor share of both mtDNA and Y haplogroups otherwise characteristic of East Asia [10–15]. Expectedly, the Central Asian Turkic speakers (Kyrgyz, Kazakhs, Uzbeks, and Turkmens), share more of their uniparental gene pool (9–76% of Y chromosome and over 30% of mtDNA lineages) with East Asian and Siberian populations [16, 17]. 
In this regard, they differ from their southern non-Turkic neighbors, including Tajiks, Iranians, and different ethnic groups in Pakistan, except Hazara. However, these studies do not aim to identify the precise geographic source and the time of arrival or admixture of the East Eurasian genes among the contemporary Turkic-speaking peoples. The “eastern” mtDNA and even more so Y-chromosome lineages (given the resolution available to the studies at the time) lack the geographic specificity to explicitly distinguish between regions within Northeast Asia and Siberia, and/or Turkic and non-Turkic speakers of the region [18, 19]. Several studies using genome-wide SNP panel data describe the genetic structure of populations in Eurasia and although some include different Turkic populations [15, 20–24], they do not focus on elucidating the demographic past of the Turkic-speaking continuum. In cases where more than one geographic neighbor is available for comparison, Turkic-speaking peoples are genetically close to their non-Turkic geographic neighbors in Anatolia [22, 25], the Caucasus, and Siberia [21, 23]. A recent survey of worldwide populations revealed a recent (13th–14th century) admixture signal among the three Turkic populations (Turks, Uzbeks, and Uygurs) and one non-Turkic population (Lezgins) with Mongolas (from northern China), the Daurs (speaking Mongolic language), and Hazaras (of Mongol origin). This study also showed evidence for admixture (dating to the pre-Mongol period of 440–1080 CE) among non-Turkic (except Chuvashes) East European and Balkan populations with the source group related to modern Oroqens, Mongolas, and Yakuts. This is the first genetic evidence of historical gene flow from a North Chinese and Siberian source into some north and central Eurasian populations, but it is not clear whether this admixture signal applies to other Turkic populations across West Eurasia.
Here we ask whether it is possible to identify explicit genetic signal(s) shared by all Turkic peoples that have likely descended from putative prehistoric nomadic Turks. Specifically, we test whether different Turkic peoples share genetic heritage that can be traced back to the hypothesized IAH. More specifically, we ask whether this shared ancestry occurred within an historical time frame, testified by an excess of long chromosomal tracts identical by descent between Turkic-speaking peoples across West Eurasia and those inhabiting the IAH. To address these questions we used a genome-wide high-density genotyping array to generate data on Turkic-speaking peoples representing all major branches of the language family (Fig 1B).
Fig 1. Geographic map of samples included in this study and linguistic tree of Turkic languages. Panel A) Non-Turkic-speaking populations are shown with light blue, light green, dark green, light brown, and yellow circles, depending on the region. Turkic-speaking populations are shown with red circles regardless of the region of sampling. Full population names are given in S1 Table. Panel B) The linguistic tree of Turkic languages is adapted from Dybo 2004 and includes only those languages spoken by the Turkic peoples analyzed in this study. The x-axis shows the time scale in kilo-years (kya). Internal branches are shown with different colors.
doi:10.1371/journal.pgen.1005068.g001
Results
To characterize the population structure of Turkic-speaking populations in the context of their geographic neighbors across Eurasia, we genotyped 322 new samples from 38 Eurasian populations and combined it with previously published data (see S1 Table and Material and Methods for details) to yield a total dataset of 1,444 samples genotyped at 515,841 markers. The novel samples introduced in this study geographically cover previously underrepresented regions like Eastern Europe (Volga-Ural region), Central Asia, Siberia, and the Middle East. We used a STRUCTURE-like approach implemented in the program ADMIXTURE to explore the genetic structure in the Eurasian populations by inferring the most likely number of genetic clusters and mixing proportions consistent with the observed genotype data (from K = 3 through K = 14 groups) (S1 Fig). As shown in previous studies [15, 20, 29] East Asian populations commonly contained alleles that find membership in two general clusters, shown here as k6 and k8, in a model assuming K = 8 “ancestral” populations (Fig 2). Geographically, the spread zones of these two components (clusters) were centered on Siberia and East Asia, respectively. Their combined prevalence declined as one moves west from East Asia (correlation with longitude, p = 8.8×10⁻¹⁶, R = 0.77, 95% CI: 0.66–0.85). Overall, alleles from the Turkic populations sampled across West Eurasia showed membership in the same set of West Eurasian genetic clusters, k1–k4, as did their geographic neighbors. In addition, the Volga-Uralic Turkic peoples (Chuvashes, Tatars, and Bashkirs) also displayed membership in the k5 cluster, which contained the Siberian Uralic-speaking populations (Nganasans and Nenets) and extended to some of the European Uralic speakers (Maris, Udmurts, and Komis).
However, in most cases the Turkic peoples showed a higher combined presence of the “eastern components” k6 and k8 than did their geographic neighbors.
Fig 2. Population structure inferred using ADMIXTURE analysis. ADMIXTURE results at K = 8 are shown. Each individual is represented by a vertical (100%) stacked column indicating the proportions of ancestry in K constructed ancestral populations. Turkic-speaking populations are shown in red. The upper barplot shows only Turkic-speaking populations. doi:10.1371/journal.pgen.1005068.g002
Three-population test
The “eastern components” k6 and k8 inferred among Turkic- and non-Turkic peoples across West Eurasia, as well as the “western components” k1, k2, and k3 present among Siberian populations can originate through gene flow episodes in opposite directions in the past and this population mixture history can be statistically tested using f3-statistics [30, 31]. In order to evaluate the admixture scenarios suggested by the ADMIXTURE analysis, we tested all possible three population combinations in our dataset using the three-population test (f3-statistics) [30, 31]. We reported only population trios f3(target, source1, source2) with the most negative f3-statistics (S2 Table) and considered populations to be significantly admixed when their Z-score was smaller than −1.64 (i.e. p-value was less than 0.05, for a one-tailed test). Our three-population tests showed that almost all the West Eurasian Turkic peoples (15 out of 16) and their non-Turkic neighbors (49 out of 61) (see S2 Table for geographic subdivision) were admixed with East Asian- and Siberian-related populations. Similarly, all the Siberian Turkic populations, as well as some (11 out of 27) East Eurasian non-Turkic populations showed an admixture signal with West Eurasian-related populations.
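The logic of the three-population test can be sketched in a few lines. The version below is a simplified illustration, not the paper's pipeline: it computes f3 as the average of (c − a)(c − b) over loci from population allele frequencies, omits the finite-sample bias correction used by real software, and gets a Z-score from a delete-one-block jackknife. The allele frequencies are simulated, and the function names and block count are assumptions made for this example.

```python
import math
import random
from statistics import mean


def f3(target, src1, src2):
    """Mean of (c - a) * (c - b) over loci; negative values suggest admixture."""
    return mean((c - a) * (c - b) for c, a, b in zip(target, src1, src2))


def f3_zscore(target, src1, src2, n_blocks=10):
    """Z-score of f3 from a delete-one-block jackknife over contiguous loci.

    If the number of loci is not a multiple of n_blocks, the trailing
    loci are simply never left out (acceptable for a sketch).
    """
    n = len(target)
    size = n // n_blocks
    leave_one_out = []
    for i in range(n_blocks):
        lo, hi = i * size, (i + 1) * size
        keep = [j for j in range(n) if not (lo <= j < hi)]
        leave_one_out.append(
            f3([target[j] for j in keep],
               [src1[j] for j in keep],
               [src2[j] for j in keep])
        )
    centre = mean(leave_one_out)
    variance = (n_blocks - 1) / n_blocks * sum((e - centre) ** 2 for e in leave_one_out)
    return f3(target, src1, src2) / math.sqrt(variance)


# Simulated allele frequencies at 200 loci for two unrelated source populations.
rng = random.Random(1)
west = [rng.uniform(0.05, 0.95) for _ in range(200)]
east = [rng.uniform(0.05, 0.95) for _ in range(200)]

# A toy target population formed as an equal mixture of the two sources.
mixed = [0.5 * w + 0.5 * e for w, e in zip(west, east)]

value = f3(mixed, west, east)
z = f3_zscore(mixed, west, east)
print(value < 0, z < -1.64)
```

For an equal mixture, each locus contributes −0.25 × (a − b)², so the statistic is negative by construction; a Z-score below −1.64 corresponds to the one-tailed 5% threshold.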
In interpreting f3-statistics results, it is important to point out that the reported source populations do not necessarily represent the true admixing populations. Although the exact source populations were uncertain, significantly negative f3-statistics provided strong evidence for admixture in most of the Turkic and non-Turkic populations in our dataset. In order to test whether these admixture signals resulted from recent gene flow events, we next explored the distribution of long chromosomal tracts shared between populations in our dataset.
Geographic distribution of recent shared ancestry
A recent study shows that even a pair of unrelated individuals from the opposite ends of Europe share hundreds of chromosomal tracts of IBD from common ancestors that lived over the past 3,000 years. The amount of such recent ancestry declines exponentially with geographic distance between population pairs, and such a distance-dependent pattern can be distorted due to population expansion or gene flow. We observed a reasonably high correlation (Pearson’s correlation coefficient = 0.77, 95% CI: 0.76–0.79, p < 2.2×10⁻¹⁶) between the rate of IBD sharing decay and geographic distance in our set of Eurasian populations. This distance-dependent pattern is likely shaped by both isolation-by-distance and gene flow: many of the populations are admixed (the negative f3-statistics in S2 Table) and there is a longitude dependent decrease in the prevalence of “eastern components” k6 and k8. Some populations might stand out in this distance dependent pattern due to isolation, greater gene flow, or genetic drift. For example, when we removed the West Eurasian Turkic populations (sampled in the Middle East, Caucasus, Eastern Europe, and Central Asia) from our dataset, we observed better correlation between IBD sharing decay and geographic distances between populations (Pearson’s correlation coefficient = 0.83, 95% CI: 0.82–0.85, p < 2.2×10⁻¹⁶).
To identify populations for which IBD sharing with Turkic populations departs from a distance-dependent decay pattern, we first computed IBD sharing (the average length of genome IBD measured in centiMorgans) for each of the 12 western Turkic populations with all other populations in the dataset (S3 Table) and then subtracted the same statistic computed for their geographic neighbors (see the Materials and Methods section for details and S2 Fig for a schematic representation of this analysis). When the differences were overlaid for all 12 Turkic populations, we detected an unusually high signal of accumulated IBD sharing (samples indicated by a “plus symbol” on Fig 3A–3C) for populations outside West Eurasia. The correlated signal of IBD sharing for these distant populations exceeded the expectation based on a distance-dependent decay pattern. Most of these distant populations are located in South Siberia and Mongolia (SSM) and Northeast Siberia, except the two samples in Eastern Europe (Maris) and the North Caucasus (Kalmyks). In principle, when we compare the IBD sharing pattern in this way between neighboring Turkic and non-Turkic populations, we might observe a high IBD sharing signal with some Siberian populations due to drift in one of the populations compared, but the chance that such random signals would correlate between multiple Turkic populations and accumulate in a single region is negligible. Indeed the null hypothesis for this analysis assumed no systematic difference between any of the Turkic populations and their respective geographic neighbors. Therefore, the null hypothesis predicted that random differences accumulated across the entire geographic range of the western Turkic populations.
To demonstrate this null expectation, we replaced each of the western Turkic populations by populations randomly drawn from the sets of respective non-Turkic neighbors, and repeated this subtraction/accumulation analysis, as shown in S2 Fig. When the sets of random non-Turkic samples were tested, the accumulated signal was restricted to populations (indicated by the “plus symbol” on S3 Fig) within West Eurasia, as expected by the null hypothesis. There are, however, two exceptions (Nganasans and Nenets) that, when examined closely, suggest an interesting finding consistent with our ADMIXTURE results. These two Siberian populations, Nganasans and Nenets (S3A, S3B, S3E, S3I and S3J Fig), speak Uralic languages and demonstrated a high accumulated signal only when our tested sets contained the western Uralic speakers (Maris, Komis, Vepsas, and Udmurts). This was in line with our ADMIXTURE results (Fig 2), as the k5 ancestry component was shared specifically between these western Uralic speakers and the two Siberian Uralic-speaking Nganasans and Nenets. We now return to the overall difference between the accumulated IBD sharing signal under the null hypothesis (see S3 Fig) and that observed for the set of western Turkic populations (Fig 3).
Fig 3. Populations with high and correlated signals of IBD sharing with western Turkic peoples. Circle positions correspond to population locations. Circle color indicates the amount of excess IBD sharing (shown in Legend) that a population shares with all 12 western Turkic populations. Populations with IBD sharing exceeding the 0.90 quantile are shown with a “plus symbol”. Panel A) IBD sharing signal based on IBD tracts of 1–2 cM. Panel B) IBD sharing signal based on IBD tracts of 2–3 cM. Panel C) IBD sharing signal based on IBD tracts of 3–4 cM. doi:10.1371/journal.pgen.1005068.g003
Some of the populations in SSM and Northeast Siberia demonstrated a strong IBD sharing signal with the western Turkic populations and this pattern most likely indicates recent gene flow from Siberia. To narrow down the source of this gene flow it is important to know which of the Siberian populations are indigenous to their current locations. We show in the Discussion section that only Tuvans, Buryats, and Mongols from the SSM area are indigenous to their current locations (at least within the known historical time) and therefore this area is the best candidate for the source of recent gene flow into the western Turkic populations. It should be noted that this east-west directionality is implied by the fact that 12 populations sampled across different West Eurasian locations are unlikely to show a correlated signal of high IBD sharing with a single region unless they received gene flow from it. Indeed, when we repeated our analysis by randomly choosing non-Turkic populations (S2 Fig), we could not reproduce a similar correlated signal. Our previous analysis suggests that the western Turkic populations (Fig 3 and S3 Table) experienced stronger gene flow from the SSM area than their non-Turkic neighbors, but it is not clear whether this signal is statistically significant. To test this, we computed IBD sharing between the group of SSM populations (Tuvans, Mongols, and Buryats, as well as a known migrant population, Evenkis) and each of the western Turkic populations. Then, for each of the western Turkic populations, we pooled their non-Turkic neighbors, and generated 10,000 permuted samples to see whether a comparable amount of IBD sharing (observed in tested Turkic populations) with the four Siberian populations is obtained by chance. IBD sharing was estimated separately for different classes of chromosomal tracts (1–2 cM, 2–3 cM, 3–4 cM, etc.), and permutation tests were performed. 
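The permutation test described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes that each permuted sample is a random draw, of the same size as the Turkic sample, from the pooled non-Turkic neighbors, and that the test statistic is the mean IBD sharing with one Siberian population. The per-individual values, the function name, and the add-one p-value convention are all hypothetical choices for this example.

```python
import random
from statistics import mean


def permutation_p_value(turkic, neighbors, n_perm=10_000, seed=0):
    """One-sided p-value: how often does a random draw from the pooled
    neighbors share at least as much IBD as the Turkic sample does?"""
    if len(neighbors) < len(turkic):
        raise ValueError("pooled neighbors must be at least as numerous as the test sample")
    rng = random.Random(seed)
    observed = mean(turkic)
    hits = 0
    for _ in range(n_perm):
        permuted = rng.sample(neighbors, len(turkic))
        if mean(permuted) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-individual IBD sharing (cM, tracts of 1-2 cM) with one
# Siberian population; none of these numbers come from the paper.
turkic_sample = [3.1, 2.8, 3.5, 3.0, 2.9]
pooled_neighbors = [1.0, 1.2, 1.5, 1.1, 1.8, 2.0, 1.3, 1.6, 1.4, 1.7,
                    1.9, 1.2, 1.0, 1.5, 1.3, 1.8, 1.1, 1.6, 1.4, 1.7]

p = permutation_p_value(turkic_sample, pooled_neighbors)
print(p <= 0.01)
```

In the analysis described in the text, a test of this kind would be repeated for each Siberian population and for each tract-length class (1–2 cM, 2–3 cM, and so on).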
In most of the cases, higher IBD sharing between the western Turkic populations (compared to non-Turkic neighbors) and the Siberian populations was statistically significant (Fig 4 and S4 Fig; numbers in red show how many Siberian populations have p-values ≤ 0.01). Some of the non-Turkic neighbors, such as Adyghe, Maris, Udmurts, and North Ossetians, also shared a relatively high number of IBD tracts (Fig 4 and S4 Fig) with the SSM populations. We conclude that the recent gene flow from the SSM area inferred in our previous analysis was not restricted to the western Turkic peoples, and the higher IBD sharing is evidence that Turkic populations are distinct from their non-Turkic neighbors. A spatial pattern in IBD sharing was noted when IBD tracts of different length classes were considered separately. For segment classes of 1–2 cM and 2–3 cM, higher IBD sharing is statistically significant for most Turkic speakers, except Gagauzes and Chuvashes (and Tatars in the case of 2–3 cM). For longer IBD tracts of 3–4 cM, statistical evidence for higher IBD sharing becomes weaker in some Middle Eastern and Caucasus (Azeris, Kumyks, and Balkars) samples. By weaker evidence, we mean that a statistically significant excess of IBD sharing was restricted to a subset of the four candidate ancestors tested. In the Volga-Ural region, for the same class of segments (3–4 cM), only Bashkirs continued to show strong evidence for gene flow, while Tatars and Chuvashes do not. For these two Turkic populations, not all tests were statistically significant because the background group, from which permuted samples are drawn, contained the Finnic speaking Mari population, which shows comparable levels of Asian admixture (Fig 2) and IBD sharing (S4 Fig).
When we considered even longer segments (4–5 cM and 5–6 cM), we no longer observed a systematic excess of IBD sharing for Turkic peoples in the Middle East, the Caucasus, or in the Volga-Ural region. In contrast, populations closer to the SSM area (Uzbeks, Kazakhs, Kyrgyz, and Uygurs, and also Bashkirs from the Volga-Ural region) still demonstrated a statistically significant excess of IBD sharing. This spatial pattern can be partly explained by a relative rarity of longer IBD tracts compared to shorter ones and recurrent gene flow events into populations closer to the SSM area.
Dating the age of Asian admixture using the ALDER and SPCO methods
According to historical records, the Turkic migrations took place largely during ~5th–16th centuries (little is known about earlier periods) and partly overlap with the Mongol expansion. Assuming 30 years per generation, the common Siberian ancestors of various Turkic peoples lived prior to and during this migration period between 20 and 53 generations ago.
Fig 4. Pairwise IBD sharing based on 1–2 cM long segments. For each population ordered along the x-axis, IBD sharing is computed with three SSM populations (Tuvans, Buryats, Mongols) and Evenkis. Each Turkic-speaking population (shown in red) is grouped with its respective geographic neighbors using parentheses. The grouped geographic neighbors were pooled and used to perform a permutation test as described in the M&M section. Red numbers under the Turkic population name indicate how many SSM populations demonstrate a statistically significant excess of IBD sharing with a given Turkic population. Note that, for example, Bashkirs, Tatars, and Chuvashes share their geographic neighbors. doi:10.1371/journal.pgen.1005068.g004
The expected length of a single-path IBD tract inherited from a common ancestor that lived ~20–53 generations ago ranges between 2.5 cM and 0.94 cM (see Methods for details). Taking into account that multi-path IBD tracts will be on average longer, the IBD sharing signal at 1–5 cM detected between the western Turkic peoples and the SSM area populations may be due to historical Turkic and Mongolic expansions from the SSM area. It is possible to approximately outline the age of common ancestors directly from the distribution of shared IBD tracts, but such an inference would be too coarse for our purposes. Here we use two different methods implemented in ALDER and SPCO to infer the age of Siberian/Asian admixture among Turkic peoples. The admixture dates for all the analyzed Turkic peoples (Fig 5 and S4 Table) fell within the historical time frame (5th–17th century) that overlaps with the period of nomadic migrations triggered by Turkic (6th–16th centuries CE) and Mongol expansions (13th century) [2, 3]. However, individual admixture dates estimated using the two methods overlap only partially and were discordant for most populations (Fig 5).
Fig 5. Admixture dates for Turkic-speaking populations on an absolute date scale. Blue circles show ALDER-inferred point estimates and error bars indicate 95% confidence intervals. Gray circles show SPCO-inferred point estimates and error bars in gray indicate 95% confidence intervals. The red bar shows the point estimate range (inferred using ALDER) across all the analyzed samples and the orange bar shows the same for SPCO-inferred dates. Admixture dates before Common Era (CE) are shown with a negative sign. doi:10.1371/journal.pgen.1005068.g005
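The 2.5 cM and 0.94 cM figures quoted at the start of this passage are consistent with a standard approximation: a tract that two individuals inherit from one ancestor g generations back has passed through 2g meioses, so its expected length is 100 / (2g) cM. The paper's Methods section is not shown here, so treat this as an assumed reconstruction rather than the authors' stated formula; the function name is hypothetical.

```python
def expected_single_path_ibd_cm(generations):
    """Expected IBD tract length in cM for a common ancestor `generations` back.

    Assumption: tract length is exponentially distributed with rate 2g per
    Morgan (one crossover per Morgan per meiosis, 2g meioses on the path),
    which gives a mean of 100 / (2g) centiMorgans.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2 * generations)


for g in (20, 53):
    # Prints 2.50 cM for g = 20 and 0.94 cM for g = 53.
    print(f"g = {g}: {expected_single_path_ibd_cm(g):.2f} cM")
```

Under this approximation, tracts from ancestors in the 20 to 53 generation range are expected to fall at the short end of the 1–5 cM window, in line with the text's remark that multi-path tracts will on average be longer.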
Therefore, we simulated a series of admixture events spanning a target historical period and compared how the two methods performed (see Material and Methods for details). The dates inferred by ALDER tended to be closer to simulated true values, while SPCO consistently estimated slightly older dates (Fig 6). Importantly, the SPCO-inferred dates for our real dataset (Fig 5) also tended to be older, and we therefore suspect bias in our SPCO estimates. From here onward we discuss only ALDER-inferred dates.
The admixture-induced linkage disequilibrium signal in an admixed population is “restored” using two surrogates (reference populations), and when several such pairs are tested, ALDER allows the comparison of their genetic proximity to true mixing populations based on the amplitude of the weighted LD curve. It should be noted, however, that inferences made in this way might be misleading when the reference population genetically related to the true mixing population underwent admixture itself. In this case, another less related population would be inferred as a better surrogate for the true mixing population. We considered all possible combinations of populations in our dataset as references for Turkic speaking populations, and report the set of “best” pairs (S4 Table) demonstrating the highest amplitude of the weighted LD curve.
Fig 6. Admixture dates for simulated populations. Simulated populations were generated by mixing two ancestral populations G generations ago as described in the M&M section. We repeated each admixture scenario 120 times and analyzed with two admixture dating methods: ALDER and SPCO. Circles represent admixture dates for one simulated population and circle color indicates the method of admixture inference as shown in the legend. Red “plus symbols” show the true admixture date. doi:10.1371/journal.pgen.1005068.g006
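The role of the weighted LD curve can be illustrated with a toy model. If the curve is approximated as A * exp(-n * d), with d the genetic distance in Morgans and n the number of generations since admixture, then n is the negative slope of log(LD) against d, and A is the amplitude used above to rank reference pairs. The Python sketch below is a simplified stand-in and not ALDER itself: it assumes strictly positive LD values, ignores any constant offset, weighting scheme, and jackknife error estimation, and all names are hypothetical.

```python
import math


def fit_ld_decay(dist_morgans, weighted_ld):
    """Least-squares fit of log(LD) = log(A) - n * d (hypothetical helper).

    Returns (n_generations, amplitude). Assumes all LD values are positive
    and that a single exponential without offset describes the curve.
    """
    xs = list(dist_morgans)
    ys = [math.log(v) for v in weighted_ld]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Synthetic, noise-free curve: 30 generations, amplitude 0.002 (made-up values).
distances = [0.005 * i for i in range(1, 41)]
curve = [0.002 * math.exp(-30 * d) for d in distances]
n_hat, amp_hat = fit_ld_decay(distances, curve)
```

In this picture, a reference pair that is itself admixed would be expected to yield a smaller fitted amplitude, which is the caveat the text raises about ranking references by amplitude alone.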
Interpretation of these results, however, is complicated since some of the references are admixed themselves according to our three-population tests (S2 Table). We cannot exclude the possibility that some of the references truly related to the historical ancestors yielded a lower amplitude because they were admixed. For example, the SSM populations that demonstrated the signature of recent gene flow (excess of IBD tracts) into western Turkic populations showed lower amplitude of the weighted admixture LD compared to non-admixed references that we reported in S4 Table. The SSM populations are significantly admixed (see S2 Table), and this likely happened during the Turkic migration period (S4 Table). Although we report a single admixture date for each population, we note that it is likely that the contemporary Turkic peoples were established through several migration waves [2–4, 37]. Indeed, Turkic peoples closer to the SSM area (those from the Volga-Ural region and Central Asia) showed younger dates compared to more distant populations like Anatolian Turks, Iranian Azeris, and the North Caucasus Balkars. Only Nogais, the former steppe belt nomadic people, and Kumyks inhabiting northern slopes of the Caucasus stand out from this spatial pattern.
Discussion
Our ADMIXTURE analysis (Fig 2) revealed that Turkic-speaking populations scattered across Eurasia tend to share most of their genetic ancestry with their current geographic non-Turkic neighbors. This is particularly obvious for Turkic peoples in Anatolia, Iran, the Caucasus, and Eastern Europe, but more difficult to determine for northeastern Siberian Turkic speakers, Yakuts and Dolgans, for which non-Turkic reference populations are absent. We also found that a higher proportion of Asian genetic components distinguishes the Turkic speakers all over West Eurasia from their immediate non-Turkic neighbors.
These results support the model that expansion of the Turkic language family outside its presumed East Eurasian core area occurred primarily through language replacement, perhaps by the elite dominance scenario, that is, intrusive Turkic nomads imposed their language on indigenous peoples due to advantages in military and/or social organization. When the Turkic peoples settled across West Eurasia are compared with their non-Turkic neighbors, they demonstrate higher IBD sharing with populations from SSM and Northeast Siberia (Fig 3). There are, however, two non-Siberian populations that also demonstrate high IBD sharing with the tested Turkic peoples, the Kalmyks and Maris. These exceptions need careful consideration in light of historical data and previously published studies. For example, the Mongol-speaking Kalmyks migrated into North Caucasus from Dzhungaria (the northwestern province of China at the Mongolian border) only in the 17th century, while Maris stand out from other geographic neighbors due to unusually high recent admixture with Bashkirs: they demonstrate higher IBD sharing with Bashkirs for all IBD tract length classes (from 1–2 cM up to 11–12 cM) compared to other populations in the region (p < 0.05). This might be explained by the fact that we collected Maris samples in the Republic of Bashkortostan, where they seemingly intermarried with Bashkirs to some extent. Finally, some of the Siberian populations are in fact migrants in their current locations. For example, Yakuts, Evenkis, and Dolgans largely stem from the Lake Baikal region, which is essentially the SSM area. It turns out that most of the populations showing a high signal of IBD sharing with the western Turkic populations originated from the SSM area or had admixture with one of the tested Turkic populations. The only exception is the Nganasans; they demonstrate unusually high IBD sharing with both western Turkic peoples (Fig 3) and randomly chosen non-Turkic populations (S3 Fig).
Taking into account that SSM area populations (Tuvans, Mongols, and Buryats) can be reliably considered indigenous to their locations, and that other Siberian and non-Siberian populations (demonstrating high IBD sharing with western Turkic peoples in Fig 3) all have SSM origins, we suggest that ancestral populations from this area contributed recent gene flow into western Turkic peoples. We note that SSM matching IBD tracts were also observed with considerable frequency among some non-Turkic peoples, such as Adyghe, Maris, North Ossetians, and Udmurts (Fig 4 and S4 Fig) suggesting that gene flow from the SSM area also contributed to non-Turkic populations. In this regard, alternative explanations not related to Turkic and Mongolic migrations cannot be excluded, but these historical events remain the most likely scenario, since the high proportion of SSM matching tracts is a unifying hallmark of many western Turkic peoples and such a correlated signal of sharing with Siberian populations is not observed for any other group of populations (S3 Fig). Thus, it is likely that migrants of SSM origin interacted with many of the ancestors of contemporary West Eurasian populations, but it was the stronger interaction (reflected in higher IBD sharing) with migrant SSM ancestors that drove Turkicization. We performed a permutation test for each western Turkic population and the observed excess of IBD sharing (compared to non-Turkic neighbors) with the SSM area populations was statistically significant (Fig 4 and S4 Fig). Another important outcome of our IBD sharing analysis is the finding that two of the three SSM populations that we consider “source populations” or modern proxies for source populations are both Mongolic-speaking. This observation can be explained in several ways.
For example, one may surmise that the Mongol conquests, starting in the 13th century, were accompanied by their demographic expansion over the territories already occupied, in part, by Turkic speakers, and this led to admixture between Turkic and Mongolic speakers. Alternatively, it is also probable that the ancestors of Turkic and Mongolic tribes stem from the same or nearly the same area and underwent numerous episodes of admixture before their respective expansions. The latter explanation is indirectly supported by a complex, long-lasting stratigraphy of Mongolian loan words in Turkic languages and vice versa. The first explanation is unlikely from a historical perspective: although the Mongolic conquests were launched by Genghis Khan's troops in the early 13th century, it is well known that they did not involve massive re-settlements of Mongols over the conquered territories. Instead, the Mongol war machine was progressively augmented by various Turkic tribes as they expanded, and in this way Turkic peoples eventually reinforced their expansion over the Eurasian steppe and beyond. Therefore, we prefer the second explanation, although we cannot entirely exclude the Mongol contribution, especially in light of admixture dates that overlap with the Mongol expansion period. Finally, our IBD sharing analysis suggested that the SSM area is the source of recent gene flow. This area is one of the hypothesized homelands for Turkic peoples and linguistically related Mongols. While the presence of the Mongol empire over this territory is well recorded, historical sources alone are insufficient to unambiguously associate this area with the Turkic homeland for several reasons: some of the Turkic groups speaking the Oghuric branch of Turkic were attested westerly in the Pontic-Caspian steppes in the mid-late 5th century CE. This is geographically distant from the SSM area, and temporally much earlier than the Göktürk Empire was established in the SSM area.
Thus, our study provides the first genetic evidence supporting one of the previously hypothesized IAHs to be near Mongolia and South Siberia. The gene flow from the SSM area that we inferred based on our IBD sharing analysis should also be detected using an alternative approach such as ALDER, which is based on the analysis of linkage disequilibrium (LD) patterns due to admixture. Using the ALDER method, we tested all possible combinations of reference populations in our dataset. LD decay patterns observed among western Turkic populations were consistent with admixture between West Eurasian and East Asian/Siberian populations (see detected reference populations in S4 Table). Admixture dating with the set of East Asian/Siberian populations (S4 Table) inferred admixture events ranging between 816 CE for Chuvashes and 1657 CE for Nogais. We chose these reference populations based on the highest LD curve amplitudes, as suggested by the authors of the method. It is notable that all the SSM populations that were inferred to be the source of SSM gene flow were either filtered out by an ALDER pre-test procedure because of the shared admixture signal with the tested Turkic populations or had a lower amplitude of the weighted LD curve compared to the non-admixed references. Indeed, as we show, SSM populations and the two Northeastern Siberian populations all demonstrated a statistically significant admixture signal between the same set of West Eurasian and East Asian populations as western Turkic peoples do (S2 Table and S4 Table). Therefore, the set of reference populations reported in S4 Table that demonstrate the highest LD curve amplitude in fact represent the set of non-admixed reference populations (S2 Table) that passed ALDER’s filtering procedure.
This filter removes any reference population that shows shared admixture signal with the tested population. It was important for our study that the range of ALDER-inferred admixture dates overlaps with the major Turkic migrations and later Mongolic expansion (Fig 5), both of which are known to have triggered nomadic migrations to Medieval Central Asia, the Middle East and Europe. In addition, when linguistic classification and regional context are taken into account, we found parallels with large-scale historical events. For example, the present-day Tatars, Bashkirs, Kazakhs, Uzbeks, and Kyrgyz span from the Volga basin to the Tien-Shan Mountains in Central Asia, yet showed evidence of recent admixture ranging from the 13th to the 14th centuries (Fig 5). These peoples speak Turkic languages of the Kipchak-Karluk branch and their admixture ages postdate the presumed migrations of the ancestral Kipchak Turks from the Irtysh and Ob regions in the 11th century. There are exceptions, like the Balkars, Kumyks, and Nogais in Northern Caucasus, who showed either earlier dates of admixture (8th century) or much later admixture between the 15th century (Kumyks) and 17th century (Nogais). Chuvashes, the only extant Oghur speakers, showed an older admixture date (9th century) than their Kipchak-speaking neighbors in the Volga region. According to historical sources, when the Onogur-Bolgar Empire (northern Black Sea steppes) fell apart in the 7th century, some of its remnants migrated northward along the right bank of the Volga river and established what later came to be known as Volga Bolgars, of which the first written knowledge appears in Muslim sources only around the end of the 9th century. Thus, the admixture signal for Chuvashes is close to the supposed arrival time of Oghur speakers in the Volga region.
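Calendar dates such as 816 CE for Chuvashes and 1657 CE for Nogais come from converting a generation count into years. The Python sketch below shows that arithmetic under two assumptions that this excerpt does not confirm for S4 Table: a generation time of 30 years (the figure used earlier for the IBD argument) and a reference year of 2015 (the publication year). Both function names are hypothetical.

```python
def generations_to_year_ce(generations, years_per_generation=30.0,
                           reference_year=2015):
    """Calendar year (CE) corresponding to `generations` before the reference year.

    Assumptions: a constant generation time and a fixed reference year;
    a negative result would denote a date before the Common Era.
    """
    return reference_year - generations * years_per_generation


def year_ce_to_generations(year_ce, years_per_generation=30.0,
                           reference_year=2015):
    """Inverse conversion: generations elapsed between `year_ce` and the reference year."""
    return (reference_year - year_ce) / years_per_generation


for label, year in (("Chuvashes", 816), ("Nogais", 1657)):
    print(f"{label}: {year} CE is about {year_ce_to_generations(year):.1f} generations back")
```

Because the result scales directly with the assumed generation time, a shorter or longer generation interval would shift every date proportionally, which is worth keeping in mind when matching dates to specific historical events.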
Differences in admixture dates for the three Oghuz speaking populations (Azeris, Turks, and Turkmens) were notable and their geographical locations suggest a possible explanation. Anatolian Turks and Azeris, whose Central Asian ancestors crossed the Iranian plateau and became largely inaccessible to subsequent gene flow with other Turkic speakers, both have evidence of earlier admixture events (12th and 9th centuries, respectively) than Turkmens. Turkmens, remaining in Central Asia, showed considerably more recent admixture dating to the 14th century, consistent with other Central Asian Turkic populations and most likely due to admixture with more recent, perhaps recurrent, waves of migrants in the region from SSM. In summary, our collection of samples, which covered the full extent of the current distribution of Turkic peoples, shows that most Turkic peoples share a considerable proportion of their genome with their geographic neighbors, supporting the elite dominance model for Turkic language dispersal. We also showed that almost all the western Turkic peoples retained in their genome shared ancestry that we trace back to the SSM region. In this way, we provide genetic evidence for the Inner Asian Homeland (IAH) of the pioneer carriers of Turkic language, hypothesized earlier by others on the basis of historical data. Furthermore, because Turkic peoples have preserved SSM ancestry tracts in their genomes, we were able to perform admixture dating, and the estimated dates are in good agreement with the historical period of Turkic migrations and the overlapping Mongol expansion.
Finally, much remains to be learned about the demographic consequences of this complex historical event and further studies will allow the disentangling of multiple signals of admixture in the human genome and fine scale mapping of the geographic origins of individual chromosomal tracts.