Surprising new evidence which overturns current theories of how humans colonised the Pacific has been discovered by scientists at the University of Leeds, UK.
Genetic study uncovers new path to Polynesia
Science Daily, February 7, 2011
University of Leeds
The islands of Polynesia were first inhabited around 3,000 years ago, but where these people came from has long been a hot topic of debate amongst scientists. The most commonly accepted view, based on archaeological and linguistic evidence as well as genetic studies, is that Pacific islanders were the latter part of a migration south and eastwards from Taiwan which began around 4,000 years ago.
But the Leeds research — published February 3 in The American Journal of Human Genetics — has found that the link to Taiwan does not stand up to scrutiny. In fact, the DNA of current Polynesians can be traced back to migrants from the Asian mainland who had already settled in islands close to New Guinea some 6-8,000 years ago.
The type of DNA extracted and analysed in this kind of study is that stored in the cell’s mitochondria. Mitochondrial DNA (mtDNA) is passed down the maternal line, providing a record of inheritance which goes back thousands of years. The scientists look for genetic signatures which enable them to classify the DNA into different lineages and then use a ‘molecular clock’ to date when these lineages moved into different parts of the world.
Lead researcher, Professor Martin Richards, explains: “Most previous studies looked at a small piece of mtDNA, but for this research we studied 157 complete mitochondrial genomes in addition to smaller samples from over 4,750 people from across Southeast Asia and Polynesia. We also reworked our dating techniques to significantly reduce the margin of error. This means we can be confident that the Polynesian population — at least on the female side — came from people who arrived in the Bismarck Archipelago of Papua New Guinea thousands of years before the supposed migration from Taiwan took place.”
Nevertheless, most linguists maintain that the Polynesian languages are part of the Austronesian language family which originates in Taiwan. And most archaeologists see evidence for a Southeast Asian influence on the appearance of the Lapita culture in the Bismarck Archipelago around 3,500 years ago. Characterised by distinctive dentate stamped ceramics and obsidian tools, Lapita is also a marker for the earliest settlers of Polynesia.
Professor Richards and co-researcher Dr Pedro Soares (now at the University of Porto), argue that the linguistic and cultural connections are due to smaller migratory movements from Taiwan that did not leave any substantial genetic impact on the pre-existing population.
“Although our results throw out the likelihood of any maternal ancestry in Taiwan for the Polynesians, they don’t preclude the possibility of a Taiwanese linguistic or cultural influence on the Bismarck Archipelago at that time,” explains Professor Richards. “In fact, some minor mitochondrial lineages back up this idea. It seems likely there was a ‘voyaging corridor’ between the islands of Southeast Asia and the Bismarck Archipelago carrying maritime traders who brought their language and artefacts and perhaps helped to create the impetus for the migration into the Pacific.
“Our study of the mtDNA evidence shows the interactions between the islands of Southeast Asia and the Pacific were far more complex than previous accounts tended to suggest and it paves the way for new theories of the spread of Austronesian languages.”
The study, which involved researchers from the UK, Taiwan and Australia, was mainly funded by the British Academy, the Bradshaw Foundation and the European Union.
The above story is based on materials provided by University of Leeds. Note: Materials may be edited for content and length.
Pedro Soares, Teresa Rito, Jean Trejaut, Maru Mormina, Catherine Hill, Emma Tinkler-Hundal, Michelle Braid, Douglas J. Clarke, Jun-Hun Loo, Noel Thomson et al. Ancient Voyaging and Polynesian Origins. American Journal of Human Genetics, Feb 3, 2011 DOI: 10.1016/j.ajhg.2011.01.009
Tracing the Austronesian footprint in Mainland Southeast Asia: A mtDNA perspective
Origins of the Moken Sea Gypsies inferred from mitochondrial hypervariable region and whole genome sequences
Journal of Human Genetics (2009) 54, 86–93; doi:10.1038/jhg.2008.12; published online 16 January 2009
Kelsey Needham Dancause, Chim W Chan, Narumon Hinshiranan Arunotai and J Koji Lum
The origins of the Moken ‘Sea Gypsies,’ a group of traditionally boat-dwelling nomadic foragers, remain speculative despite previous examinations from linguistic, sociocultural and genetic perspectives. We explored Moken origin(s) and affinities by comparing whole mitochondrial genome and hypervariable segment I sequences from 12 Moken individuals, sampled from four islands of the Mergui Archipelago, to other mainland Asian, Island Southeast Asian (ISEA) and Oceanic populations. These analyses revealed a major (11/12) and a minor (1/12) haplotype in the population, indicating low mitochondrial diversity likely resulting from historically low population sizes, isolation and consequent genetic drift. Phylogenetic analyses revealed close relationships between the major lineage (MKN1) and ISEA, mainland Asian and aboriginal Malay populations, and of the minor lineage (MKN2) to populations from ISEA. MKN1 belongs to a recently defined subclade of the ancient yet localized M21 haplogroup. MKN2 is not closely related to any previously sampled lineages, but has been tentatively assigned to the basal M46 haplogroup that possibly originated among the original inhabitants of ISEA. Our analyses suggest that MKN1 originated within coastal mainland SEA and dispersed into ISEA and rapidly into the Mergui Archipelago within the past few thousand years as a result of climate change induced population pressure.
PLoS Biol. Aug 2005; 3(8): e247.
Published online Jul 5, 2005. doi: 10.1371/journal.pbio.0030247
Traces of Archaic Mitochondrial Lineages Persist in Austronesian-Speaking Formosan Populations
Jean A Trejaut, Toomas Kivisild, Jun Hun Loo, Chien Liang Lee, Chun Lin He, Chia Jung Hsu, Zheng Yuan Li, and Marie Lin
David Penny, Academic Editor
This article has been corrected. See PLoS Biol. 2005 October 11; 3(10): e376.
See “Mitochondrial DNA Provides a Link between Polynesians and Indigenous Taiwanese” , e281.
Genetic affinities between aboriginal Taiwanese and populations from Oceania and Southeast Asia have previously been explored through analyses of mitochondrial DNA (mtDNA), Y chromosomal DNA, and human leukocyte antigen loci. Recent genetic studies have supported the “slow boat” and “entangled bank” models according to which the Polynesian migration can be seen as an expansion from Melanesia without any major direct genetic thread leading back to its initiation from Taiwan. We assessed mtDNA variation in 640 individuals from nine tribes of the central mountain ranges and east coast regions of Taiwan. In contrast to the Han populations, the tribes showed a low frequency of haplogroups D4 and G, and an absence of haplogroups A, C, Z, M9, and M10. Also, more than 85% of the maternal lineages were nested within haplogroups B4, B5a, F1a, F3b, E, and M7. Although indicating a common origin of the populations of insular Southeast Asia and Oceania, most mtDNA lineages in Taiwanese aboriginal populations are grouped separately from those found in China and the Taiwan general (Han) population, suggesting a prevalence in the Taiwanese aboriginal gene pool of its initial late Pleistocene settlers. Interestingly, from complete mtDNA sequencing information, most B4a lineages were associated with three coding region substitutions, defining a new subclade, B4a1a, that endorses the origin of Polynesian migration from Taiwan. Coalescence times of B4a1a were 13.2 ± 3.8 thousand years (or 9.3 ± 2.5 thousand years in Papuans and Polynesians). Considering the lack of a common specific Y chromosomal element shared by the Taiwanese aboriginals and Polynesians, the mtDNA evidence provided here is also consistent with the suggestion that the proto-Oceanic societies would have been mainly matrilocal.
Present-day Taiwan is home to heterogeneous groups of people. The main part of the Taiwanese population is composed of the Minnan (73.5%) and Hakka (17.5%) who descend from immigrants from the Fujian and Guangdong provinces of Southeast China during the last 400 years. After World War II, migration from different provinces of China brought a present-day share of 7.5% to 13% to Taiwan. Only 1.5% of today’s population of Taiwan is represented by Austronesian speakers (Atayalic, East Formosan, Puyuma, Paiwan, Rukai, Tsouic, Bunun, Western plains, Northwest Formosan, and Malayo-Polynesian languages) who are generally considered indigenous to the country. According to the year 2000 census, Yami with 4,050 individuals and Amis with 146,796 individuals represent the smallest and the largest tribal populations of Taiwan respectively.
The archaeological record suggests that a substantial cultural change occurred in Taiwan in approximately the sixth millennium BC. Whether or not the transition to Neolithic technology in Taiwan corresponded to a substantial gene flow from China is unclear. It is also not clear whether there were one or multiple waves of Neolithic migrations to northern and southern Taiwan from southeast China [4–6]. There is archaeological evidence for the occupation of caves in southern Taiwan by humans by at least 15 thousand YBP. Yet, the bulk of archaeological material started to accumulate during the Neolithic period.
Three different views have been put forward to explain the origin of present-day Austronesian-speaking tribes in Taiwan, including (a) origin from expanding regions of early rice cultivation in central China, (b) origin in insular Southeast Asia, or (c) local settlement after the rise of the sea levels. During the massive immigration of Han speakers to the western plains of Taiwan, most tribes took refuge in remote regions such as the central mountain ranges or the east coast of Taiwan. It is believed that this geographical isolation has largely contributed toward maintaining their culture and languages until the present day, in contrast to the plain tribes that are characterized by high levels of admixture.
The role of Taiwanese indigenous populations in the Austronesian settlement of island Southeast Asia and Polynesia has come under intense discussion during the last two decades, in particular among geneticists working with mitochondrial DNA (mtDNA), Y chromosome, and HLA loci [9–21]. Although the landscape of possible migration out of Taiwan and interaction scenarios is quite complex , two opposing models, the “express train”  and the “entangled bank” models , emerging from among other alternatives [8,16,25] have commonly been tested with genetic data. Proponents of the express train model  hold that the Polynesian islands were settled in a relatively short period with a migration originating from southern China to Taiwan. According to this model, the Austronesian migration from Taiwan was coupled by the spread of Lapita culture, whose supposed precursor pottery in Taiwan is dated to at least 6000 YBP . The entangled bank model , along with the “slow boat” model [8,16,25], on the other hand, proposes that the ancestors of Polynesians did not move rapidly through Southeast Asia and Melanesia. According to these two models, the interaction with Melanesians before colonizing the Pacific was the major factor determining the genetic composition of the Polynesian ancestors. The proponents of the intermediary slow boat model [8,16,25], also popularly introduced in the book Eden in the East , propose a date and a location in Southeast Asia for the origin of the Polynesian expansion and question the Taiwanese genetic contribution to the ancestors of Polynesians.
Previous mtDNA studies [13,28] on four tribal groups (Amis, Atayal, Bunun, and Paiwan) and on the Taiwanese general population [12,29] have revealed their common ancestral origins with populations of China and/or Southeast Asia. Tajima et al.  characterized two hyper-variable segment I (HVS-I) lineage groups as unique to Taiwanese aboriginals, accounting for 22% of their mtDNA variation, and estimated their origin to approximately 11,000–26,000 years ago. Three groupings of aboriginal Taiwanese populations were suggested by this study : southern (Rukai and Paiwan), northern (Atayal and Saisiat), and central-eastern (Amis, Bunun, Tsou, Puyuma, and Yami). However, the data did not conclusively show whether these three distinctive clusters reflected three different migrations to Taiwan or whether they could be explained by the effect of drift and long-term isolation of the island’s populations.
A common HVS-I motif 16189–16217–16261, classified within haplogroup B4a , is shared between Taiwanese and Polynesian populations. This motif was considered first as the genetic link supporting Polynesian origins in Taiwan [10,13]. This root haplotype of B4a from which the Polynesian motif derives by a single HVS-I mutation is, however, widely spread in East Asia. Besides Taiwan and Southeast Asia, B4a occurs frequently in southern China and has been observed as far inland as among Mongolians [21,30–35]. Its deep time depth, 40,400 ± 9,600 years in China , significantly predates the timeframe that is relevant to the peopling of Polynesia. Therefore, this general genetic link between Taiwan and Polynesia has now been refuted as being supportive of Polynesian origins specifically in Taiwan within a recent timeframe as implicated by the fast train model .
An overwhelming majority of mainland East Asian mtDNA lineages are nested within haplogroups A, G, M, N9, and R9 [30,31,34–38]. Although previous mtDNA research on Taiwanese aboriginal populations has focused only on hyper-variable displacement loop (D-loop) data, we attempt here to define the distribution of these haplogroups and their region-specific twigs in nine Taiwan indigenous tribes (Figure 1). Using the coalescent approach, we examine the hypothesis that the distinct mtDNA variation and genetic structure seen in present-day Taiwanese mountain tribes might be the result of a Neolithic migration or of a long-term separation from their common ancestors with the modern Chinese populations of mainland East Asia. Finally, to determine if Taiwanese B4a lineages share any coding region variants with Polynesian B4a lineages , we determined the complete mtDNA sequence for eight different B4a lineages from indigenous Taiwanese.
Geographic Distribution of Nine Indigenous Tribes of Taiwan
Haplogroup Structure and Distribution in Taiwanese Aboriginal Populations
Ninety-six distinct haplotypes were identified by 81 variable sites of the mtDNA control region in 640 Taiwanese aboriginal samples representing all nine mountain tribes (Figure S1). Among these, 39 individuals possessed a unique HVS-I haplotype. From the 96 haplotypes, 30 were shared by at least two different tribes. Haplotype sharing occurred mostly between adjacent tribes (Figures 2 and S1). Phylogenetic analysis using control and coding region information clustered the observed haplotypes into 20 distinct haplogroups and subgroups whose distribution among tribes was compared to available information from other Asian populations (Table 1). Four basic haplogroups—B, E, R9, and M7–accounted for more than 90% of the variation observed in aboriginal Taiwanese. In comparison, the combined frequency of these haplogroups in China averaged less than 40%, with southern Chinese having significantly higher frequency than northern Chinese (Table 1). Among these four basic haplogroups, haplogroup E was nearly absent in continental Asia. On the other hand, haplogroups A, D4, G, and M8-M10 accounted approximately for 41% of the sequences from the mainland, whereas it was rare or absent in Taiwanese Aborigines. Only two unique D4 haplotypes were observed in the Atayal and Saisiat populations and a single G1a lineage in Tsou (Table 1; Figures 2 and S1).
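The frequency comparison described above can be sketched in a few lines. This is a minimal illustration rather than the authors' pipeline: the function names and the 20-individual sample are hypothetical, and the labels are assumed to be already assigned haplogroups.

```python
from collections import Counter


def haplogroup_frequencies(assignments):
    """Return {haplogroup: fraction of samples} for a list of haplogroup labels."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {hg: n / total for hg, n in counts.items()}


def combined_frequency(freqs, haplogroups):
    """Sum the frequencies of the listed haplogroups (absent ones count as 0)."""
    return sum(freqs.get(hg, 0.0) for hg in haplogroups)


# Hypothetical sample of 20 individuals, NOT data from the study.
sample = ["B"] * 8 + ["E"] * 4 + ["R9"] * 3 + ["M7"] * 4 + ["D4"] * 1
freqs = haplogroup_frequencies(sample)
core = combined_frequency(freqs, ["B", "E", "R9", "M7"])
print(f"combined frequency of B, E, R9 and M7: {core:.2f}")
```

Applying the same two functions to each population in turn would give the kind of per-population combined frequency that the text compares between Taiwan and the mainland.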
Tree Drawn from a Median-Joining Network of 96 mtDNA Haplotypes Observed in Nine Indigenous Taiwanese Populations
Haplogroup Frequencies in Taiwan, East Asia, and Oceania
Principal Components Analysis
Principal components (PC) analysis using haplogroup frequencies revealed a high level of differentiation between Taiwanese aborigines as compared with other Asian populations (Figure 3). In particular, Taiwanese aboriginal populations appeared closer to island Southeast Asian populations (Luzon, Philippines, Moluccas, and Indonesia) than to populations from mainland East Asia (Fujian, South Vietnam, Malaysia, and Thailand). The three southernmost populations of Taiwan (Puyuma, Paiwan, and Rukai) and, more distantly, Yami from Orchid Island clearly differentiated the southern populations from the northern and central populations of Taiwan from which the Bunun sample emerged as an outlier. Although the Amis population clustered closely with northern and central tribes in the first two dimensions of the PC analysis, their haplogroup structure and relatively high frequency of B4a, D5, and M7c clades at the same time showed an affinity toward southern tribes of Taiwan. The grouping of Taiwanese indigenous populations revealed by our analysis differs from the population tree drawn from pairwise nucleotide differences by Tajima et al.  in which Puyuma and Yami, for example, clustered with central and eastern populations.
Tribes of Northern and Central Taiwan
Haplogroups B4b, B5a2, E, F4b, and M7b covered more than 80% of the mtDNA variation observed in Atayal, Saisiat, and Bunun populations (north and central Taiwan; see Figure 2). The frequency of these haplogroups elsewhere in Taiwan was significantly lower (24%; p < 0.0001). Haplogroup E was most divergent and frequent in the Saisiat population in the northern mountain ranges of Taiwan.
Two subclades of haplogroup B5a can be defined on the basis of complete sequence information [37,38], available restriction fragment length polymorphism (RFLP) and HVS-I information from Southeast Asia [11,30,33,34,40,41], and our data: The loss of a HaeIII site at nucleotide position (np) 6957 in association with 16266A allele defines the major subclade B5a1 spread in Southwest China, Thailand, Vietnam, and Nicobar Islands whereas the presence of +11146 DdeI site at np 11146 defines a B5a2 clade that is predominantly in association with 16266G allele. Within the B5a2 clade, available D-loop information allows us to postulate the presence of two further branches that are highly region specific in Southeast and East Asia: HVS-I motif 16140–16189–16266G-16362 (B5a2a), which so far has been observed exclusively in Taiwan [9,28], whereas another motif, 16140–16187–16189–16266A/G, characterizes all B5a variants observed so far in Korean, Japanese, and Han lineages from northeast China [33,34,40,42]. In Taiwanese aboriginals, B5a2 lineages were found all over the island whereas their frequency, likely due to drift, was the highest in north-central regions, among the Tsou and Saisiat (see Figure 2).
Over one quarter of Taiwanese from northern and central mountain regions (Atayal, Saisiat, and Bunun) belonged to a novel subclade of haplogroup F4 [38,39,43,44] called F4b because of its distinctive mutation motif 10097C-16218–16311 (see Figure 2). Considering available mtDNA datasets for Southeast Asia, haplogroup F4b has a marginally low frequency (60%; Table 1) among other island Southeast Asian populations who similarly present significant differences in their combined haplogroup frequencies with mainland East Asian populations. It is highly unlikely that the increment of the compound frequency of haplogroups B, E, R9, and M7 seen among all aboriginal populations of Taiwan could be explained by drift. It seems more plausible that the ancestral population of coastal East Asia and island Southeast Asia was already enriched by the founder lineages of these haplogroups and that drift affected the frequencies of individual haplogroups while their combined frequency throughout this region remained high.
In light of this study, the complete absence among aboriginal populations of several mtDNA haplogroups common on the Chinese mainland would suggest that the Neolithic colonizers (a) did not contribute significantly to the mtDNA pool of the pre-existing Formosan population; or (b) that the Neolithic migrants were a small group already lacking haplogroups A, C, G, D4, and M8–M10 that might have become frequent in southern China more recently.
Although possessing the same general haplogroup composition, the nine Taiwanese aboriginal populations are all significantly different from each other (in genetic distances), with southern populations showing a frequency pattern different from that of the central and northern groups (see Figure 3) and each population revealing its own specific founder haplotypes (see Figure 2). Given the low geographic distances between the sampling locations, this striking diversity between the tribes, seen also in HLA studies , may denote prevalence of strong inter-group cultural differences preserved through social isolation and endogamy of the tribes. Even though intermarriages may have been uncommon between the tribes, it is possible that small-scale admixture could have occurred through child adoption among the mountain tribes .
Fitting the mtDNA Heritage of the Taiwanese Aboriginals into the Models of Austronesian Expansion
Genetic models explaining the origins of Austronesian migration can be categorized by the three following components: the place of origin, the time scale, and the correspondence with specific archaeological evidence. Studies based on Y chromosomal and mitochondrial markers have traced in Polynesians the presence of lineages that are characteristic of Melanesians and East Indonesians but which are absent in mainland East Asia and Taiwan [8,16,25,56]. Evidence for interaction or initiation of a human settlement process, directed toward Remote Oceania, somewhere in East Indonesia or Melanesia is also supported by the phylogenetic analyses of Polynesian rats, Rattus exulans . Besides the Melanesian-specific component, both human mtDNA and Y chromosomal haplogroups found in Polynesians include those that are common in populations of mainland East Asia and also in Taiwan . This general share of ancestry refers, first of all, to mtDNA haplogroup B4a, which is the most frequent lineage group among Polynesians. However, the specific HVS-I motif associated with Polynesian expansion occurs only in Near and Remote Oceania whereas its immediate ancestral sequence is common throughout East Asia and has a coalescence time significantly predating the Polynesian migration .
Phylogenetic analysis of complete mtDNA sequences (Figure 4) in this study reveals the presence of a motif of three coding region mutations (nps 6719, 12239, and 15746) that define haplogroup B4a1a and are shared among aboriginal Taiwanese, Melanesians, and Polynesians. No mainland East Asian population has yet been found to carry lineages derived from these three positions. This suggests that the motif may have evolved in populations living in or near Taiwan at the end of the Late Pleistocene period. Considering the differences between the Late Pleistocene and present-day shorelines of Southeast Asia, the B4a1 lineages may also have evolved in regions now submerged under the sea. As long as mtDNA lineages of the Philippines and Indonesia, in particular, have not been analyzed in similar detail, the question about the precise origin of B4a1a has to be left open. The presence of an additional coding region mutation (np 14022) that defines haplogroup B4a1a1 and the HVS-I transition at np 16247 in Melanesians and Polynesians points to a maturation phase of the Austronesian migration in East Indonesia or Melanesia, during which the final Polynesian motif of mutations was established.
Austronesian is the world’s most widely distributed language family, yet nine of its ten subfamilies are restricted to Taiwanese aboriginal populations . The phylogeny of B4a1a mitochondrial genomes resembles the linguistic reconstruction of Austronesian languages with five of its primary branches restricted to Taiwan and the sixth branch spread all over Oceania. The 9.3 ± 2.6 thousand-year-old coalescence date we obtained using only coding region information for B4a1a1 diversification in Papuans and Polynesians predates significantly the earliest signs of Lapita culture in the region (around 3,500 BP). Nonetheless, these findings provide the first direct phylogenetic evidence for the common ancestry of Austronesian and indigenous Taiwanese maternal lineages and their maturation phase in East Indonesia or Melanesia. High frequency of a tagging mitochondrial haplotype in contrast to the absence of such in the Y chromosomes of Polynesians and Taiwanese might lend credence to the suggestion that the proto-Austronesian communities were matriarchal  and matrilocal (as the Amis tribe still is in Taiwan) whereby the Y chromosome pool of the initial migrants was lost after being repeatedly diluted on the way toward Polynesia.
Taiwanese aboriginal populations share their maternal ancestry with populations of mainland East Asia through haplogroups B, R9, and M7 as their main genetic components. At the same time the haplogroup structure at a finer phylogenetic resolution suggests relatively long-term isolation from the mainland populations. The coalescence times of B4a1a, F3b, F4b, R9c, and M7c1c lineages point to founder effects in Taiwan ranging from recent (0–2,000 years) to more ancient times (7,000–20,000 years). These results most likely reflect the drift in small endogamous populations of the island that became isolated by the rising sea levels after the last Ice Age.
The time element (13.2 ± 3.8 thousand years to the MRCA) obtained from the phylogenetic reconstruction of complete B4a1a sequences requires that we adopt a model according to which the origin of Austronesian migration can be traced back to Taiwan, and allows for the notion that it was followed by interaction periods elsewhere in Indonesia and finally in Melanesia where the complete motif specific to Polynesian B4a1a1 sequences (Polynesian motif) was developed.
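For readers unfamiliar with how a coalescence time is obtained, the sketch below shows one common molecular-clock estimator, the rho statistic: the mean number of mutations separating sampled lineages from the root haplotype, multiplied by a calibrated number of years per mutation. Whether the study used exactly this estimator is an assumption here, and both the mutation counts and the rate of 5,000 years per mutation are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def rho_statistic(mutation_counts):
    """Mean number of mutations separating sampled lineages from the root haplotype."""
    if not mutation_counts:
        raise ValueError("need at least one lineage")
    return sum(mutation_counts) / len(mutation_counts)


def coalescence_age(mutation_counts, years_per_mutation):
    """Age estimate in years: rho multiplied by the clock calibration."""
    return rho_statistic(mutation_counts) * years_per_mutation


# Hypothetical mutation counts for six lineages, NOT data from the study.
counts = [2, 3, 1, 4, 2, 3]
# Illustrative calibration only, not the rate used by the authors.
YEARS_PER_MUTATION = 5000.0

age = coalescence_age(counts, YEARS_PER_MUTATION)
print(f"rho = {rho_statistic(counts):.2f}, estimated age = {age:.0f} years")
```

The estimate scales linearly with the calibration, which is why the choice of mutation rate has such a large effect on dates obtained this way.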
PLoS One. 2012; 7(5): e36437.
Published online May 7, 2012. doi: 10.1371/journal.pone.0036437
Patrilineal Perspective on the Austronesian Diffusion in Mainland Southeast Asia
Jun-Dong He, Min-Sheng Peng, Huy Ho Quang, Khoa Pham Dang, An Vu Trieu, Shi-Fang Wu, Jie-Qiong Jin, Robert W. Murphy, Yong-Gang Yao, and Ya-Ping Zhang
Manfred Kayser, Editor
The Cham people are the major Austronesian speakers of Mainland Southeast Asia (MSEA) and the reconstruction of the Cham population history can provide insights into their diffusion. In this study, we analyzed non-recombining region of the Y chromosome markers of 177 unrelated males from four populations in MSEA, including 59 Cham, 76 Kinh, 25 Lao, and 17 Thai individuals. Incorporating published data from mitochondrial DNA (mtDNA), our results indicated that, in general, the Chams are an indigenous Southeast Asian population. The origin of the Cham people involves the genetic admixture of the Austronesian immigrants from Island Southeast Asia (ISEA) with the local populations in MSEA. Discordance between the overall patterns of Y chromosome and mtDNA in the Chams is evidenced by the presence of some Y chromosome lineages that prevail in South Asians. Our results suggest that male-mediated dispersals via the spread of religions and business trade might play an important role in shaping the patrilineal gene pool of the Cham people.
The Austronesian language family is one of the largest and most widespread language families. It is spoken by more than 350 million people on islands from Madagascar to Easter Island. Nevertheless, the languages in this family have a rather limited distribution on the mainland. Chamic, the representative language of the family there, is spoken by the Cham people. In Mainland Southeast Asia (MSEA), Chamic exists as a “linguistic enclave”, because it is surrounded by non-Austronesian-speaking groups (e.g. Mon-Khmers). Many studies investigate the diffusion of Austronesian in MSEA by tracing the origin of the Cham people. The “Out-of-Taiwan” hypothesis regards the Cham ancestors as Austronesian immigrants from Island Southeast Asia (ISEA), with the immigration dated to around 500 BC. Before the arrival of the Austronesian immigrants, southern Vietnam appears to have been occupied by local Austro-Asiatic speakers, especially Mon-Khmers. There is a high chance of admixture between the Chams and Mon-Khmer groups. Previous linguistic analyses of Chamic report that some loan-words from Mon-Khmer languages form indigenous cultural contributions. The “Nusantao Maritime Trading and Communication Networks” hypothesis states that cultural diffusion through trading and communication networks played an important or even dominant role in the ethnogenesis of the Cham. Because the origin of the Cham people is open to debate, the demographic history of the Austronesians in Southeast Asia requires further investigation.
Analyses of mitochondrial DNA (mtDNA) variation of the Cham population resolve a closer relationship with populations in MSEA than with those from ISEA, and this occurs despite recent gene flow from ISEA. This result suggests that the origin of the Cham people likely involves the massive assimilation of local Mon-Khmer populations, accompanied by language shift. Thus, the Austronesian diffusion in MSEA appears to be mediated mainly by cultural diffusion. Because mtDNA data only offer a maternal perspective, only half of the story is known. Does patrilineal history reveal the same story? We address this question by evaluating non-recombining region of the Y chromosome (NRY) markers, including 26 single-nucleotide polymorphisms (Y-SNPs) and eight short tandem repeats (Y-STRs), in 59 male Cham individuals whose matrilineal histories are known. For comparison, the NRY markers of 76 Kinh, 25 Lao, and 17 Thai males were also surveyed (Figure 1; Table 1).
Populations from southern China and Southeast Asia analyzed in this study.
General information for 57 populations in southern China and Southeast Asia.
Phylogeny of Y chromosomes
Based on 26 Y-SNPs, all 177 newly genotyped males from the four populations were assigned to specific (sub-)haplogroups (paragroups) defined in the phylogeny (Figure 2; Table S1). Nearly 60% of the Chams’ Y chromosomes belonged to P191-derived haplogroups. Within this group, O-M95* predominated and accounted for around 30% of all samples. Haplogroup C-M216, consisting of C-M217 and C-M216*, comprised 10.2% of the patrilineal lineages. One Cham individual (∼1.7%) rooted near the base of the tree as haplogroup F-M213* and six individuals (∼10.2%) rooted at the base of the tree as haplogroup K-P131*. Notably, the South Asian-prevailing haplogroups R-M17 (∼13.6%), R-M124 (∼3.4%), and H-M69 (∼1.7%) were identified in the Chams.
Classification tree of 26 NRY haplogroups along with their frequencies (%) in four populations.
Genetic relationships between the Cham and other Southeast Asian populations were discerned with the aid of additional published Y-chromosomal datasets (Figure 1; Table 1). We employed a principal component analysis (PCA) based on the NRY haplogroup distribution frequencies of 45 populations (Table S2) to show the overall clustering pattern of the populations. Populations from eastern ISEA (EISEA) and from Laos formed two clusters in the first PC (Figure 3) and this pattern was mainly due to haplogroups C-M216, K-P131*, and O-M95* (Figure S1). The second PC resolved a close affinity between the Kinh and Vietnamese (most likely, the Kinh) populations and those from mainland southern China due to the high frequency of haplogroup O-M88 (Figure S1). The Cham population showed a close affinity to some but not all populations from western ISEA (WISEA; Figure 3). The clustering pattern revealed by PC1 and PC2 was statistically significant (P<0.05) in AMOVA based on the same profiles of haplogroup distribution frequencies (Table S2). Nevertheless, in terms of the linguistic affinities, the difference between Austronesian (i.e. Cham and WISEA populations) and non-Austronesian (i.e. other MSEA populations) was not statistically significant according to AMOVA (p = 0.08). We incorporated data for eight common Y-STRs (DYS19, DYS389-I, DYS389-II, DYS390, DYS391, DYS392, DYS393, and DYS439) from additional populations in MSEA. Multidimensional scaling (MDS) based on RST genetic distances for these Y-STRs did not associate the Chams with populations from WISEA (Figure 4).
PCA plot based on NRY haplogroup frequencies of 45 populations in southern China and Southeast Asia.
MDS plot of 53 populations with RST genetic distances based on eight common Y-STRs.
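As a rough illustration of how a first principal component can be extracted from a table of haplogroup frequencies, the following minimal sketch uses power iteration on the covariance matrix. The four populations and three haplogroup frequencies are invented toy values, not data from Table S2, the function name `first_principal_component` is hypothetical, and the drift-standardizing transformation applied in the study is not reproduced.

```python
import math

def first_principal_component(freqs, iterations=500):
    """Return (axis, scores) of PC1 for a populations x haplogroups table."""
    n_pops, n_haps = len(freqs), len(freqs[0])
    # Centre every haplogroup column on its mean across populations.
    means = [sum(row[j] for row in freqs) / n_pops for j in range(n_haps)]
    centred = [[row[j] - means[j] for j in range(n_haps)] for row in freqs]
    # Sample covariance matrix of the haplogroup columns.
    cov = [[sum(row[i] * row[j] for row in centred) / (n_pops - 1)
            for j in range(n_haps)] for i in range(n_haps)]
    # Start from an uneven vector: frequency rows sum to 1, so an
    # all-ones start would be mapped to zero by the covariance matrix.
    axis = [float(j + 1) for j in range(n_haps)]
    for _ in range(iterations):
        axis = [sum(cov[i][j] * axis[j] for j in range(n_haps))
                for i in range(n_haps)]
        norm = math.sqrt(sum(v * v for v in axis))
        axis = [v / norm for v in axis]
    # Project each centred population profile onto the axis.
    scores = [sum(row[j] * axis[j] for j in range(n_haps)) for row in centred]
    return axis, scores

# Toy frequencies (hypothetical): two populations rich in the first
# haplogroup and two rich in the third.
toy_freqs = [
    [0.6, 0.3, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
]
axis, scores = first_principal_component(toy_freqs)
```

In this toy table the first two populations receive PC1 scores of one sign and the last two receive scores of the opposite sign, which is the kind of separation a PCA plot displays.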
Admixture in the Cham population
The origin of the Chams could not be explained simply as a demic diffusion of Austronesian immigrants from WISEA. The genetic patterns between the Cham and other Southeast Asian populations, as detected in PCA and MDS, suggested a more complex history. This complex demographic process likely involved genetic admixture with local non-Austronesian speakers in MSEA. Therefore, we performed an admixture analysis to quantify the proportions of the genetic contributions from WISEA and MSEA to the Chams (Table 2). The patrilineal contribution from WISEA to the Chams (0.37595) was less than that from MSEA (0.62405). By comparison, the Vietnamese (most likely, the Kinh) population from southern Vietnam had a dominant proportion of the MSEA contribution (0.842972; Table 2), although the large standard deviations mean that these results should be treated with caution.
Admixture analysis of the two populations from southern Vietnam.
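To make the idea of an admixture proportion concrete, the sketch below fits a hybrid population's haplogroup frequencies as a mixture of two parental profiles by least squares. This is a deliberately simple frequency-based estimator and is not the estimator implemented in Admix 2.0; the frequency vectors are invented toy values and the function name `admixture_proportion` is hypothetical.

```python
def admixture_proportion(hybrid, parent1, parent2):
    """Least-squares m such that hybrid ~= m * parent1 + (1 - m) * parent2."""
    diff = [p1 - p2 for p1, p2 in zip(parent1, parent2)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("parental frequency profiles are identical")
    numer = sum((h - p2) * d for h, p2, d in zip(hybrid, parent2, diff))
    return numer / denom

# Toy haplogroup frequencies (hypothetical, three haplogroups).
toy_wisea = [0.5, 0.3, 0.2]
toy_msea = [0.1, 0.3, 0.6]
# Hybrid built as 37.5% of the first parent and 62.5% of the second.
toy_hybrid = [0.25, 0.3, 0.45]

m_wisea = admixture_proportion(toy_hybrid, toy_wisea, toy_msea)
m_msea = 1.0 - m_wisea
```

With two parental populations the two contributions are complementary, as with the reported values 0.37595 and 0.62405, which sum to 1.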
Haplotype diversity analyses
To discern the relationship between the Y-STR haplotypes in the Chams and other Southeast Asians, median-joining networks were constructed using eight common Y-STRs for each of the 11 haplogroups found in the Cham population (Figure 5). In the networks of haplogroups O-M95* and P-P27.1, some haplotypes were exclusively shared between the Cham and ISEA populations. In the networks of C-M216, F-M213*, and K-P131*, some haplotypes in the Chams were derived directly from those in ISEA populations. These lineages in the Chams were most likely introduced by recent gene flow from ISEA. In contrast, the networks for haplogroups O-M7, O-M88, O-M134, O-P191*, and O-P200* indicated closer associations between the Chams and MSEA populations. Most Cham lineages either had identical counterparts or were linked to those haplotypes in MSEA populations; the numbers of mutations between the Chams and MSEA were smaller than those between the Chams and ISEA (Table S3). These patterns suggest that these Cham lineages had an in situ origin in MSEA. Among the 48 haplotypes identified in the Chams, 11 and 18 were shared with those in ISEA and MSEA, respectively (Table S3). Nevertheless, the counts of shared haplotypes did not differ significantly (two-tailed Fisher’s exact test, P = 0.303; Table S4). Moreover, six haplotypes belonging to haplogroup O-M95* were shared by both ISEA and MSEA groups. The exact origin of these lineages in the Chams remains elusive.
Median-joining networks of eight Y-STRs within NRY haplogroups C-M216, F-M213*, K-P131*, O-M7, O-M88, O-M95*, O-M119*, O-M134, O-P191*, O-P200*, and P-P27.1.
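For readers unfamiliar with the two-tailed Fisher’s exact test used above for the shared-haplotype counts, a minimal sketch for a 2×2 table follows. The contingency table actually tested is given in Table S4 and is not reproduced here, so the example uses a small textbook-style table; the function name `fisher_exact_two_tailed` is hypothetical.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed
    # one (small tolerance guards against floating-point ties).
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))

# Illustrative table only (not the data of Table S4).
p_value = fisher_exact_two_tailed(3, 1, 1, 3)
```

For this illustrative table the P value equals 34/70, far from significance, which is the usual outcome when counts are this small.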
To trace the source of the exotic South Asian-prevailing components, we incorporated published data (Table S5) from India, Pakistan, and West Asia and reconstructed median-joining networks of haplogroups R-M17 and R-M124 (Figure 6). All haplotypes in the Chams were scattered in the networks, which implied that these lineages had an origin via recent gene flow rather than deeply rooted ancestry. Two Cham lineages of R-M124 shared the same haplotype with those from North India. This observation suggested that North India might be the original source of the R-M124 lineages in the Chams. The relationships among lineages of R-M17 were complex in the network, which suggested multiple geographic/ethnic sources for the R-M17 lineages in the Chams.
Median-joining networks of eight Y-STRs within NRY haplogroups R-M17 and R-M124.
Integrating the information from two uniparentally inherited markers (NRY and mtDNA) is a powerful means of disentangling human population histories, and especially of elucidating sex-biased migrations and social-cultural effects. Compared with our previous study of mtDNA variation in the Chams, the current assessment of NRY variation facilitates a better understanding of the origin of the Cham people. Both NRY and mtDNA haplogroup profiles (Figure 7) suggest that, in general, the Chams are indigenous to Southeast Asia. Characteristic East and Southeast Asian lineages, viz., NRY haplogroups O-P191 and C-M217, together with mtDNA haplogroups B, F, M7, and R9, accounted for the majority of the patrilineal (∼67.8%) and matrilineal (∼60.1%) gene pools of the Chams, respectively. Some ancient Southeast Asian components (NRY haplogroups: C-M216*, F-M213*, and K-P131*; mtDNA haplogroups: M*, N*, and R*) were also identified in the Chams.
NRY and mtDNA haplogroup profiles for the Chams and the Kinhs.
The origin of the Chams appears to be much more complex, at least based on the results of PCA, MDS, AMOVA, and haplotype (near-) matching analyses. Recent gene flow from ISEA is detected in the patrilineal pool of the Chams, most likely via the dispersal of Austronesian speakers. Further, the Cham population also contains a significant amount of local genetic contributions from non-Austronesian populations in MSEA. This pattern corresponds with our previous study based on mtDNA. Taken together, the origin of the Chams is mainly a result of admixture between the Austronesian immigrants from ISEA and the indigenous populations (most likely, Mon-Khmers) in MSEA.
South Asian NRY haplogroups R-M17, R-M124, and H-M69 are common in the Chams (∼18.6%; Figure 2), yet no South Asian mtDNA haplotypes are known in this population. South Asian males, but not South Asian females, therefore appear to have contributed to the genetic makeup of the Chams. The existence of these South Asian patrilineal lineages is in good accordance with the archaeological and historical records. The dominant religion of the Cham people is known to have been Hinduism (overwhelmingly Shaivism), and their culture was deeply influenced by that of India. Both Indian and Cham people appear to have played important roles in Southeast Asian maritime trade. Such contact between the two peoples would have made gene flow between them almost inevitable. The discordance between NRY and mtDNA contributions in the Chams (Figure 7) is well explained by male-mediated dispersals, most likely through the spread of religions and business trade. In particular, the admixture between alien males and local females is compatible with the matrilocal residence of the Cham people.
Patrilineal genetic structuring differs between the Chams and Kinhs. For instance, in contrast to the Chams, the Kinhs frequently carry lineages (8/76, ∼10.5%) from the characteristic Chinese haplogroup O-M7 yet only one lineage from the South Asian haplogroup R-M17 (Figure 7). In addition to Sinicized culture, substantial Chinese assimilation into the Kinh people via immigration has been suggested for northern Vietnam. Thus, the different ethnohistories of the Chams and Kinhs are reflected by their unique mtDNA and NRY patterns.
In summary, this study expands our knowledge of the complex history of the Austronesian diffusion in MSEA. Further improvements to the resolution of the NRY tree will help to unravel the story of the Cham people. This initiative will also benefit from the use of genome-wide autosomal markers. In the future, a comprehensive study involving extensive sampling should pinpoint more details of the demographic history, such as the source and route of migration and the timing of admixture and expansion.
Materials and Methods
Samples and data collection
Blood samples of 177 unrelated males were collected from four populations (Table 1; Figure 1). Among them, samples from 59 Cham individuals were collected from Binh Thuan province, southern Vietnam. Binh Thuan was part of the Cham principality of Panduranga, the last Cham territory, which was annexed by Nguyen Vietnam in 1832 AD, and it is said to harbor a significant number of Chamic speakers. The mtDNA data of the Cham, Kinh, and Thai populations were previously reported. This study was approved by the Institutional Review Board of Kunming Institute of Zoology. All subjects were interviewed to obtain informed written consent before sample collection.
Comparative NRY data from southern China and Southeast Asia (Figure 1; Table 1) were taken from previously published literature. We standardized all Y-SNP and Y-STR data to the same resolution in order to include as many populations as possible. This truncation of some data caused the NRY haplogroups to collapse into 16 clusters (Table S2), and the Y-STRs were reduced to eight loci (DYS19, DYS389-I, DYS389-II, DYS390, DYS391, DYS392, DYS393, and DYS439). Additional data for haplogroups R-M17 and R-M124 were collected from published South and West Asian datasets (Table S5).
DNA extraction and genotyping
Genomic DNA was extracted by the standard phenol/chloroform method. Seventeen Y-SNPs (Table S1) were genotyped with the GenomeLab™ SNPstream® (Beckman Coulter). We used three panels of multiplex PCR reactions following the manufacturer’s recommendations (Protocol S1). The primers for multiplex PCR and single base extension reactions were designed with Autoprimer software (Beckman Coulter). To improve the resolution of the phylogeny, we further screened nine Y-SNPs by direct sequencing in some individuals (Table S1). The PCR amplification and sequencing primers were previously reported. Using previously described methods, we genotyped eight Y-STRs (DYS19, DYS389-I, DYS389-II, DYS390, DYS391, DYS392, DYS393, and DYS439) on an ABI 3730 DNA Analyzer (Applied Biosystems). For DYS389-I and DYS389-II, we used the genotyped value of DYS389-I, and the value of DYS389-II minus DYS389-I, in our analyses.
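The DYS389 adjustment described above amounts to a simple subtraction per individual. A minimal sketch follows; the repeat counts are invented for illustration and the function name `adjust_dys389` is hypothetical.

```python
def adjust_dys389(haplotype):
    """Return a copy in which DYS389-II holds DYS389-II minus DYS389-I."""
    adjusted = dict(haplotype)
    adjusted["DYS389-II"] = haplotype["DYS389-II"] - haplotype["DYS389-I"]
    return adjusted

# Hypothetical repeat counts for one individual at three of the eight loci.
raw = {"DYS19": 15, "DYS389-I": 13, "DYS389-II": 29}
clean = adjust_dys389(raw)
```

The subtraction keeps the repeats already counted at DYS389-I from being counted a second time when distances between haplotypes are computed.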
Arlequin 3.5 (http://cmpg.unibe.ch/software/arlequin35/) was used to calculate AMOVA and RST distances. Principal component analysis (PCA) and multidimensional scaling (MDS) were performed using SPSS 13.0 software (SPSS). In PCA, the original haplogroup frequency data were transformed to standardize against the different effect of genetic drift on haplogroups of different frequencies. Admix 2.0 (http://web.unife.it/progetti/genetica/Isabelle/admix2_0.html) was used to estimate the level of admixture of MSEA and WISEA groups in the Cham and Vietnamese populations. The average haplogroup frequencies of MSEA and WISEA were taken for the two parental populations, respectively. Median-joining networks of Y-STRs within certain haplogroups were constructed with NETWORK 4.6 (http://www.fluxus-engineering.com/network_terms.htm).
Chin J Cancer. Feb 2011; 30(2): 96–105.
Ancient migration routes of Austronesian-speaking populations in oceanic Southeast Asia and Melanesia might mimic the spread of nasopharyngeal carcinoma
Jean Trejaut, Chien-Liang Lee, Ju-Chen Yen, Jun-Hun Loo, and Marie Lin
Mitochondrial DNA (mtDNA) and the non-recombining Y chromosome (NRY) are inherited uniparentally, from mother to daughter and from father to son, respectively. Their polymorphism was initially studied in populations throughout the world to test the “Out of Africa” hypothesis. Here, to examine the distribution of nasopharyngeal carcinoma (NPC) in different populations of insular Asia, we analyze the mtDNA information (lineages) obtained by genotyping the hypervariable segments (HVS I & II) of 1400 individuals from island Southeast Asia (ISEA), Taiwan and Fujian, supplemented with the analysis of relevant coding region polymorphisms. Lineages that best represented a clade (a branch of the genetic tree) in the phylogeny were further analyzed using complete genomic mtDNA sequencing. Finally, these complete mtDNA sequences were used to construct a most parsimonious tree, which now constitutes the most up-to-date mtDNA dataset available on ISEA and Taiwan. This analysis has yielded new insights into the evolutionary history of insular Asia and has strong implications for assessing possible correlations with linguistics, archaeology, demography and the NPC distribution in populations within these regions. To obtain a more objective and balanced genetic point of view, slowly evolving biallelic Y single nucleotide polymorphisms (Y-SNPs) were also analyzed. As in the first step above, the technique was first applied to determine affinities (macro analysis) between populations of insular Asia. Secondly, sixteen Y short tandem repeats (Y-STRs) were used, as they allow deeper insight (micro analysis) into the relationships between individuals of the same region. Together, mtDNA and NRY allowed a better definition of the relational, demographic, cultural and genetic components that constitute the makeup of the present-day peoples of ISEA.
Outstanding findings were obtained on the routes of migration that occurred along with the spread of NPC during the settlement of insular Asia. The results of this analysis will be discussed using a conceptual approach.
Keywords: Mitochondrial DNA, Polynesian motif, Out of Taiwan, Austronesian speakers
Nasopharyngeal carcinoma (NPC), often referred to as the “Cantonese Cancer”, could also be referred to as the “Bai Yue Cancer”, as NPC is most prominent among these people. Descendants of the Bai Yue have become a great migrating people, and a survey of their distribution across the world today mimics surprisingly well the distribution of NPC in different populations. NPC has been observed among Taiwanese Han and Taiwan aborigines (TwA), among island Southeast Asia (ISEA) islanders and Polynesians. Except for the Han, who moved to Taiwan 400 years before present (YBP) and eventually came to account for 98% of the Taiwan population, TwA and most other populations of ISEA are speakers of Austronesian languages and are believed to share a common ancestry with the Bai Yue of south China. It has been genetically demonstrated that islanders from ISEA and TwA had separated from mainland Southeast Asia (MSEA) more than 15 000 YBP.
All extant Asian or Melanesian mtDNA types descend from one of two founding macro haplogroups, M or N. These two mtDNA haplogroups share a common ancestry with African super haplogroup L3, which was carried by the only small group of people who successfully passed through the horn of Africa ∼80 000 YBP and later migrated “out of Africa” ∼60 000 YBP as a group bearing only haplogroups M and N (the two daughters of haplogroup L3). In less than 5 000 years (a time too short to allow for the appearance or the fixation of new mutations in mtDNA macro haplogroups N or M), these peoples established settlements in India, Sundaland (MSEA), Papua New Guinea and Australia. Much later, in the last 15 000 years, when circumstances dictated by fluctuations in sea levels and climatic conditions were more favourable, they settled in America. Interestingly, European ancestors from west Eurasia (a small group of people all belonging to mtDNA haplogroup N) moved to Europe much later than the eastern wave (∼35 000 YBP), and consequently the genetic diversity observed in Europeans is less than the genetic diversity seen in Asians, which itself is less than the diversity of Africans.
In this report we present phylogenies of a few pertinent mtDNA haplogroups which bring new insights toward a better understanding of various population migration and settlement events that occurred among non-NPC-affected populations from Melanesia [Papua New Guinea (PNG) and Australian aborigines], and NPC-affected populations from MSEA and ISEA (and Polynesia).
In 2005 and 2007, Friedlander et al. and Hudjashov et al. produced trees of complete mtDNA sequences of founding macro haplogroups M and N showing that aboriginal Australians were most closely related to the autochthonous populations of New Guinea/Melanesia, indicating that prehistoric Australia and New Guinea were occupied initially in a single Palaeolithic colonization event ∼50 000 YBP. The question remains as to whether PNG and Australia were reached separately, sequentially, or even several times after the initial settlement event. To address it, they separately analyzed the distribution of all subtypes of Melanesian mtDNA haplogroups M and N. Only one mtDNA subtype of macro haplogroup N (haplogroup P) will be described here.
While some variants of P (P1 and P2 in Figure 1) were very common in Papua New Guinea, variants P5, P6, P7 and P9 were unique to Australia. Only more recent subtypes of variants P3 and P4 were seen in PNG and Australia.
Most parsimonious tree of haplogroup P (a subtype of N and R). Before 2009, all known branches of P were seen either in Australia or in Melanesia. Here new branches, namely P8 and P10 (circled in left column), were found in the Philippines …
Molecular dating of haplogroup P in Melanesia and Australia suggested that a first stage of expansion had occurred before people reached Sahul (∼50 000 YBP), when ancestral haplogroup P first appeared, with a mutation at nucleotide position (np) 15607 (Figure 1), from its ancestral mtDNA macro haplogroup N coming directly from the Middle East. It is therefore during this early period, when haplogroup P was still undifferentiated, that anatomically modern humans moved to PNG and Australia, where haplogroup subtypes P1 and P2 in PNG and P5, P6, P7 and P9 in Australia would later appear. This view was supported in 2007 by Hudjashov et al., who indicated that groups of modern humans who immigrated to Australia or PNG had been isolated since their initial settlement, and that haplogroup P had probably made its first appearance in the close vicinity of PNG longitudes. Hudjashov et al., observing the sharing of P3 and P4 between the two regions, hypothesized gene flow of subtypes of P3 and P4 between PNG and Australia. Alternatively, subtypes of haplogroups P3 and P4 dating to 30 000 YBP could have moved independently in the late Pleistocene from ISEA, where they initially expanded and then disappeared by drift as populations were small. The most parsimonious tree in Figure 1 shows that only distinct subtypes of P3 or P4 are seen in either Australia (P3a and P4b1) or PNG (P3b and P4a/b), indicating independent dispersals from ISEA but no later sharing due to migrations from or to PNG or Australia.
This last alternative, suggesting an origin of P in ISEA, was supported in 2009 when Trejaut et al. sequenced two new haplogroup P lineages (P8 and P10) from Philippine individuals. As with all other major branches of haplogroup P, P8 derived from founder macro haplogroup N by a single coding region mutation at np 15607 (left circle in Figure 1) and expanded locally. Since no other haplogroup P lineages were found in ISEA (except for P10, see below), one could suppose: (1) that people from PNG (rather than from Australia) migrated back and reached the Philippines, although no trace of P8 has yet been found outside of the Philippines; (2) that P8 is the result of a recurrent mutation at np 15607, which is unlikely, as np 15607 is not known as a hot spot; or (3) that when the mutation at np 15607 first appeared in western ISEA, haplogroup P first expanded there and then dispersed randomly, reaching the Philippines, PNG and Australia separately. Under this third scenario, all traces of P in populations situated between PNG and the Philippines would have either disappeared by drift or not yet been sampled.
As mentioned above, the other P matrilineage (P10) is only seen in the Philippines. In addition to np 15607, its subtypes share a transition at np 3882 with haplogroup P2, a haplogroup found only in New Guinea and Near Oceania. It is unlikely that P10 is the result of a recent back migration from New Guinea to the Philippines, as P10, like P8, is completely absent in other regions of ISEA. Rather, hypothesis (3) proposed above is the most likely.
In search of further support for this hypothesis, our laboratory collected 400 specimens from Borneo, Sumatra, Sulawesi, Java and a few from east Indonesia. As foreseen by Hill et al., we found that as much as 14% of the mtDNA diversity seen in ISEA populations was made up of new non-interconnecting deep-rooted lineages (basal haplogroups descending from macro haplogroup M), indicating long-term in situ evolution (isolation). These archaic matrilineages, like haplogroup P, were connected directly in a star-like fashion to founding macro haplogroup M (as haplogroup P is connected to N). Similar observations have been described in the SEA mtDNA structures of the Andamanese (M31 and M32), in Malaysia (M21 and M22), in Papua New Guinea (M27, M28 and M29), in India, and more recently in the Philippines (M71 to M73). Further, complete sequencing of all deep-rooted lineages described in the studies of Trejaut et al. characterized more than 23 and 6 new basal haplogroups belonging to not yet defined branches of macro haplogroups M and N, respectively. Molecular dating estimates of these basal groups, obtained from coding region variations (46 000 YBP to 50 000 YBP), suggested that these lineages represented vestiges of a Pleistocene genetic pool of the first anatomically modern humans who settled the ancient continent of Sundaland, most likely long before the appearance of NPC.
In summary, the unexpectedly high number of new basal lineages in ISEA, rooting directly to superhaplogroups M or N, and the presence of similarly unique and unshared basal lineages all along the southern hemisphere coastlines, from the horn of Africa through Melanesia and then to New Guinea, Near Oceania or Australia, may have implications for the effective population size of the first settlers, and imply a rapid eastward migration (∼700 meters per year). Like haplogroup P, the novel haplogroups found in ISEA did not share any structural characteristics with any other M and N subgroups previously described for East Asia, West Asia, India or Eurasia. Most remarkably, they indicated that west ISEA had been a very active center of expansion and of dispersal in the late Paleolithic period. Their low frequency (14%) in Indonesia indicates that most of the initial ISEA gene pool was replaced as the result of an early Holocene wave of migration from MSEA by demic diffusion. The universally accepted view of a total replacement of the first Paleolithic settlers of ISEA should nevertheless be reassessed, as we have shown here that population replacement by migrants from MSEA was incomplete (14% of matrilineages are non-Asian). The remaining 86% of sequences in ISEA and Taiwan found their ancestry in SEA; they are shared by most groups of Austronesian speakers and are characterized by haplotypes that belong to already well defined and much younger twigs of haplogroup M, such as G1, D4, M9, M7, M13 and Z, or of haplogroup N, such as B4, B5, F1 and N9a. Most importantly, they represent non-Melanesian populations, most likely bearers of NPC, and correspond to much more recent migrations from MSEA.
“Express train”, “Slow boat” and “Out of Taiwan” Models Are All Components of a Single Phenomenon
Previous mtDNA variation studies in populations of the Pacific and western Indonesia have shown that a particular mtDNA mutation consisting of a deletion of nine base pairs (9bp-del), between the cytochrome oxidase II and lysyl-tRNA genes, has reached fixation in most Austronesian-speaking populations of the Pacific islands and Madagascar. It was suggested that the 9bp-del was spread by bearers of mtDNA haplogroup B in MSEA, where the incidence of NPC is high. It was later determined that an mtDNA substitution at np 16217 arose on the background of the 9-bp deletion, and was followed by a substitution at np 16261, which is seen throughout mainland and insular Asia among all bearers of haplogroup B4a1 (insert in Figure 2). In the pre-Holocene period, three other substitutions (at nps 6719, 12239 and 15746) appeared on a branch of B4a1 and now determine haplogroup B4a1a. B4a1a dispersed so quickly throughout western ISEA and Taiwan, where the type is the most prominent, that it is difficult to determine the location of its origin. At the beginning of the Neolithic period another mutation (at np 14022) appeared on one of the daughters of haplogroup B4a1a and now determines haplogroup B4a1a1 (also described as the proto-Polynesian motif). B4a1a1 was described and sequenced separately, and although its highest frequency and diversity are seen in East Coast PNG and Near Oceania, it is still believed to have first appeared in western ISEA (a region comprising Borneo, the Philippines and Sulawesi) ∼6 000 YBP, in a time frame predating the “Polynesian Diaspora”. The appearance of np 14022 was soon followed by the appearance of another transition at np 16247. It was proposed to name the motif 16189, 16217, 16247 and 16261 the “Polynesian motif” (now described as B4a1a1a). Most probably, albeit debatably, the Polynesian motif first appeared in western ISEA.
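The nested succession of defining substitutions described above can be sketched as a simple lookup that walks down the lineage as long as every defining position of the next sub-haplogroup is present. This is only an illustration of the nesting: the position sets are the ones listed in this paragraph, the 9-bp deletion is not modeled, and the names `LINEAGE` and `classify` are hypothetical. Real haplogroup assignment relies on the full phylogeny and must handle back mutations.

```python
# Defining substitutions (nucleotide positions) as listed in the text,
# ordered from the ancestral to the most derived haplogroup.
LINEAGE = [
    ("B4a1", {16217, 16261}),
    ("B4a1a", {6719, 12239, 15746}),
    ("B4a1a1", {14022}),    # proto-Polynesian motif
    ("B4a1a1a", {16247}),   # Polynesian motif
]

def classify(variant_positions):
    """Return the most derived haplogroup whose whole path is supported."""
    observed = set(variant_positions)
    assigned = None
    for name, markers in LINEAGE:
        # Stop at the first level whose defining positions are incomplete.
        if not markers <= observed:
            break
        assigned = name
    return assigned

b4a1a_sample = [16217, 16261, 6719, 12239, 15746]
motif_sample = b4a1a_sample + [14022, 16247]
```

A sample carrying np 16247 without np 14022 stays at B4a1a in this sketch, because the walk stops at the first level that is not fully supported.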
The group of people bearing the motif rapidly dispersed eastward into ISEA 6 300 YBP to 5 500 YBP. After a long sojourn in PNG and Near Oceania, ∼3 500 YBP, B4a1a1a spread all over the Pacific, where the “Polynesian motif” was first described. Many linguistic, archaeological, cultural and genetic questions remain under debate. Today most accepted theories distinguish the fate of Neolithic agriculturists from that of Austronesian speakers, and propose that Austronesian speakers find their origin in Taiwan. Some studies have described a rapid eastward dispersal (the “Express train” model) of Austronesian-speaking migrants whose language is ancestral to that of all modern Polynesians. The sequence of events offered in this hypothesis correlates well with the phylogeography of the “Polynesian motif” (origin and expansion of B4a1a in western ISEA and Taiwan, and then expansion of B4a1a1 and B4a1a1a in Near Oceania), but the timing of these events remains questionable. Others, proponents of the “Slow boat” model, used a genetic approach and showed that most Polynesian lineages derived from a staging post in Wallacea (West Indonesia), pre-established in the early Holocene or before. The same team now proposes a more important early Holocene staging in Near Oceania (8 000 YBP) predating both the Lapita cultural complex, which appeared in Melanesia and the Pacific islands between 3 600 and 2 900 years ago, and the colonization of the Pacific (data accepted for publication by Soares et al.). Another alternative, suggesting backward migration from Melanesia, was discussed by Hagelberg. In the following, only the ISEA eastward migration will be discussed. Figure 2 utilizes two uniparental systems. The Y-SNP data were obtained from the literature, and the mtDNA data were obtained from the Taiwan dataset and the literature. These data show that, except for time, the “Express train” and “Slow boat” models can be spatially and sequentially compatible.
Eastward gene flow from Taiwan and western island Southeast Asia (ISEA), with conservation of the initial mtDNA gene pool (the “Express train” model) and replacement of the initial Y chromosome gene pool (Y penetrance, the “Slow …
The B4a1a scenario
The mtDNA scenario in Figure 2 describes the gene fixation of one clade (circle inserted in top right of Figure 2 showing B4a1a). All succeeding mutations occur in a group of peoples who were (or later became) Austronesian speakers and were migrating eastward toward PNG and Polynesia. The model assumes several coastal settlements, the occurrence of bottlenecks and founding events, and conservation of the initial maternal mtDNA gene pool, as expected from a matrilocal society (where females always remained in the same clan). Four stages are shown as follows.
(1) A pre-Neolithic sailing group of Proto-Austronesians, all initially bearing haplogroup B4a1a. B4a1a is a descendant of continental East Asian haplogroup B4a1 and, while offshore from mainland Asia, acquired a series of mutations (nps 6719, 12239, 15746, and 16519) that are unique (“fixed”) among insular west Asians (Taiwan and west ISEA). B4a1a is circled in the insert of Figure 2 and its bearers or descendants are represented in pink in the flotilla.
(2) The first stage of dispersal of B4a1a shows no mtDNA changes, as Taiwan and/or west ISEA populations have similar mtDNA profiles, and women remain in their initial clan.
(3) Approximately 6 200 YBP, most likely within a region including Borneo, South Philippines, and Sulawesi, one of the B4a1a bearers acquired a single mutation (np 14022). In the text, haplogroup B4a1a1 is referred to as the proto-Polynesian motif.
(4) Very shortly after, one B4a1a1 individual acquired np 16247. This new type was first observed by Sykes et al. and Hill et al. in Borneo and Sulawesi respectively and is prominent in PNG. It is now named B4a1a1a or the Polynesian motif (note that nps 14022 and 16247 have never been seen in Taiwan).
On the eastward passage, while B4a1a quickly decreases by drift, B4a1a1 and B4a1a1a successfully continue their eastward dispersal. Coastal Papua New Guinea would have been reached ∼3 000 YBP, and then colonization of the Pacific would have culminated in the discovery of Aotearoa (New Zealand) less than 1 000 YBP, with the arrival of the Maoris. Most importantly, after an important period of expansion in Melanesia, and including the presence of new variants of B4a1a (B4a1a1 and B4a1a1a), the original matrilineal gene pool (stage 1) remained almost identical to the final matrilineal gene pool (stage 5).
In short, matrilineal conservation of the initial gene pool, language and culture is compatible with the concepts pictured by the “Express train” or “Out of Taiwan” models.
The Y Penetrance
The Y chromosome scenario in Figure 2 describes the progression for replacement of the initial paternal gene pool of Austronesian-speaking migrants by locally acquired Melanesian genes. We must recall here that the Y-SNP genetic profile of the TwA and ISEA islanders (haplogroup O and its subtypes) differs greatly from the profile of the Melanesian populations (haplogroups F, G, H, K and C). These Melanesian haplogroups, found in an increasing cline from ISEA to New Guinea and Near Oceania, still bear the Y-SNP signature of the first Paleolithic settlers who initially crossed ISEA, coming directly from Africa and following the southern coastal route. The following scenario is shown in Figure 2:
Stages 1 to 2: The changes in the Y gene pool (blue arrow) are not yet noticeable as the Y-SNP genetic profiles of Taiwan and the Philippines are very similar.
Stages 3 and 4: Two of the 3 migrant haplogroups have been replaced by Melanesian haplogroups while haplogroup 03 still remained. 03 and 01 are frequent in Taiwan and western ISEA. 03 appears to be more successful than 01. Perhaps 03 was predominant among the migrating float of Austronesian speakers. Alternatively 03 may have been retained as the result of drift to the detriment of 01 most likely because of the low number and low Y-SNP polymorphism of the sailing migrants. Interestingly, the introduction of new haplogroups into the migrating clan supports the outcome expected from a matrilocal society on the move (the mothers of the community remain in the clan and have the leading role in determining the movement of males in and out the clan). Here “matrilocal society” is taken in the sense where genetically, heredity is traced through the female line, and where a male who does not come back to the clan after a war or hunting accident, can be replaced by autochthonous Melanesians who later will actively contribute to the continuum of the matrilocal society without altering excessively the initial organization. Their progenies will be completely integrated into the primary structure, as a result the initial Y gene pool will be totally replaced (by Melanesian genes).
Stage 5: The final patrilineal gene pool is different from the original gene pool. Moreover, and unexpectedly, after its departure from PNG, the social organization of the Austronesian has become a patrilocal society but language has remained Austronesian. (This stage constitutes the last stepping stone before the big Polynesian Diaspora into the Pacific.)
In short, the sequential and progressive patrilineal loss of the initial NRY gene pool, but not of language and culture, is compatible with the “Slow boat” model.
According to the genetic scenario of Figure 2, the opposed models, the cultural “Out of Taiwan” and the genetic “Slow boat” models, happened conjointly. Nonetheless, mean point estimates of the timing of the genetic events remained in conflict with the cultural model (the “Out of Taiwan”). Genetically, the eastward movement of people out of Taiwan/western ISEA does not appear as recent as proposed by the classical 5 500 YBP event for the “Out of Taiwan”. This conflicting aspect with genetics may be resolved if one considers the confidence intervals rather than the point estimates of these calculations. Phylogeographic analysis of mtDNA haplogroup B in East Asia described a continuous set of events which started with a dispersal of people (between 13 000 YBP and 8 000 YBP) who were bearers of a Taiwan or western ISEA mtDNA haplogroup (B4a1a). Although the B4a1a ancestor (B4a1) comes from MSEA, B4a1a has never or rarely been seen in Mainland Asia. The next two descendants of B4a1a (the proto-Polynesian motif in western ISEA, B4a1a1, and shortly after the Polynesian Motif, B4a1a1a) appeared in close succession, within a 95% confidence interval of 3 000 YBP to 12 000 YBP that is in agreement with the estimate of 5 500 YBP. At the same time, the distribution pattern of B4a1a haplogroups (and its subtypes) suggested that a matrilineal society (speakers of Malayo-Polynesian languages) reached coastal Papua New Guinea 3 500 YBP to 2 500 YBP during the Lapita period where they rapidly expanded. The colonization of the rest of the Pacific islands took place during the next 2 500 years. In this scenario, characterizations of historical events estimated from genetic data are crude approximations resulting from the influence of reproductive patterns, isolation, genetic mutation, population admixture, drift, founder effect, and expansion and divergence lag times.
Actually, the 95% confidence intervals obtained incorporate the cultural model, but the genetic scenario still appears to antedate the time generally accepted by the “Out of Taiwan” model. A better fit can be obtained by allowing, firstly, for a demographic lag time (the time necessary for the establishment of an effective population size, varying from 400 to 2 000 years) and, secondly, for the time in which a new mutation may reach fixation (in most cases ∼1 000 years).
In any case, the sequence of genetic events presented in this study corresponds with archaeological and linguistic observations, and supports the suggestion that the main line of mutations of mtDNA (the eastward gene flow with the mutation of haplogroup B4a1a to B4a1a1 and then to B4a1a1a), the main line of linguistic patterns (Formosan to Malayo-Polynesian to Polynesian languages) and the cultural affinities (Taiwan “horticulturalists” ceramic Coarse Corded Ware culture to Lapita potteries) reflect a sound maternal dispersal in ISEA that is independent of geographical distances and is overlaid by a continuously changing male-biased gene flow. In conclusion, it appears that the genetic layout was already established when new cultural processes (the spread of people from western ISEA, their Austronesian languages, pottery wares, and so on) started their eastward spread toward Near Oceania. It is only during the conquest of the Pacific in the last 2 500 years that genes and culture correlate.
Haplogroups F1a1a and M7c3c
MSEA is mostly populated with Daic-speaking (in the southeast) and Austro-Asiatic-speaking (throughout Indochina) populations. The most common haplogroups among Daic are B4a, F1a, M7b1, B5a, M7b*, R9a, R9b, M7c, and other undefined M* in order of frequency, totaling 48.8%. Among Austro-Asiatic speakers, the most common haplogroups in order of frequency are F1a, M*, D*, F1b, N*, C, M7b*, M7b1, F1a1, M7c and B4a. Noticeably, the two regions share low-frequency sub-haplogroups of B4a*, F1a1* and M7c* which are also seen in Insular Asia (the star meaning “including other subclade determinants”). This indicates that Austro-Asiatic speakers, Daic and Insular Asia islanders (TwA, Filipinos and Indonesians) share deep ancestry, most likely dating to more than 20 000 YBP. Indeed, we have just described B4a1a in ISEA that descended from MSEA haplogroup B4a1 whose coalescence age in MSEA would date to ∼29 000 YBP.
In their phylogeographic reconstruction, researchers proposed a bidirectional movement of people from MSEA toward insular Asia via either the Taiwan Strait, or southward to western ISEA along the Indochinese peninsula and Indonesia. These two processes would then later join in an eastward migration toward PNG.
Haplogroup F1a1a was initially defined by Hill as a daughter clade of F1a1 (Figure 3). The dating estimate of this clade makes it a candidate for both postglacial and Neolithic dispersals. F1a1a, defined by nps 8149 and 16108, provides a distinctive pattern [shown as F1a1a (Ind) in Figure 3]; it is seen in both South China and Indochina, having first appeared 5 000 YBP to 10 000 YBP. It is most common in Indochina and among some of the indigenous groups of peninsular Malaysia. Trejaut et al. and Tabbada et al. saw a sister clade of F1a1a, here defined by nps 11380 and 16399 and named F1a1d(Tw). F1a1d(Tw) is found in MSEA, North Vietnam, Fujian and Taiwan (Figure 3). Neither F1a1a (Ind) nor F1a1d(Tw) is seen in the Philippines or among north TwA. The presence of other subclades of F1a1 in several regions of MSEA, Indochina and Japan indicates that MSEA (having the highest diversity of F1a1) is most likely the site of origin of F1a1. It is from there that the two sister clades F1a1d(Tw) and F1a1a(Ind) must have left MSEA 9 000 YBP and separately reached Taiwan and western ISEA respectively.
Haplogroup F1a1a and M7c3c distribution. Distribution of F1a1d (Tw) is shown in blue and F1a1a (Ind) is shown in orange (top of Figure 3). The overlapping of the two distributions suggests a probable origin of precursor F1a in mainland Southeast Asia …
Haplogroup M7c3c, dating to ∼8 000 YBP (Figure 3), is not seen in MSEA. The presence of its sister clades and that of its direct ancestor (M7c3) in MSEA and Japan indicates a late Pleistocene origin of the M7c ancestral clade on the East Asian continent. The distribution of M7c3c (Figure 3) throughout Taiwan and ISEA correlates with the spread of the Austronesian speakers, but the spread only reached Near Oceania and did not expand to Polynesia. There is a lot of variation among the sister branches of M7c3c. Most interestingly, these subtypes are not shared between regions, nor do any branches indicate later migrations. This probably indicates that after an initial expansion, M7c3c was rapidly distributed throughout ISEA and remained isolated until the present, a period which allowed diversity to develop locally. The higher frequency of M7c3c in Taiwan and the Philippines than in Indonesia would support a dispersal model similar to the “Out of Taiwan” model. Nonetheless, these two factors are not sufficient to determine the origin of the first M7c3c. Actually, except for the distribution of M7c3c in Taiwan and Indonesia, the highest frequency and diversity of M7c3c in the Philippines could also indicate a bidirectional gene flow of M7c3c from North and South into the Philippines from a location (in MSEA) that has now lost M7c3c by drift.
In the two preceding paragraphs we first saw that F1a1d(Tw) and F1a1a(Ind) showed opposed directional gene flows (North and South respectively) that reached Taiwan and Indonesia, but did not reach the Philippines. Tracing these routes on a map clearly delineates a demographic pincer model that could have started in the pre-Holocene era in MSEA. Secondly, we saw that the origin of M7c3c could not be localized, but it was clearly shown that its distribution covered the whole of western ISEA (ending in Near Oceania) and that the origin of its founder (M7c3) was most likely located in MSEA. As for F1a1d(Tw) and F1a1a(Ind), the subtypes of M7c3c were distinct between regions, and the higher diversity in the Philippines supports the demographic pincer model of distribution just mentioned. This model does not oppose the B4a1a model of distribution which we proposed initially. Actually, the B4a1a model followed the M7c3c distribution much more closely, as all subtypes of B4a1a were sedentary in western ISEA except for one (B4a1a1 and later B4a1a1a) whose demography can be retraced further into the Pacific and, much later, in the Indian Ocean.
In support of this pincer model of distribution, Li et al. used human Y-SNP to show that Taiwanese and Indonesians were derived from MSEA populations (see Figure 2 of reference ). Also, using Y-SNP, Karafet et al. and Trejaut et al. (materials in preparation) were able to estimate dates for the demographic branches of the pincer model. For this, they used the polymorphism of Y short tandem repeats acquired in the background of each Y-SNP haplogroup: O1a*, O1a1*, O3a*, O3a3* and O3a4*. They showed that the longest isolation of Taiwan or ISEA from a single founder haplogroup in MSEA dated between 12 000 YBP and 20 000 YBP. The upper range of these dates appears older than the dating obtained from mtDNA lineages, and could be due to the slower rate of mutation of Y-SNP. Alternatively, the older dating could also reflect a period of expansion (a lag time) of these Y-SNP haplogroups in MSEA when people were awaiting more favorable climatic conditions for their opposite migrations to Taiwan and ISEA.
Finally, more support for the pincer model is given by a large-scale survey of autosomal variation from a broad geographic sample of Asian human populations. The study (based only on phylogeography and not on time) showed that the distribution of populations throughout insular Asia was strongly correlated with linguistic and genetic affiliations as well as geography, by showing that gene flow from SEA constituted a major geographic source of all insular Asian populations.
Considerable differentiations between populations of East Asia and ISEA have been genetically determined. Using mtDNA and the non-recombining Y chromosome, some of these genetic differentiations could be dated back to the out-of-Africa era 60 000 YBP, a time representing the origin of all extant populations in the northern hemisphere. This was also a time when anatomically modern humans were already carriers of Epstein-Barr virus (EBV), and it is not known whether the oncogene region of EBV DNA was then differentiated, active or inactive. The distribution of southern Asian populations throughout the world has been associated with NPC and with that of a specific type of EBV. It is possible that the mutated form of EBV associated with NPC occurred 45 000 years ago when people from the Horn of Africa reached Sundaland and dispersed North and East. The northern group, later the mongoloids, under climatic and geographic constraint, remained isolated for more than 30 000 years. This period (of selective pressure) gave ample time for the EBV strain among Asian people to evolve as a lineage distinct from those of Papua New Guinea, Australia or isolates from the rest of the world. NPC due to EBV is rare in Europeans but common in southern Chinese who, even in the farthest-reached regions (New Zealand, Easter Island or even Africa), are associated with the presence of NPC among the autochthones (e.g. the Maori people, the Moroccans). Although susceptibility loci have been mapped within the human genome, the etiological factors associated with EBV are remarkable. It is possible that it is this association between a specific EBV strain and genes of the human genome that should be studied further.
Major East–West Division Underlies Y Chromosome Stratification across Indonesia
Tatiana M. Karafet1, Brian Hallmark1,2, Murray P. Cox1, Herawati Sudoyo3, Sean Downey2, J. Stephen Lansing2,4 and Michael F. Hammer*,1,2
1ARL Division of Biotechnology, University of Arizona
2School of Anthropology, University of Arizona
3Eijkman Institute for Molecular Biology, Jakarta, Indonesia
4Santa Fe Institute, Santa Fe, New Mexico
*Corresponding author: E-mail: email@example.com.
The early history of island Southeast Asia is often characterized as the story of two major population dispersals: the initial Paleolithic colonization of Sahul ∼45 ka ago and the much later Neolithic expansion of Austronesian-speaking farmers ∼4 ka ago. Here, in the largest survey of Indonesian Y chromosomes to date, we present evidence for multiple genetic strata that likely arose through a series of distinct migratory processes. We genotype an extensive battery of Y chromosome markers, including 85 single-nucleotide polymorphisms/indels and 12 short tandem repeats, in a sample of 1,917 men from 32 communities located across Indonesia. We find that the paternal gene pool is sharply subdivided between western and eastern locations, with a boundary running between the islands of Bali and Flores. Analysis of molecular variance reveals one of the highest levels of between-group variance yet reported for human Y chromosome data (e.g., ΦST = 0.47). Eastern Y chromosome haplogroups are closely related to Melanesian lineages (i.e., within the C, M, and S subclades) and likely reflect the initial wave of colonization of the region, whereas the majority of western Y chromosomes (i.e., O-M119*, O-P203, and O-M95*) are related to haplogroups that may have entered Indonesia during the Paleolithic from mainland Asia. In addition, two novel markers (P201 and P203) provide significantly enhanced phylogenetic resolution of two key haplogroups (O-M122 and O-M119) that are often associated with the Austronesian expansion. This more refined picture leads us to put forward a four-phase colonization model in which Paleolithic migrations of hunter-gatherers shape the primary structure of current Indonesian Y chromosome diversity, and Neolithic incursions make only a minor impact on the paternal gene pool, despite the large cultural impact of the Austronesian expansion.
Key words: Y chromosome, Paleolithic colonization, Austronesian, Wallace’s line.
Indonesia, the world’s largest archipelago, is a chain of more than 17,000 islands that stretches between the continents of Asia and Australia, dividing the Pacific and Indian oceans. The ∼240 million inhabitants are extremely diverse, speaking more than 750 languages and representing >300 different ethnic groups. Within this extreme diversity, several large-scale patterns give clues to the early settlement history of the region. Early explorers noticed morphological differences from east to west that were dramatic enough to lead Alfred Russel Wallace to designate a human phenotypic boundary demarcating the transition between Asian and Melanesian features. Relative to his better-known biogeographic boundary, this line lies slightly east, running between the islands of Sumbawa and Flores (Wallace 1869). The languages of the region follow a similar pattern, with the majority belonging to the extensive Austronesian language family but with more distantly related Papuan languages occurring in the far eastern provinces, especially in areas where Melanesian features predominate (Wallace 1869; Howells 1973; Pietrusewsky 1994; Cox 2008).
To explain these patterns, the prehistory of this region has often been framed as the story of two major range expansions: the initial Paleolithic colonization of Sahul ∼45 ka ago (Kirch 2000; Roberts et al. 2001; Leavesley and Chappell 2004; O’Connell and Allen 2004; Barker et al. 2007) and the much later Neolithic expansion of Austronesian-speaking farmers (4–6 ka ago) out of mainland Asia or Taiwan into Indonesia and the Pacific (Kirch 1997; Diamond 2000; Gray and Jordan 2000; Su et al. 2000; Capelli et al. 2001; Hurles et al. 2002; Kayser et al. 2003, 2006; Karafet et al. 2005, 2008; Li et al. 2008). In this scenario, the Austronesian expansion shaped the primary genetic, linguistic, and cultural diversity of the region, and the distribution of Papuan languages and phenotypic features are remnants of the initial colonization, which survived for thousands of years amid significant climatic changes and cultural shifts.
Although appealing in its simplicity, this two-phase model is inconsistent with information from numerous sources that point to a far more complex history for the region. Recent work on human mitochondrial DNA (mtDNA) suggests that the majority of the region’s maternal gene pool has a pre-Austronesian origin and that the distribution of mtDNA haplogroups is better explained by climatic and sea-level changes following the Last Glacial Maximum rather than the expansion of farmers out of mainland Asia and/or Taiwan (Hill et al. 2007; Soares et al. 2008). Ethnobotanical and linguistic evidence suggests a significant pre-Austronesian westward dispersal of bananas and their cultivators from New Guinea into eastern Indonesia and possibly even further west (Denham and Donohue 2009). Work on pig mtDNA points to multiple distinct migrations not only eastward out of Southeast Asia but also within Wallacea itself (Bellwood and White 2005; Larson et al. 2005; Lum et al. 2006). These data suggest that the pigs of Melanesia and Oceania trace their maternal origin to Southeast Asia rather than to Taiwan, which has been proposed as the place of origin of the Austronesian language family based on linguistic diversity (Blust 1995; Bellwood 1997).
Even within linguistics, the source of the Austronesian languages remains controversial, and dispersals both into and out of Taiwan are still debated (Meacham 1984; Gray and Jordan 2000; Diamond and Bellwood 2003). In addition, recent small-scale studies (Lansing et al. 2007, 2008) have added to our understanding of the contact dynamics that occurred as Austronesian speakers moved into places where preexisting Papuan populations already lived. More recently, Indonesia, especially in the west, has experienced ever-increasing spheres of Eurasian influence, which have all led to major cultural changes, including trade with India and the establishment of Hindu kingdoms (from ∼2.5 ka ago), the arrival of Arab traders and Islam from the Near East (∼1 ka ago), and European contact within the past 500 years.
Although Indonesian populations have been surveyed in many previous genetic studies (Kayser et al. 2000, 2001, 2003, 2006; Hurles et al. 2002; Redd et al. 2002b; Karafet et al. 2005; Li et al. 2008; Mona et al. 2009), there are still many open questions regarding the settlement history of the archipelago. One challenge is the sheer size and complexity of the Indonesian region, which makes comprehensive sampling difficult. To better characterize the paternal genetic landscape and to further disentangle the complex demographic history of the region, we present the largest survey of Indonesian Y chromosome diversity to date. In 2003, we established a close collaboration with the Eijkman Institute for Molecular Biology in Jakarta and have since sampled 1,917 men from 32 locations across Indonesia. Here, we report the results of genotyping these samples with an extensive battery of Y chromosome markers, both single-nucleotide polymorphisms (SNPs, n = 85) and short tandem repeats (Y-STRs, n = 12). Included in our set of SNPs are three markers (P201, P203, and JST002611) that have not been surveyed in this geographic region thus far and provide increased phylogenetic resolution of two of the major lineages (haplogroups O-M119 and O-M122) associated with the Austronesian expansion. In addition, our database contains the largest number of samples yet reported (n = 873) from the eastern Indonesian province of Nusa Tenggara Timur, which is notable as a contact zone with high linguistic and cultural diversity.
Subjects and Methods
Our Indonesian sample comprised 1,917 males from 32 communities on 13 islands, including Sumatra (n = 38), Nias (n = 60), Mentawai (n = 74), Java (n = 61), Borneo (n = 86), Bali (n = 641), Sulawesi (n = 54), Flores (n = 394), Lembata (n = 92), Sumba (n = 350), Alor (n = 28), Timor (n = 9), and a composite group from the Maluku Islands (n = 30). For many of these islands, the samples were collected at multiple traditional villages but have been pooled together in this study for a broad-scale analysis (supplementary table S1, Supplementary Material online). Buccal swabs were collected from volunteers by HS and/or JSL from 2003 to 2007 with informed consent by the donors. All sampling procedures were approved by the University of Arizona Human Subjects Committee, Balai Pengkajian Teknologi Pertanian (Bali), and the Government of Indonesia. For comparative analysis, we included an additional 763 previously reported samples from our database, representing 19 populations across Southeast Asia and Oceania (Hammer et al. 2001; Karafet et al. 2001, 2005; Redd et al. 2002b). Figure 1 shows a map of the sampling locations, and additional information can be found in supplementary tables S1 and S2 (Supplementary Material online).
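The reported sample sizes can be cross-checked with a few lines of arithmetic. The sketch below is a minimal tally in Python: the island counts are those listed above, the western/eastern grouping follows the caption of the sampling map, and the variable and function names are illustrative only.

```python
# Sample sizes per island, as listed in the text.
ISLAND_N = {
    "Sumatra": 38, "Nias": 60, "Mentawai": 74, "Java": 61, "Borneo": 86,
    "Bali": 641, "Sulawesi": 54, "Flores": 394, "Lembata": 92, "Sumba": 350,
    "Alor": 28, "Timor": 9, "Maluku": 30,
}

# Regional grouping as given in the caption of the sampling map.
WESTERN = ["Sumatra", "Nias", "Mentawai", "Java", "Bali", "Borneo"]
EASTERN = ["Flores", "Sumba", "Lembata", "Alor", "Timor", "Sulawesi", "Maluku"]


def region_total(islands):
    """Sum the sample sizes of the named islands."""
    return sum(ISLAND_N[name] for name in islands)


total = sum(ISLAND_N.values())
western = region_total(WESTERN)
eastern = region_total(EASTERN)
print(total, western, eastern)  # 1917 960 957
```

The regional totals agree with the western Indonesia (960) and eastern Indonesia (957) sample sizes quoted with the haplogroup tree.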
Map showing the approximate geographic positions of the population samples included in this survey. The size of each circle is proportional to the size of the sample. The populations are grouped (within dotted lines) into four major geographic areas (three-letter codes refer to populations sampled in this survey): Southeast Asia (SEA, including both mainland and island populations: HAN, Han Chinese; TUJ, Tujia; MIA, Miao; YAO, Yao; SHE, She; VIE, Vietnamese; MAL, Malaysians; TAB, Taiwanese aboriginals; and PHI, Filipinos), western Indonesia (SMT, Sumatra; NIA, Nias; MEN, Mentawai; JAV, Java; BLI, Bali; and BOR, Borneo), eastern Indonesia (FLO, Flores; SUM, Sumba; LEM, Lembata; ALO, Alor; TIM, Timor; SLW, Sulawesi; and MOL, Moluccas), and Oceania (NGC, Coastal Papua New Guinea; NGH, Highland Papua New Guinea; VAN, Vanuatu; NAS, Nasioi; MIC, Micronesia; ASA, American Samoa; WSA, western Samoa; TON, Tonga; and RNU, Rapa Nui/Easter Island and TAH, Tahiti not shown). The distribution of Austronesian and Papuan languages is shown in light and dark shades of gray, respectively.
Polymorphic sites from the nonrecombining portion of the human Y chromosome (NRY) included a set of 82 binary markers published previously (Karafet et al. 2005) together with a set of three new polymorphisms, JST002611, P201, and P203 (Karafet et al. 2008), that have not been typed at this scale before. The phylogenetic position of these markers and the haplogroups they define in our samples are shown in figure 2. A hierarchical genotyping strategy was used in which major haplogroups were predicted based on the array of Y-STR alleles contained on each Y chromosome (Schlecht et al. 2008) and then confirmed by genotyping of a smaller set of SNPs. Once the correct major haplogroup was identified, additional genotyping was restricted to the appropriate downstream mutations along the haplogroup tree (fig. 2). We follow the nomenclature recommended by the Y Chromosome Consortium (YCC 2002; Karafet et al. 2008), focusing mainly on the mutation-based naming system. Potentially paraphyletic paragroups are distinguished from haplogroups by the asterisk symbol. We also analyzed 12 Y-STRs (DYS19, DYS385a, DYS385b, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS426, and DYS439) using methods described by Redd et al. (2002a).
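The hierarchical strategy described above can be illustrated with a small sketch. It is not the authors' pipeline: the marker tree below is a toy containing only the two parent/child pairs named in the text (M119 with P203, M122 with P201), the `O-` prefix and all function names are assumptions, and the step that predicts the major haplogroup from Y-STR alleles is represented simply by the `predicted_major` argument.

```python
# Toy tree: parent marker -> downstream markers. The real tree has many more
# nodes, so a terminal label here may still be a paragroup in the full tree.
CHILDREN = {
    "M119": ["P203"],
    "M122": ["P201"],
}


def assign_haplogroup(is_derived, predicted_major):
    """Confirm a predicted major haplogroup, then type only downstream markers.

    `is_derived(marker)` stands in for one genotyping assay and returns True
    when the sample carries the derived allele. Returns the haplogroup label
    and the list of markers that actually had to be typed.
    """
    typed = []

    def check(marker):
        typed.append(marker)
        return is_derived(marker)

    if not check(predicted_major):
        raise ValueError(f"prediction {predicted_major} not confirmed by SNP typing")

    node = predicted_major
    while True:
        hits = [m for m in CHILDREN.get(node, []) if check(m)]
        if len(hits) > 1:
            raise ValueError(f"conflicting derived markers under {node}: {hits}")
        if not hits:
            break
        node = hits[0]

    # An asterisk marks a paragroup: downstream markers exist but none is derived.
    label = f"O-{node}" + ("*" if CHILDREN.get(node) else "")
    return label, typed


sample_a = {"M119", "P203"}   # hypothetical genotypes
sample_b = {"M119"}
print(assign_haplogroup(sample_a.__contains__, "M119"))  # ('O-P203', ['M119', 'P203'])
print(assign_haplogroup(sample_b.__contains__, "M119"))  # ('O-M119*', ['M119', 'P203'])
```

In both cases the markers on the M122 branch are never typed, which is the point of restricting genotyping to the relevant part of the tree.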
Maximum parsimony tree of 35 Y chromosome haplogroups together with their frequencies in four geographic groups. Sample sizes for each region are mainland Southeast Asia (SEA), 581; western Indonesia (WIN), 960; eastern Indonesia (EIN), 957; and Oceania (OCE), 182. Major clades (i.e., C–S) are labeled with uppercase letters to the left of each clade. Mutation names are given along the branches. Dotted lines indicate internal nodes not defined by downstream markers (i.e., paragroups). The names of 35 haplogroups observed in our samples are shown to the right of the branches using the mutation-based nomenclature of the YCC (2002). Lineage-based nomenclature information is provided next to internal branches for key O and M haplogroups. Haplogroup frequencies (shown to the far right) are weighted to normalize the contribution of each population. (Hyphen indicates […]

[…] 50% on Java and Bali) but is virtually absent east of Flores. This paragroup is widespread in south and southeast Asia (Karafet et al. 2005), and it is unclear when it initially entered western Indonesia. The ancient time to the most recent common ancestor of this lineage and Asian distribution supports the hypothesis of a pre-Austronesian incursion of O-M95* into western Indonesia from Southeast Asia (Kumar et al. 2007). It has also been suggested that O-M95* chromosomes appeared in Indonesia after the initial colonization of the Pacific by Austronesian farmers and that the high frequency of O-M95* in Bali and Java may reflect an even more recent influx of males from the Indian subcontinent (e.g., possibly concomitant with the spread of Hinduism and the establishment of Indian kingdoms in the first millennium) (Karafet et al. 2005).
O-M122, a haplogroup with a wide Southeast Asian distribution, has been proposed as a marker of the spread of Austronesian-speaking populations (Kayser et al. 2000, 2006; Capelli et al. 2001; Karafet et al. 2005; Shi et al. 2005; Scheinfeldt et al. 2006). The typing of additional SNPs within the O-M122 lineage results in a much more fine-grained picture of the distribution of this clade in mainland and island Southeast Asia. High frequencies of the derived O-M134, O-M7, and O-002611 subclades are observed in different ethnic groups in China and mainland Southeast Asia. In contrast, these haplogroups are absent or only marginally present among Indonesian, Taiwanese aboriginal, and Pacific populations. The low frequency of O-M7 in western Indonesia (i.e., Bali, Java, and Borneo) and O-M134 in Polynesia most likely reflects recent connections with mainland China (see next section).
More notable are the results of typing the novel marker P201, which has the effect of converting almost all chromosomes outside of mainland Asia that were previously identified as O-M122* to O-P201*. Given this widespread pattern, it may be that O-M122 chromosomes previously found at high frequency on many Pacific Islands (Kayser et al. 2006) are actually O-P201* (in our small sample of 64 Polynesians typed here, 11 of 15 O-M122 carriers are also O-P201*). Unlike the lineages discussed so far, the frequencies of O-P201* chromosomes are fairly constant (∼7%) right across the major regions sampled here (fig. 2). Importantly, despite its relatively low frequency on Taiwan (∼6%), genetic distances based on STR variation associated with P201 chromosomes reveal a much closer relationship among Taiwanese aboriginals/Filipinos, Indonesians, and Oceanians than between any of these groups and mainland Southeast Asians (supplementary table S4, Supplementary Material online). Therefore, we hypothesize that this new marker traces the large population expansion associated with the spread of Austronesian languages and culture.
With regard to the hypothesis of a Taiwanese origin of the Austronesian people (Bellwood 2007), the M119 mutation has drawn particular interest because these chromosomes are dominant among Taiwanese aboriginal groups (Su et al. 1999, 2000; Kayser et al. 2000, 2001, 2003, 2006, 2008; Capelli et al. 2001; Hurles et al. 2002; Karafet et al. 2005; Li et al. 2008). Previously, we found little geographic structure for O-M119* STR haplotypes and proposed that the absence of such structure might reflect an ancient dispersal, migrations from different source populations, and/or sustained gene flow from Southeast Asia (Karafet et al. 2005). We also suggested that O-M119* chromosomes might represent a heterogeneous group of not-yet-identified haplogroups. Here, we have substantiated the second claim and find that the additional mutation P203 allows significantly better geographic resolution of O-M119 chromosomes than previously possible. Many (but not all) chromosomes that were previously identified as O-M119* are now O-P203, including those from the Taiwanese aboriginals sampled here (i.e., all 34 M119 chromosomes are marked with the derived P203 mutation). The current results reveal that while the “ancestral” O-M119* lineage is virtually absent in mainland Southeast Asia, the derived O-P203 subclade is frequent there, as well as in western Indonesia (fig. 2).
The high frequency of O-P203 in western Indonesia (∼24%) and much lower frequency in eastern Indonesia (∼2%) and Oceania (<1%) (fig. 2) are not easily explained by a model that links its spread solely with the Austronesian expansion. A highly reticulated MJ network based on Y-STR diversity is uninformative on the question of a Taiwanese affinity of Indonesian O-P203 chromosomes (data not shown). Genetic distances based on O-P203 Y-STR haplotypes are equally similar between Taiwanese aboriginals and Indonesians (RST = 0.151) and Taiwanese aboriginals and mainland Southeast Asian populations (RST = 0.156). However, sharing of 12-locus Y-STR haplotypes associated with P203 chromosomes was found between Taiwanese aboriginals and Indonesians (i.e., from Nias, Mentawai, Java, and Bali), whereas no such sharing was found between the Taiwanese aboriginal and mainland Southeast Asian P203 chromosomes in our sample. This suggests that some portion of Indonesian O-P203 chromosomes may have migrated from Taiwan.
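Haplotype sharing of the kind reported here amounts to asking whether identical multi-locus haplotypes occur in two samples. The following is a minimal sketch under stated assumptions: each haplotype is a tuple of 12 repeat counts, the repeat counts and the three small samples are invented purely for illustration, and `shared_haplotypes` is a hypothetical helper rather than anything used in the study.

```python
def shared_haplotypes(pop_a, pop_b):
    """Return the set of haplotypes (tuples of repeat counts) found in both samples."""
    return set(pop_a) & set(pop_b)


# Hypothetical 12-locus Y-STR haplotypes (repeat counts are made up).
h1 = (15, 12, 18, 12, 12, 28, 24, 10, 13, 13, 11, 12)
h2 = (15, 12, 18, 12, 12, 28, 25, 10, 13, 13, 11, 12)
h3 = (16, 13, 19, 12, 13, 29, 23, 11, 13, 12, 11, 11)

taiwan = [h1, h1, h2]      # toy samples, not real data
indonesia = [h1, h3]
mainland = [h3]

print(len(shared_haplotypes(taiwan, indonesia)))  # 1
print(len(shared_haplotypes(taiwan, mainland)))   # 0
```

Exact matching is deliberately strict: two haplotypes that differ by a single repeat at one locus count as unshared, which is why sharing can be informative even when distance statistics such as RST are nearly equal.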
A stronger case can be made for a Taiwanese affinity of Indonesian O-M110 chromosomes (Karafet et al. 2005; Kayser et al. 2008). An MJ network of 12-locus Y-STR haplotypes associated with O-M110 chromosomes has a central node composed of 2 Taiwanese aboriginal and 14 western Indonesian chromosomes (fig. 4). However, although present in ∼19% of Taiwanese aboriginals sampled here, O-M110 chromosomes are found at low frequencies elsewhere (with the exception of Nias), with only 1 of 182 samples in Oceania belonging to this haplogroup (fig. 2, supplementary table S2, Supplementary Material online). In fact, a larger number of O-M119* chromosomes was found in Oceania (4 of 182). Thus, while it is possible that haplogroups O-M110 and O-P203 mark an expansion out of Taiwan, it is also possible that at least part of their distribution, especially in western Indonesia, reflects a pre-Austronesian dispersal of these haplogroups in island Southeast Asia. In this way, these lineages are reminiscent of mtDNA haplogroup E whose distribution in Taiwan and island Southeast Asia has been suggested to predate the Austronesian expansion (Hill et al. 2007; Soares et al. 2008).
Median-joining network for haplogroup O-M110 based on variation at 12 Y-STRs. Haplotypes are represented by circles with the area proportional to the number of individuals carrying that haplotype. Branch lengths are proportional to the number of one-repeat mutations separating two haplotypes. Color coding of haplotypes: black (Taiwanese aboriginals), cross-hatched (eastern Indonesians), gray (western Indonesians), and white (other).
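The branch-length convention in this caption can be made concrete by counting one-repeat differences between two haplotypes. The sketch below assumes a simple stepwise view in which every unit difference in repeat count at a locus counts as one mutation; it does not construct a median-joining network, and the haplotypes and the function name `repeat_steps` are hypothetical.

```python
def repeat_steps(hap_a, hap_b):
    """Count one-repeat differences between two STR haplotypes, summed over loci."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be typed at the same loci")
    return sum(abs(a - b) for a, b in zip(hap_a, hap_b))


# Hypothetical 12-locus haplotypes (repeat counts are made up).
central = (15, 12, 18, 12, 12, 28, 24, 10, 13, 13, 11, 12)
one_step = (15, 12, 18, 12, 12, 28, 25, 10, 13, 13, 11, 12)   # +1 repeat at one locus
two_step = (17, 12, 18, 12, 12, 28, 24, 10, 13, 13, 11, 12)   # +2 repeats at one locus

print(repeat_steps(central, one_step))  # 1
print(repeat_steps(central, two_step))  # 2
```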
Y Chromosomes Entering Indonesia in Historic Times
The remaining Indonesian chromosomes represent a range of haplogroups and account for only ∼6% of the total sample. Some of these lineages have clear affinities with distant geographic regions and may mark incursions into Indonesia in recent times. For example, Indian contact <2.5 ka ago may have introduced lineages within haplogroups H (H-M69, H-Apt, and H-M52), R (R-M124), and Q (Q-M346) to Indonesia (which accounts for ∼2% of Indonesian Y chromosomes; supplementary table S2, Supplementary Material online). Interestingly, H-M69, H-Apt, and Q-M346 are observed only in Bali, which has the highest frequency of Indian Y chromosomes (Kivisild et al. 2003; Karafet et al. 2005; Gutala et al. 2006; Sengupta et al. 2006), whereas Y chromosomes with Indian affinity (e.g., H-M52 and R-M124) are also found at lower frequencies in Java, Borneo, and Sumatra.
Although it is more difficult to pinpoint the source of J-M304, L-M20, and R-M17 lineages because they are present at relatively high frequencies in both Indian and Near Eastern populations (Karafet et al. 2005), it is plausible that some of these lineages entered Indonesia recently with the spread of Islam. Overall, Indian and Arab influences are restricted to western Indonesia, particularly the adjacent islands of Java and Bali, where Indian and Arab cultural influences are self-evident (Lansing 1983). Chinese influences occur here too (e.g., O-M7 is present at <1% in Bali and ∼11% in Java); however, they are more important in Borneo (e.g., O-M7 is found at ∼20%), a major Chinese outpost dating from the Han dynasty (Ricklefs 1993; Taylor 2003).
A Four-Stage Colonization Model for the Region
To integrate our phylogeographic inferences for the diverse set of haplogroups and populations surveyed here, we now formulate a four-stage colonization model that attempts to account for the current pattern of Y chromosome variation in Indonesia (fig. 5). In the first stage, a Late Pleistocene arrival of the first anatomically modern settlers introduces basal C and K lineages to the entire region (which eventually give rise to haplogroups C-M38, M-P256, and S-M230 in eastern Indonesia/Melanesia). At the time of this expansion (∼45–50 ka ago), sea levels were much lower and the shape of the coastline was very different from that of today (fig. 5A). For example, Sumatra, Java, Borneo, and other small groups of islands formed a direct extension of the Asian mainland—the Sunda continental shelf or Sundaland—and the deep-water channel separating Bali and Lombok marked the end of this land mass. Further travel eastward to Wallacea and Sahul required crossing water.
Fig. 5. A multistage colonization model for Indonesia. (A) Initial wave of colonization 40–50 ka ago, (B) Paleolithic contribution from mainland Asia, (C) Austronesian expansion, and (D) migration in historic times (see text for details). Haplogroups and paragroups listed in each panel are postulated to have arrived during this stage. Arrows do not denote precisely defined geographic routes. The dotted arrowhead in panel (B) indicates possible bidirectional gene flow. The dotted line in panels (B), (C), and (D) represents Wallace’s biogeographic boundary. Arrows in panel (C) are adapted from Gray and Jordan (2000). Small letters associated with arrows in panel (D) refer to migrations from India (i), Arabia (a), and China (c). Shaded areas with capital letters (A–D) under the panels refer to the approximate time frame for each stage of colonization. Coastlines in panels (A) and (B) are drawn with sea levels 50 and 120 m below the current levels shown in panels (C) and (D) (http://www.fieldmuseum.org/research_collections/zoology/zoo_sites/seamaps/mapindex1.htm).
Climate change during, and at the end of, the last glacial period (33–16 ka ago) may have had an important effect on human diversity in the region, including an overall population decline during this period (Bird et al. 2005; Pope and Terrell 2008). After 19 ka ago, the sea level began to rise again, with Southeast Asia reaching its present coastline by around 8 ka ago (Mulvaney and Kamminga 1999). These climatic changes may have spurred a second round of expansion of hunter-gatherers into Sundaland from further north on the mainland (Soares et al. 2008). Indeed, the spread of the Southeast Asian Hoabinhian culture into Sumatra may be one tangible marker of these movements (Bellwood 2007).
We posit that dispersals of hunter-gatherers radiating over an extended period of time (e.g., 8–35 ka ago) introduced several major subclades of haplogroup O to Indonesia (e.g., O-M119, O-M95, O-P203, and O-M122) (fig. 5B). However, the current data do not inform us about the age of the sharp Y chromosome boundary between western and eastern Indonesia. Cox et al. (2010) genotyped a small set of ancestry informative markers on the autosomes and X chromosome and found a similar transition from Asian to Melanesian ancestry over a narrow geographic region in eastern Indonesia. Although clines that extend over long distances and that originate from well-differentiated source populations may be remarkably stable (Wijsman and Cavalli-Sforza 1984; Cavalli-Sforza et al. 1994; Fix 1999), it is not clear how divergent human groups practicing similar subsistence strategies and living in close geographic proximity could have remained (semi-) isolated since the Late Pleistocene or why they may have done so. An alternate explanation is that the boundary is fleeting and formed recently through the mixing of groups that shared ancient common ancestry. Future studies of many more genome-wide markers should help to determine whether this ancestry cline reflects contact between differentiated hunter-gatherer groups in the Late Pleistocene (i.e., during this second stage of colonization in fig. 5B) or more recent mixing resulting from the spread of farming populations.
The third stage of colonization corresponds to the Austronesian expansions. This maritime dispersal of rice agriculturists from southern China/Taiwan, beginning between 5.5 and 4.0 ka ago, resulted in the expansion of Austronesian languages throughout the region (Bellwood 2007). We posit that it also led to the migration of haplogroups O-P201 and possibly O-M110 and O-P203 to both sides of Wallace’s line as it penetrated the Indonesian region from the north by sea (fig. 5C). The final phase of settlement involves several incursions, especially into western Indonesia, during historic times (fig. 5D). The first of these is the spread of Hinduism and the establishment of Indian kingdoms, which took place between the 3rd and 13th centuries (Peter 1982; Karafet et al. 2005). This resulted in the introduction of multiple haplogroups that derive from south Asia and today are found at low frequency in Bali, Java, Borneo, and Sumatra. The second is the spread of Islam (Tibbetts 1979), ultimately from Arabia, which may have introduced paternal lineages within haplogroups J, L, and R. Finally, haplogroup O-M7 is a marker of recent Chinese influence.
Although the initial colonizers entered a landscape previously unoccupied by anatomically modern humans, subsequent expansions occurred into territory already inhabited by genetically differentiated groups. However, there is no evidence that any of these expansions resulted in a complete replacement of the Y chromosomes of previous inhabitants. This is evidenced by the survival of older genetic strata in both western and eastern Indonesia. Likewise, haplogroup O lineages associated with the Austronesians are only found at <20% in western Indonesia and less frequently in eastern Indonesia and Melanesia, despite the fact that Austronesian languages predominate throughout most of the region. Finally, the migration processes that led to the most dramatic cultural and social change, the “Indianization” and “Islamization” of Indonesia, resulted in the smallest amount of genetic change (i.e., they only account for a small percentage of Indonesian Y chromosomes). Interestingly, the earliest settlers left the most durable pattern in the Y chromosome data: a sharp transition from Asian O haplogroups to Melanesian haplogroups (C, M, and S) over a small area in eastern Indonesia. From a Y chromosome perspective, more recent incursions of culture occurred without much of a corresponding genetic effect.