Southern African Genome Diversity Study

PAST PROJECT

Overview

Scientists at the J. Craig Venter Institute have a long history of leadership in human genomic research. From the work in the early 1990s by Dr. Venter and colleagues at the National Institutes of Health describing expressed sequence tags (ESTs) to rapidly discover human genes, to the sequence and analysis of the first draft human genome published in 2001, to the first complete diploid human genome published in 2007, JCVI scientists have remained committed to research into the human genome.

In 2010 the JCVI Human Genomic Medicine Group was strengthened by the addition of geneticist Vanessa Hayes, PhD, who joined JCVI as Professor of Genomic Medicine. Dr. Hayes, a native of South Africa, and her team (who were then at the University of New South Wales and the Children's Cancer Institute of Australia) were part of the seminal South African Genome Project. They, along with lead collaborators from the Pennsylvania State University and nearly 50 other scientists from nine other academic institutions, published their work in the journal Nature in February 2010. The researchers sequenced the complete genomes (and an additional four exomes) of a Bushman and a Bantu, the latter being Archbishop Desmond Tutu.

The research, which garnered worldwide attention, was key to an enhanced understanding of the extent of uncaptured human genetic variation and of the relevance of this variation in assessing disease risk, outcomes and the response to various medicines, in particular within the understudied populations of southern Africa. The work also gives insights into aging, since all participants were in their late 70s. The study identified 1.3 million genetic variants that scientists had not previously observed. These genetic variations reveal that Southern Africans are quite distinct genetically from Europeans, Asians, and West Africans.

Ju/'hoan hunters setting a trap. Photo courtesy Chris Bennett.

Dr. Hayes was crucial to getting this project underway. It was born from her frustration in earlier human genomic studies, when she found a complete lack of African reference genomes and susceptibility gene array profiles in existing databases. Africa, believed to be the birthplace of mankind with the oldest populations, offers a much greater diversity than is found in individuals of European descent.

Dr. Hayes' passion to see African genomes take their rightful place in the worldwide databases continues to this day and is reflected in the work she and her team are doing at JCVI. They are working on a variety of projects, one of which involves a formal collaborative agreement with the University of Limpopo (UL) in South Africa.

In May 2011 JCVI signed a memorandum of understanding with UL to expand ongoing research collaborations on a variety of levels including in human genomics and prostate cancer in indigenous African populations. The collaboration was named the "University of Limpopo (UL) — J. Craig Venter Institute (JCVI) Genomics Network".

Dr. Hayes and UL's Philip Venter, Professor in the Faculty of Medical Sciences, Turfloop Campus, have had a long-standing collaboration since both were involved in the Southern African Genome Project. The new UL-JCVI Genomics Network will expand on this fruitful collaboration by formally enabling researchers at both institutions to draw on and learn from the experience and expertise of their colleagues. [Nature. 2011 Aug 11; 476: 152.]

Specific goals and projects established within the framework of the UL-JCVI Network include:

  • Defining the extent of genomic diversity in the Khoesan and Bantu populations of Southern Africa and establishing a genomic profile of prostate cancer disparities within South African populations
  • Promoting the exchange of scientific ideas, information and technology as they relate to improving the understanding of human genetic diversity and genomic medicine (not excluding possible pathogen-related genomic relevance), and
  • Ensuring the protection of all human subject participants in the clinical research program by providing thorough and complete ethical review of the Research Studies.

The Network will also facilitate faculty and student exchanges, host visiting scholars and scholars in residence, and support joint project, proposal and scientific manuscript development.

Principal Investigator

Vanessa Hayes

Analytical Team

Nicholas Schork
The Scripps Research Institute, San Diego

Desiree Petersen
JCVI, San Diego

Ondrej Libiger
The Scripps Research Institute, San Diego

Elizabeth Tindall
JCVI, San Diego

International PhD Students and Visiting Scientists

Rae-Anne Hardie
University of New South Wales, Australia

Zolani Simayi
University of Limpopo, South Africa

Katherine Theron
University of Limpopo, South Africa

Related Research

Study Goals

The overall goal of this study is to establish the true extent of human genome diversity, which includes the largely understudied indigenous populations of the world. It is critical that all peoples who desire to be part of the genomic era are given the opportunity to be included in research studies and genomic databases, and so to reap the benefits of the enhanced knowledge from genomics that is shaping the future of medicine and our society. [Science (New York, N.Y.). 2011 May 06; 332: 639.]

The inclusion of Archbishop Emeritus Desmond Tutu in the first South African Genome project was important for many reasons. His participation ensured that many more Africans will participate in this necessary research. Here are his own words [Science. 2011 Feb 11; 331: 689.] on the sequencing of his genome [Nature. 2010 Feb 18; 463(7283): 943-7.].

Southern Africans are victims of many devastating diseases whose eradication requires immediate attention and international resources. My hope is that my genetic code may provide a voice for the region and serve as the starting point for a map of DNA variation significant for Southern African peoples, to be used for medical research efforts and effective design of medicines. I implore the scientific community to continue what I hope was just a first step to further medical research within the region.

— Archbishop Tutu

Publications

PLoS Genetics. 2013-03-14; 9.3: e1003309.
Complex patterns of genomic admixture within southern Africa
Petersen DC, Libiger O, Tindall EA, Hardie RA, Hannick LI, Glashoff RH, Mukerji M, Indian Genome Variation Consortium, Fernandez P, Haacke W, Schork NJ, Hayes VM
PMID: 23516368

Nature Biotechnology. 2013-03-01; 31.3: 211-2.
Single-cell sequencing in its prime
Lasken RS
PMID: 23471069

Nature. 2011-08-10; 476.7359: 152.
Helping hand for genomics in Africa
Hayes VM, Venter PA, Mphahlele MJ
PMID: 21833071

Science (New York, N.Y.). 2011-05-06; 332.6030: 639.
Indigenous genomics
Hayes V
PMID: 21551033

Nature. 2010-02-18; 463.7283: 943-7.
Complete Khoisan and Bantu genomes from southern Africa
Schuster SC, Miller W, Ratan A, Tomsho LP, Giardine B, Kasson LR, Harris RS, Petersen DC, Zhao F, Qi J, Alkan C, Kidd JM, Sun Y, Drautz DI, Bouffard P, Muzny DM, Reid JG, Nazareth LV, Wang Q, Burhans R, Riemer C, Wittekindt NE, Moorjani P, Tindall EA, Danko CG, Teo WS, Buboltz AM, Zhang Z, Ma Q, Oosthuysen A, Steenkamp AW, Oostuisen H, Venter P, Gajewski J, Zhang Y, Pugh BF, Makova KD, Nekrutenko A, Mardis ER, Patterson N, Pringle TH, Chiaromonte F, Mullikin JC, Eichler EE, Hardison RC, Gibbs RA, Harkins TT, Hayes VM
PMID: 20164927

Human Molecular Genetics. 2010-02-01; 19.3: 411-9.
Genetic structure of a unique admixed population: implications for medical research
Patterson N, Petersen DC, van der Ross RE, Sudoyo H, Glashoff RH, Marzuki S, Reich D, Hayes VM
PMID: 19892779

Population-Specific Data Resources

Ju/'hoan 

Location: Greater Kalahari region of Namibia
Self-identified broad grouping: Bushmen
Language grouping: Ju-speaking (previously Northern Khoesan)
Specific grouping: Ju/'hoan

Illumina Omni-quad 1M data: n=19 

!Xun

Location: northern Namibia (including migrants from southern Angola)
Self-identified broad grouping: Bushmen 
Language grouping: Ju-speaking (previously Northern Khoesan) 
Specific grouping: !Xun or Vasekela (latter also known as Angolan !Xun) 

Illumina Omni-quad 1M data: n=14

amaXhosa

Location: Eastern Cape region of South Africa
Self-identified broad grouping: Bantu (Southern Bantu) 
Broad Bantu language grouping: Nguni speakers 
Language: isiXhosa 

Illumina Omni-quad 1M data: n=15

Coloured

Location: South Africa
Geographically defined subpopulations: Northern Cape (NC), Western Cape (WC), District Six (D6) and Eastern Cape (EC)
Language: Afrikaans / English

Illumina Omni-quad 1M data: n=25

Baster

Location: Rehoboth, Namibia 
Language group: Predominantly Afrikaans 

Illumina Omni-quad 1M data: n=30

Genotype Data

PLoSGenetics_Petersen_2013.bed

PLoSGenetics_Petersen_2013.bim

PLoSGenetics_Petersen_2013.fam
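
The three files above share one basename and the extensions .bed, .bim and .fam, which is the layout of a PLINK binary fileset. The following is a minimal sketch of how the two plain-text files might be inspected in Python, assuming they follow the standard PLINK layout: six whitespace-separated columns per line in the .fam file (family ID, individual ID, father ID, mother ID, sex, phenotype) and one line per variant in the .bim file. The names `FamRecord`, `read_fam`, `count_variants` and `summarize` are hypothetical helpers introduced for illustration; how population labels are encoded in this particular dataset is not stated here, so the sketch does not try to infer them.

```python
from collections import Counter
from pathlib import Path
from typing import NamedTuple


class FamRecord(NamedTuple):
    """One line of a PLINK .fam file (standard six-column layout assumed)."""
    family_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str        # PLINK codes: 1 = male, 2 = female, 0 = unknown
    phenotype: str


def read_fam(path):
    """Parse a .fam file into a list of FamRecord, skipping blank lines."""
    records = []
    for line_no, line in enumerate(Path(path).read_text().splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 6:
            raise ValueError(f"{path}:{line_no}: expected 6 columns, got {len(fields)}")
        records.append(FamRecord(*fields))
    return records


def count_variants(bim_path):
    """Count non-blank lines in a .bim file (one variant per line)."""
    with open(bim_path) as handle:
        return sum(1 for line in handle if line.strip())


def summarize(prefix):
    """Summarize the text files of a PLINK fileset sharing the given prefix."""
    samples = read_fam(f"{prefix}.fam")
    return {
        "samples": len(samples),
        "variants": count_variants(f"{prefix}.bim"),
        "sex_counts": dict(Counter(record.sex for record in samples)),
    }
```

With the files downloaded into the working directory, `summarize("PLoSGenetics_Petersen_2013")` would report the number of samples and variants. The .bed file holds the packed genotypes in binary form and is better read with a dedicated tool than parsed by hand.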

Southern African Complete Personal Genomes

ABT — Archbishop Desmond Tutu

Identifier: ABT (Archbishop Desmond Tutu)
Gender: Male
Country: South Africa
Heritage: Southern Bantu; amaXhosa (paternal), Motswana (maternal)
Date of Birth: 7 October 1931
Disease survival: Poliomyelitis (polio), pulmonary tuberculosis (TB), prostate cancer
Complete genome: Schuster SC, Miller W, et al. Complete Khoisan and Bantu genomes from southern Africa. Nature. 2010 Feb 18;463(7283):943-7. doi: 10.1038/nature08795. Data: SOLiD3.0 (30x coverage) available on Galaxy located at Penn State University
MT-genome: Haplogroup L0d2; maternal Khoesan ancestral heritage; ABT_MTgenome_Sequence
Y-chromosome: Haplogroup E1b1a8a; paternal West-central African ancestral heritage
Genotypes: Illumina 1M Duo data; ABT_Illumina1MDuo_Genotypes

Ancestral fractions ABT

KB1 — !Gubi

Identifier: KB1 (!Gubi)
Gender: Male
Country: Namibia
Heritage: Tuu-speaker; Bushman
Date of Birth: Unknown; estimated > 80 years
Disease survival: None reported
Complete genome: Schuster SC, Miller W, et al. Complete Khoisan and Bantu genomes from southern Africa. Nature. 2010 Feb 18;463(7283):943-7. doi: 10.1038/nature08795. Data: Roche/454 GS FLX (10.2x coverage) available on Galaxy located at Penn State University
MT-genome: Haplogroup L0d1b; KB1_MTgenome_Sequence
Y-chromosome: Haplogroup B2
Genotypes: Illumina 1M Duo data; KB1_Illumina1MDuo_Genotypes
Divergence time estimation: 157-108K ybp (Gronau et al., Nature Genetics, 2011)

Ancestral fractions KB1

Background

As defined by the United Nations geographical classification, southern Africa includes the countries of Botswana, Lesotho, Namibia, South Africa and Swaziland. The region is home to populations broadly defined as Khoesan, Bantu, or European. We introduce a new classification of populations that have emerged from significant migrations into the region.

Southern Africa, as defined by the United Nations geographical classification.

A Word on the Use of Population Identifiers

The publication of 'Complete Khoisan and Bantu genomes from southern Africa' (Nature 463, 943-947; 2010) led to a published response by C. Schlebusch (Nature 464: 487; 2010) to our use of the terms Khoisan, Bushmen and Bantu, stating that these terms 'are perceived by those populations as outdated and even derogatory'.

It should be noted that the classification of peoples, based either on ethnic/cultural similarities or on linguistics, has historically been devised by non-indigenous academics. Many of these terms, for example Bantu and Khoe (or Khoi), simply mean 'people'. With a rich diversity in culture and languages, the indigenous peoples of southern Africa did not have a need for a collective term. One of the goals of our research is therefore to provide an identity to these individual groups.

Academic research does, however, dictate that a form of collective is used to describe a culture, language, region, or in the case of genomics, a human lineage. We are therefore forced to establish the best possible fit, while respecting the perceptions of these terms by the community. The research team has therefore made a concerted effort to allow the voice of the communities participating in each project to be heard via self-identification. This is the same procedure in place in the United States for research projects.

Population identifiers in South Africa. Until 1991 South African law divided the population into four official ethnic groups: 'Black', 'Coloured', 'Indian' and 'White'. Although many South Africans still use these population identifiers, for others these classifications have negative connotations. To examine our use of the population identifier 'Coloured' in our studies, we performed a population classification survey of 521 'Coloured' blood donors residing across several suburbs within the Western Cape of South Africa. In unprompted self-identification, 91.2% identified as "Coloured," "South African Coloured," or "Cape Coloured," while 8.8% referred to themselves as "Admixed/Mixed." Based on our survey we conclude that Coloured is still the most widely recognized population-specific identifier within this community.

Description of Terms

Khoesan is an academic collective used to define indigenous southern Africans with a foraging mode of subsistence with click-using languages (excluding the Bantu-based languages that have incorporated clicks). The preferential use of the spelling Khoesan (over Khoisan) is based on the linguistic observation that the combination of o+i does not exist in the Khoekhoegowab language from where the word was taken. Anthropologically, archeologically, linguistically (and very likely genomically) the Khoesan definition includes two distinct peoples, the Khoe herder-gatherers and the San hunter-gatherers, that once inhabited a broader region at the most southern tip of Africa.

Bushmen (Bossiesman in Afrikaans), rather than the more academically accepted term 'San', is the community-preferred collective in our initial studies, which include only Namibian and relocated Angolan Bushmen. Populations in our study that fall under the Bushmen classification are the Ju/'hoan and !Xun (obsolete !Kung). The following symbols represent the dental (/), alveolar (!), palatal (ǂ), and lateral (//) clicks.

Bantu is a broad term used to describe around 500 sub-Saharan African populations defined by their linguistic use of distinct Niger-Congo B languages. The word itself means 'people'. Bantu languages are distinguishable from the Khoesan languages in that they do not use click consonants and are characterized by the extensive use of affixes. An exception to the non-clicking rule is found in the South African Nguni languages such as isiXhosa (see below) and isiZulu.

amaXhosa are a South African Bantu people from the Nguni linguistic classification consisting of several subgroups. The amaXhosa were likely among the first migrant Bantu to reach South Africa, arriving along the east coast up to 1,500 years ago and settling in the Eastern Cape Province of South Africa. Their encounters with indigenous Khoesan led to the incorporation of 'click' sounds into the otherwise non-Khoesan isiXhosa language. It should be noted that the Southern Bantu groups are denoted with a prefix, which differs when referring to a language, namely isiXhosa, or to a people, amaXhosa. The English alternative is to omit the prefix and refer to the ethnolinguistic identifier as Xhosa.

Coloured of South Africa emerged as a direct result of colonization and slave trade beginning in 1652 at the most southern tip of Africa. These early migrants included European settlers (predominantly Dutch, German and French, and later British) and slaves from the East (including India, Indonesia, and Sri Lanka), Madagascar, and coastal Africa (including Mozambique, Angola and Guinea). The Coloured therefore represent a complex genomic ancestry including European, Asian, African and indigenous Khoesan.

Basters of Namibia were formerly part of the original pool that led to the rise of the Coloured at the Cape of South Africa; they subsequently traveled north towards what was then South West Africa (now Namibia) and settled in the town of Rehoboth. In 1872 they established themselves as an independent population with a national flag. The use of the term Baster (or Rehoboth Baster) in Namibia is regarded with immense pride within the community.
