
How will genetic mapping of Indians help? | Explained
The Hindu
GenomeIndia project publishes preliminary findings from sequencing 10,000 Indian genomes, highlighting unique genetic variations in diverse populations.
The story so far: The preliminary findings of the GenomeIndia project, which attempted to study whole genomes of 10,000 healthy and unrelated Indians from 83 population groups, were published in the journal Nature Genetics on April 8. After excluding two populations, the published findings are based on the genetic information of 9,772 individuals — 4,696 male participants and 5,076 female participants.
The 10,000-genome study was launched in January 2020 with funding from the Department of Biotechnology. Blood samples and associated phenotype data such as weight, height, hip circumference, waist circumference and blood pressure were collected from 20,000 individuals representing 83 population groups (30 tribal and 53 non-tribal) spread across India. DNA samples from 10,074 of these 20,000 individuals were subjected to whole genome sequencing; two populations were later excluded.
The GenomeIndia project is a collaborative effort of 20 institutions. The genome sequencing was carried out by the Centre for Brain Research at IISc Bengaluru, the Centre for Cellular and Molecular Biology in Hyderabad, Institute of Genomics & Integrative Biology in Delhi, National Institute of Biomedical Genomics in Kolkata, and Gujarat Biotechnology Research Centre in Gandhinagar.
Samples were collected from 83 population groups that inhabit over 100 distinct geographical locations, with a median of 159 samples from each non-tribal group and 75 samples from each tribal group. Sampling on this scale was intended to capture the relatively rare mutations that are important for understanding complex diseases. The samples were taken from unrelated individuals to ensure accurate estimation of mutation frequencies across groups. Three to six parent-child pairs were included in each population group to uncover de novo mutations (mutations that occur randomly in a child but are not seen in the parents).
Genomes of five tribal groups across India (Tibeto-Burman tribe, Indo-European tribe, Dravidian tribe, Austro-Asiatic tribe, and a continentally admixed outgroup) were sequenced. Genomes of three non-tribal groups (Tibeto-Burman non-tribe, Indo-European non-tribe, and Dravidian non-tribe) were also sequenced. Since language is an established proxy for genetic diversity in the Indian population, sampling was also designed to appropriately represent the four major language families: Indo-European, Dravidian, Austro-Asiatic and Tibeto-Burman. However, the four ancient populations living in the Andamans, which date back 65,000 years, and two relatively modern populations from about 5,500 years ago were not included.
In total, 180 million mutations were found in the individuals sequenced: 130 million in the non-sex chromosomes (22 pairs of autosomes) and 50 million in the sex chromosomes X and Y. That so many mutations were found should not be surprising. The human genome has three billion base pairs of DNA, and the genomes of 9,772 individuals were sequenced. Most importantly, these 9,772 individuals belong to 83 distinctly different endogamous groups. Non-coding regions, whose DNA sequences do not directly code for proteins, make up 98% of the genome. A large number of the 180 million variants found are therefore very likely to lie in non-coding regions.