| A36: Utilizing the Jaccard Index to Reveal Population Stratification in Sequencing Data: A Simulation Study and an Application to the 1000 Genomes Project |
| Abstract/Journal Article | DZNE-2020-00974 |
2015
Abstract: Population stratification is one of the major sources of confounding in genetic association studies, potentially causing false-positive and false-negative results. The effectiveness of existing adjustment approaches, which are mostly built on the estimation of the genetic variance/covariance matrix, is unclear for rare variants, since those variants are genetically much ‘younger’ and might represent a different pattern of population structure. Here, we present a novel approach for the identification of population substructure in high-density genotyping data/next-generation sequencing data. The approach exploits the co-appearances of rare genetic variants in individuals. The method can be applied to all available genetic loci, does not require linkage disequilibrium (LD) pruning, and is computationally fast. Using sequencing data from the 1000 Genomes Project, the features of the approach are illustrated and compared to existing methodology (i.e. EIGENSTRAT). We find that our approach works particularly well for genetic loci with very small minor allele frequencies. The results suggest that the inclusion of rare-variant data/sequencing data in our approach provides a much higher-resolution picture of population substructure than can be obtained with existing methodology. Furthermore, we performed extensive simulation studies based on the minor allele frequencies of the European populations. We find scenarios where our method was able to control the type 1 error more precisely and showed higher power.
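
The co-appearance idea in the abstract can be illustrated with a pairwise Jaccard index between individuals. The following is only a minimal sketch, not the authors' implementation: the abstract does not specify how rare variants are selected or how carriers are defined, so the function name `jaccard_matrix`, the `maf_threshold` parameter, and the rule "an individual carries a variant if it has at least one minor allele" are all assumptions made for illustration.

```python
import numpy as np


def jaccard_matrix(genotypes, maf_threshold=0.01):
    """Pairwise Jaccard index between individuals over rare variants.

    genotypes: (n_individuals, n_variants) array of minor-allele counts (0/1/2).
    maf_threshold: hypothetical cutoff for calling a variant "rare" (assumption).
    """
    g = np.asarray(genotypes)
    n_individuals = g.shape[0]

    # Allele frequency per variant, assuming diploid genotypes.
    maf = g.sum(axis=0) / (2.0 * n_individuals)
    rare = (maf > 0) & (maf <= maf_threshold)

    # Carrier status (assumption): at least one copy of the minor allele.
    carriers = (g[:, rare] > 0).astype(np.int64)

    # shared[i, j] = number of rare variants carried by both i and j.
    shared = carriers @ carriers.T
    counts = carriers.sum(axis=1)
    union = counts[:, None] + counts[None, :] - shared

    # Jaccard index = |A and B| / |A or B|; defined as 0 when the union is empty.
    out = np.zeros(shared.shape, dtype=float)
    np.divide(shared, union, out=out, where=union > 0)
    return out
```

The resulting symmetric similarity matrix could then be passed to a decomposition or clustering step to look for substructure; how the original study does this is not stated in the abstract.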