Selection bias in genome-wide association studies (GWASs) due to volunteer-based sampling (volunteer bias) is poorly understood. The UK Biobank (UKB), one of the largest and most widely used cohorts, is highly selected. We develop inverse probability weighted GWAS (WGWAS) to correct GWAS summary statistics in the UKB for volunteer bias. Across ten phenotypes, WGWAS decreases the effective sample size by 62% on average relative to GWAS. WGWAS yields novel genome-wide significant associations, larger effect sizes and heritability estimates, and altered gene-set tissue expression. The impact of volunteer bias on GWAS results varies by phenotype: traits related to disease, health behaviors, and socioeconomic status are most affected. These findings suggest that volunteer bias in extant GWASs is substantial and call for a GWAS 2.0: a revisiting of GWAS based on representative data sets, achieved either by developing inverse probability (IP) weights or by a greater focus on population-representative sampling.
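To illustrate why IP weighting can change effect estimates and shrink the effective sample size, the following is a minimal sketch and not the paper's implementation. It assumes Kish's approximation for the effective sample size, which may differ from the definition used in the paper, and a simple weighted least-squares slope for a single variant. The function names `kish_effective_n` and `weighted_slope` are hypothetical.

```python
import numpy as np


def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2).

    Assumption: this is a common approximation, not necessarily the
    definition of effective sample size used in the paper.
    """
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()


def weighted_slope(genotype, phenotype, weights):
    """Weighted least-squares slope of phenotype on genotype, with intercept.

    Each observation counts in proportion to its (hypothetical) IP weight,
    i.e. the inverse of its estimated probability of being sampled.
    """
    g = np.asarray(genotype, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    w = np.asarray(weights, dtype=float)
    g_bar = np.average(g, weights=w)  # weighted mean genotype
    y_bar = np.average(y, weights=w)  # weighted mean phenotype
    return np.sum(w * (g - g_bar) * (y - y_bar)) / np.sum(w * (g - g_bar) ** 2)


if __name__ == "__main__":
    # Toy data: equal weights keep the full sample size; unequal weights reduce it.
    print(kish_effective_n([1, 1, 1, 1]))
    print(kish_effective_n([1, 1, 1, 5]))
```

With equal weights the effective sample size equals the number of observations; the more unequal the weights, the smaller it becomes, which is the general mechanism behind a loss of precision under weighting.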
Publication Type
Working Paper
File Description
First version, June 12, 2023
JEL Codes
C25: Single Equation Models; Single Variables: Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions
C83: Survey Methods; Sampling Methods
H51: National Government Expenditures and Health