We investigate to what extent volunteer-based sampling of large-scale biobanks biases associations and estimate inverse probability (IP) weights to correct for such bias. Using the UK Biobank (UKB) as an example of a large-scale volunteer-based cohort, and population-representative data from the UK Census as a reference, we compare 21 bivariate associations in both data sets. Volunteer bias in all associations as naively estimated in the UKB is substantial, and in some cases leads to estimates of the wrong sign. For example, older individuals in the UKB report being in better health. Correcting for volunteer bias using IP weights is therefore advised. Applying IP weights reduces volunteer bias by 87% on average and suggests that volunteer-based sampling reduces the effective sample size of the UKB to ∼32% of its original size. To aid the construction of the next generation of biobanks, we provide suggestions on how best to ensure representativeness in a volunteer-based design.
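As a rough illustration of the two quantities mentioned in the abstract, the sketch below weights each participant by the inverse of a participation probability and computes an effective sample size. It is not the paper's estimation procedure: the abstract does not say how the weights or the effective sample size are computed, so the participation probabilities and health indicators are made-up toy values, the function names are hypothetical, and Kish's approximation, (sum of weights)^2 / (sum of squared weights), is an assumption chosen for illustration.

```python
def ip_weights(participation_probs):
    """Return inverse probability weights, 1 / P(participation)."""
    if any(p <= 0 or p > 1 for p in participation_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return [1.0 / p for p in participation_probs]


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def kish_effective_sample_size(weights):
    """Kish's approximation (assumed here, not taken from the paper)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Toy data (hypothetical): three participants who were likely to volunteer
# and report good health (1), one who was unlikely to volunteer and does not (0).
probs = [0.8, 0.8, 0.8, 0.2]
health = [1, 1, 1, 0]

weights = ip_weights(probs)
naive = sum(health) / len(health)        # unweighted share in good health
weighted = weighted_mean(health, weights)  # IP-weighted share
n_eff = kish_effective_sample_size(weights)

print(f"naive={naive:.3f} weighted={weighted:.3f} n_eff={n_eff:.2f} of {len(health)}")
```

In this toy sample the under-represented participant receives a large weight, which pulls the weighted share below the naive one and leaves an effective sample size smaller than the number of participants. The more unequal the weights, the larger that loss.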
Publication Type
Working Paper
File Description
First version, June 12, 2023
JEL Codes
C25: Single Equation Models; Single Variables: Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions
I18: Health: Government Policy; Regulation; Public Health
C83: Survey Methods; Sampling Methods