SNP3: Biobanks [Nutrition Science Explained #1]

In Podcasts by Danny Lennon

This is a Premium-exclusive episode. In order to listen to the episode and access the study notes, you will need to subscribe to Sigma Nutrition Premium.


This is the first episode in a new series called “Nutrition Science Explained”, in which members of the Sigma team take a concept commonly mentioned in discussions about nutrition science, explain what it is, give more background context, and highlight important aspects to know. The goal is to help listeners gain a deeper understanding of other episodes in which such concepts are mentioned.

This first episode discusses the concept of biobanks and, specifically, how they are used in nutrition science. Dr. Niamh Aspell walks through an explanation, gives some examples of studies using biobank data, and highlights advantages and limitations of their use.


Niamh Aspell, PhD

  • PhD in nutrition & cognition from Trinity College Dublin
  • BSc in Human Nutrition from UUC
  • Postgraduate qualification in Applied Statistics

Niamh obtained her BSc in Human Nutrition (UUC) and has been involved in several national and regional research projects. She holds a postgraduate qualification in Applied Statistics, and has collaborated with various research groups on the social, health and economic determinants of healthy ageing. Her PhD was entitled, ‘Vitamin D in ageing: an investigation into the role of vitamin D in cognitive and physical functioning in community older adults’.

She has previously worked on projects exploring the role of artificial intelligence in healthcare, and also as a scientific manager for food clinical trials. She has delivered numerous educational talks to a wide range of audiences, including athletes, clinically vulnerable adults and corporate organisations (on general health and wellbeing).

Her areas of expertise include study design, population health, research ethics, nutrition and lifestyle interventions, ageing, specifically cognitive health, dementia care, social care and psychosocial wellbeing. She has previously delivered lectures on the Biological Basis of Behaviour and facilitated student learning in Anatomy and Physiology.

Detailed Study Notes

What are ‘biobanks’?

The term biobank covers collections of biospecimens, i.e. specimens from plants, animals, and (most relevant to nutrition research) humans.

A biobank is also known as a “biorepository”: a place that processes, stores and distributes biospecimens (such as blood or saliva samples) and their associated data for use in research.

Initially, biobanks were operations set up and managed by small university labs to meet the research requirements of specific projects. Examples include multi-cohort studies that collected detailed clinical, lifestyle, dietary, genetic and biochemical data to investigate gene-nutrient interactions in the development of disease, either in participants at the early stages of common diseases or in disease-free participants.

Such studies are typically prospective and longitudinal, so that clinical, nutritional, genetic and biochemical factors can be re-assessed in relation to the progression of disease outcomes in the years after the initial baseline investigation.

Biobanks have evolved in the last decade, and there are now:

  • government-supported biobanks
  • for-profit/commercial biobanks
  • population-based biobanks
  • virtual biobanks (to help investigators locate biospecimens for testing and data mining across multiple biobanks in different locations)

They now serve a broad remit: they are used in epidemiology, population genetics and pharmacogenomics, and provide laboratory support for case-control and longitudinal studies. Some aim to facilitate exploration of new disease markers and store clinical trial samples for future testing.

In addition, the data or metadata linked with biosamples has grown in complexity: from just the date of sample collection and the diagnosis, to extensive data files covering aspects of the patient’s phenotype (personal characteristics, including diet, lifestyle and physical measurements), genetic data, imaging data, proteomics (the set of proteins produced by an organism) and other ‘omic’ information. Biobanks therefore provide access to quality-defined biological samples and associated health-related information, as well as data generated by analysis of those samples. This has led to increased demand for high-quality specimens with accurate, reliable and standardised clinical and lab data.

How are biobanks used in nutrition research?

Understanding the determinants of disease and life-limiting illness is a complex challenge. Biological samples are needed to identify markers of the early biological effects of nutrition, and they are crucial for understanding the causal pathways underlying the impact of nutrition on health and disease. These conditions are typically caused by a combination of nutrition and lifestyle factors, as well as environmental and genomic factors, each making an individually modest contribution to the development of disease. Moreover, these factors do not act independently but exist in complex interactions. Detecting and quantifying these dependencies and influences requires studies with large numbers of disease cases.

Traditional retrospective case-control studies of a particular disease (for example, diabetes) and existing prospective studies of particular risk factors can help to address this challenge. A complementary approach is to establish large prospective cohorts designed to study a much wider range of known and novel risk factors for a wide range of diseases. Prospective studies can assess exposures before the onset and treatment of disease, diseases that are not readily investigated by retrospective studies, and both the adverse and beneficial effects of a specific exposure on the lifetime risks of different diseases.

Biological specimens and associated health information complemented with dietary assessment of food and nutrient intake offer unique opportunities for the objective assessment of internal exposure, thereby allowing nutrition to be related to health outcomes.

Examples of biobanks and published nutrition research

UK Biobank

  • The UK Biobank is a large and extensive prospective study with over 500,000 participants aged 40–69 years recruited from 2006–2010.
  • The planning and set-up took 10 years. The goal of the UK Biobank is to improve the prevention, diagnosis and treatment of a wide range of illnesses, including cancer, cardiovascular disease, diabetes, osteo, depression and some dementias.
  • More broadly, it aims to improve understanding of disease risk at the population level.
  • The study collected and continues to collect extensive phenotypic and genotypic detail about its participants, including:
    • data from questionnaires
    • physical measures
    • sample assays
    • accelerometry
    • multimodal imaging
    • genome-wide genotyping
    • longitudinal follow-up for a wide range of health-related outcomes.
  • The participants have also consented to linkage with their medical records, so that diagnoses of different illnesses or diseases can be recorded.
[For more on the UK Biobank specifically, see episode 349 of the podcast with Chief Scientist for UK Biobank, Professor Naomi Allen]

UK Biobank – example applications for nutrition research

Rauber et al., 2021 – Ultra-processed food consumption and risk of obesity: a prospective cohort study of UK Biobank

  • In a recent global analysis of trends in sales of ultra-processed foods, the UK ranked as the third-highest consumer. Against this background, the researchers wanted to examine the associations between ultra-processed food consumption and risk of obesity among UK adults.
  • The analysis used UK Biobank data (collected from 2006–2019) from participants with an evaluation of dietary intake (24-h recall) and repeated measures of adiposity: BMI, waist circumference (WC) and percentage body fat (% BF) (n = 22,659; median follow-up: 5 years).
  • Ultra-processed foods were identified using the NOVA classification (‘industrial formulations of substances derived from foods, which typically contain cosmetic additives (i.e. flavours and colours) and little, if any, whole foods’), and consumption was estimated as a percentage of total energy intake.
  • Multivariable Cox proportional hazards regression models were used to estimate hazard ratios (HR) for several indicators of obesity according to ultra-processed food consumption. Models were adjusted for sociodemographic and lifestyle characteristics.
  • Covariates – Baseline study covariates included:
    • age
    • sex
    • quintiles of the Index of Multiple Deprivation (IMD)
    • level of physical activity (low/moderate/high)
    • current smoking status (smoker/non-smoker)
    • sleep duration (≤ 6 h/day, 7–8 h/day, ≥ 9 h/day)
    • baseline BMI, waist circumference or body fat, adjusted for where appropriate.
  • Findings
    • 947 incident cases of overall obesity (BMI ≥ 30 kg/m²) and 1900 incident cases of abdominal obesity (men: WC ≥ 102 cm, women: WC ≥ 88 cm) were identified during follow-up.
    • Participants in the highest quartile of ultra-processed food consumption had a significantly higher risk of developing overall obesity (HR 1.79; 95% CI 1.06–3.03) and abdominal obesity (HR 1.30; 95% CI 1.14–1.48).
    • They also had a higher risk of experiencing a ≥ 5% increase in BMI (HR 1.31; 95% CI 1.20–1.43), WC (HR 1.35; 95% CI 1.25–1.45) and % BF (HR 1.14; 95% CI 1.03–1.25) than those in the lowest quartile of consumption.
    • I.e. higher consumption of ultra-processed foods was also associated with a higher risk of a gain in BMI, WC and body fat of 5% or more during the follow-up period (median of 5.6 years).
  • Advantages – The authors were able to extrapolate and replicate their previous findings from population-based studies in South America. Understanding the contribution of ultra-processed foods to the incidence of obesity allows policy makers to consider actions that promote reduced consumption at a national level.
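
As a rough illustration of how the exposure in a study like this is defined, the sketch below computes each participant's percentage of total energy intake from ultra-processed foods and then assigns quartile groups. It is a minimal sketch, not the authors' code: the intake values, the function names `upf_energy_share` and `assign_quartiles`, and the cut-point method (the default of Python's `statistics.quantiles`) are all assumptions made for illustration.

```python
from statistics import quantiles


def upf_energy_share(upf_kcal, total_kcal):
    """Percentage of total energy intake that comes from ultra-processed foods."""
    if total_kcal <= 0:
        raise ValueError("total energy intake must be positive")
    return 100.0 * upf_kcal / total_kcal


def assign_quartiles(shares):
    """Return a quartile group (1 = lowest, 4 = highest) for each share."""
    q1, q2, q3 = quantiles(shares, n=4)  # three cut-points splitting the data in four
    groups = []
    for share in shares:
        if share <= q1:
            groups.append(1)
        elif share <= q2:
            groups.append(2)
        elif share <= q3:
            groups.append(3)
        else:
            groups.append(4)
    return groups


# Hypothetical participants: (kcal from ultra-processed foods, total kcal).
# These numbers are invented and are not UK Biobank data.
intakes = [
    (300, 2000), (900, 2000), (1200, 2400), (500, 2500),
    (1500, 2500), (200, 1800), (1100, 2000), (700, 2100),
]

shares = [upf_energy_share(upf, total) for upf, total in intakes]
groups = assign_quartiles(shares)

for share, group in zip(shares, groups):
    print(f"{share:5.1f}% of energy from UPF -> quartile {group}")
```

An analysis like the one above then compares outcomes in the highest group with those in the lowest; the real cut-points depend on the intake distribution of the cohort being studied.
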
Considerations for use of the UK Biobank

Limitations of dietary assessment

  • The 24-h dietary recall was introduced towards the end of the recruitment period, so many participants completed their dietary assessment online, potentially up to 2 years after the physical measures were taken.
  • The UK Biobank provides the number of portions of each item consumed per day but does not retain the nutritional information (grams and calories) assigned to each food and beverage item.
  • These values are instead estimated based on a ‘typical’ portion size.

Obesity prevalence

  • 18% in this sample vs 29% in the British population.
  • Participants in this sample also had higher intakes of unprocessed or minimally processed foods (especially fruit, vegetables and fish) and lower intakes of ultra-processed foods than the British population.

Inclusion/exclusion criteria

  • For the above-mentioned Rauber study, included participants were those with a valid 24-h dietary recall (n = 211,009). The authors excluded participants for various reasons, most notably missing anthropometric data at baseline or follow-up (n = 187,533). This dramatically decreased the study sample: the analyses used data from only 22,659 participants.
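
The scale of this attrition can be checked with simple arithmetic. The sketch below uses only the figures quoted in these notes; treating the remaining gap as "other exclusions" is an assumption, since the individual reasons are not listed here.

```python
# Figures quoted in the notes above
valid_recall = 211_009           # participants with a valid 24-h dietary recall
missing_anthropometry = 187_533  # excluded: missing anthropometric data
final_sample = 22_659            # participants in the final analyses

# Remaining after the largest exclusion
after_anthropometry = valid_recall - missing_anthropometry  # 23,476

# Assumption: the rest of the gap is due to the other, unlisted exclusions
other_exclusions = after_anthropometry - final_sample       # 817

# Share of participants with a valid recall who reached the final analyses
retained_pct = 100 * final_sample / valid_recall

print(f"Remaining after anthropometry exclusion: {after_anthropometry:,}")
print(f"Other exclusions (assumed): {other_exclusions:,}")
print(f"Retained in final analyses: {retained_pct:.1f}%")
```

In other words, roughly one in ten participants with a valid dietary recall contributed to the final analyses.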

Characteristics of the cohort need to be considered

  • The dietary recall in follow-up waves was conducted by online assessment, which requires some level of digital literacy and digital access. This potentially limits the demographic profile of the cohort, which is typically of mid-to-high socioeconomic status and health-motivated.

Understanding access to biobank data

Many cohort studies have mechanisms for sharing data with external researchers on a collaborative basis, but relatively few have arrangements for open access to the data without any need for collaboration, and even fewer have been established from the outset with the intention of making the entire resource available to the global research community.

UK Biobank aims to encourage and provide as wide access as possible to its data and samples for health-related research in the public interest by all bona fide researchers from the academic, charity, public, and commercial sectors, both in the UK and internationally, without preferential or exclusive access for any user.

It is important to note that data access is open, but not free. Applications from student researchers must be for the sole purpose of performing a postgraduate student project (e.g. MSc, PhD or equivalent) and must be submitted by the student or their supervisor. These cost a significantly reduced fee of £500, with limited data access and features. Full fees range from £3,000 to £9,000.

How open access data can cause confusion if misused

In an event like COVID-19, biobanks are a good source of data to link with health records to help understand which populations are at risk. However, this needs to be carefully evaluated. Whilst the biobank should be open access to all researchers, there should be consideration of the proposed investigations and whether the research conclusions are really ‘in the public interest’. Otherwise, open-access population data can potentially be exploited, adding to a pool of confusion and misinterpretation of evidence in nutrition research.

Example: Vu et al., 2021 – Dietary Behaviors and Incident COVID-19 in the UK Biobank

  • The authors conclude: “In the UK Biobank, consumption of coffee, vegetables, and being breastfed as a baby were favourably associated with incident COVID-19; intake of processed meat was adversely associated. Although these findings warrant independent confirmation, adherence to certain dietary behaviors may be an additional tool to existing COVID-19 protection guidelines to limit the spread of this virus.” (emphasis added by Sigma Nutrition).
  • Such a conclusion, based on this type of study, can be incredibly misleading and confusing.
  • Only a portion of overall UK Biobank participants (about 10%) were tested for COVID-19 during the study timeframe, and these participants were slightly older, less educated and less likely to be employed, and reported poorer health than the original UK Biobank cohort.
  • Those factors were associated with higher odds of COVID-19 infection in the authors’ analysis sample.
  • The full UK Biobank is not representative of the sampling population, with evidence of a ‘healthy volunteer’ selection bias. Moreover, the authors had no concurrent pandemic data on other established risk factors for COVID-19 infection, such as participants’ social distancing behaviour, work environment and face mask use; some of these factors may correlate with dietary behaviours.
  • To use the biobank data to make claims about coffee and vegetables being of protective benefit is very misleading and is not based on the level of evidence needed to make such a claim.

Other Notable Biobanks



Advantages of biobanks

  • Biobanks offer a large range of biomarkers and phenotypic information, and large sample sizes with documented clinical outcomes.
  • They provide the opportunity for subgroup analyses, for example of marginalised groups or rare conditions.
  • They enable scientists across the globe to answer a range of research questions, and to create harmonised data sets that test similar hypotheses in different populations, teasing out behavioural and environmental attributes and their contribution to certain diseases.


Limitations of biobanks

  • Sometimes genotypic data is oversold
    • Although many common diseases have some genetic component, only a few have single genes that seem to contribute significantly to disease risk, and even then only in a small percentage of those with the disease.
  • Ethical considerations
    • Biobanks result from the goodwill of research participants, and there is a responsibility to inform them. Currently, participants in the majority of biobanks do not receive their data back: if they have a gene variant that is of potential interest, or of definitive concern, they are not followed up at a later stage with this information.
    • Whilst research participants are aware when consenting that participation is voluntary and that they will not benefit immediately, they do not always understand that they will not be informed if important information regarding their health is determined at a later stage.
  • Biobanks generally represent “healthier groups” in society. It is also unclear what good it does a healthy participant to learn that they may be more susceptible to a disease when no intervention is available; the information may cause major and damaging concern for the participant.
