UGLI

UGLI (UMCG Genetics Lifelines Initiative) is one of Lifelines' additional assessments. UGLI aims to facilitate and accelerate the generation and analysis of genetic data, and thereby the scientific output based on the Lifelines genomics data.

Background

Genome-wide association study (GWAS) data are highly valuable to biobanks such as Lifelines for identifying disease/trait associations, predicting future disease development and personalizing treatment.
To facilitate the generation, analysis and study of genetic data in Lifelines, the UGLI consortium was founded. UGLI brings together many groups and PIs within the UMCG, RUG and beyond that are interested in performing such research with Lifelines data. Together they raised the funding for the initial genotyping of Lifelines participants, including children, as part of the HUGE consortium in Rotterdam: 38,030 participants on the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0 and 29,366 participants on the FinnGen Thermo Fisher Axiom® custom array. Together with 15,400 samples already genotyped on the CytoSNP array (GWAS), three quality-controlled GWAS datasets with a combined sample size of n~80,000 subjects are available. Genotyping of the remaining Lifelines participants is still ongoing.


UGLI1 - GSA

38,030 Lifelines participants were selected for UGLI1 using the following criteria:

The genotypes of 38,030 participants were assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0. All genotyped samples were included in the QC screening, and the QC of genetic markers focused on the autosomes and the X chromosome (N=691,072 markers). A final set of 36,339 samples and 548,029 markers on the autosomes and the X chromosome passed the QC steps described in qc_report_ugli1_release_2_-v1.pdf.

UGLI1 - GSA cohort - samples that passed QC
Subgroup N
Total 36,339
Male 15,098
Female 21,241
Age* 8-17 3,522*
Age* 18-64 30,416
Age* >64 2,401

Table 1: UGLI1 - GSA cohort information. These are samples that passed QC. Age is the age at the first visit of the Baseline assessment. *One participant did not visit during Baseline, but did visit during the 2nd screening. Because this participant was under 18 years of age at visit 1 of the 2nd screening, the participant has been added to the children 8-17 group.
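The counts reported above can be cross-checked with a few lines of Python. The sketch below is only an illustration: it uses the numbers from the text and Table 1, and the function and variable names are hypothetical and not part of any UGLI tooling. It computes the fraction of samples and markers retained after QC and verifies that the sex and age breakdowns add up to the total.

```python
# Counts copied from the text and Table 1 (UGLI1 - GSA).
GENOTYPED_SAMPLES = 38_030
QC_PASSED_SAMPLES = 36_339
MARKERS_BEFORE_QC = 691_072  # autosomes and X chromosome
MARKERS_AFTER_QC = 548_029

BY_SEX = {"Male": 15_098, "Female": 21_241}
BY_AGE = {"8-17": 3_522, "18-64": 30_416, ">64": 2_401}


def retention(before: int, after: int) -> float:
    """Return the fraction of items that remained after QC."""
    if before <= 0 or not 0 <= after <= before:
        raise ValueError("expected before > 0 and 0 <= after <= before")
    return after / before


def subgroups_match_total(subgroups: dict, total: int) -> bool:
    """Check that a subgroup breakdown adds up to the reported total."""
    return sum(subgroups.values()) == total


if __name__ == "__main__":
    samples = retention(GENOTYPED_SAMPLES, QC_PASSED_SAMPLES)
    markers = retention(MARKERS_BEFORE_QC, MARKERS_AFTER_QC)
    print(f"Samples retained after QC: {samples:.1%}")
    print(f"Markers retained after QC: {markers:.1%}")
    print("Sex breakdown sums to total:",
          subgroups_match_total(BY_SEX, QC_PASSED_SAMPLES))
    print("Age breakdown sums to total:",
          subgroups_match_total(BY_AGE, QC_PASSED_SAMPLES))
```

The same two helpers can be reused for other releases by substituting the corresponding counts.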

Quality Checks

A UGLI1 - GSA (release 2.0) Quality Control Report is available. It describes in detail the quality control (QC) steps applied to the first release of UGLI, which comprises the genotypes of 38,030 participants assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0: qc_report_ugli1_release_2_-v1.pdf

Imputation

A final set of 36,339 samples and 548,029 markers on autosomal and X chromosomes passing all QC steps described in qc_report_ugli1_release_2_-v1.pdf were used for genetic imputation. Genetic imputation was done through the Sanger imputation service using the Haplotype Reference Consortium ( http://www.haplotype-reference-consortium.org ) panel. The dataset was formatted following the instructions from the Sanger webpage ( https://www.sanger.ac.uk/science/tools/sanger-imputation-service ).

SNP array intensity files

Raw intensity data from the GSA will be made available to the researchers.



UGLI2 - Affymetrix

As of March 2023, data of an additional 28,149 genotyped participants has been made available. Samples in this release, called UGLI2, were genotyped using the FinnGen Thermo Fisher Axiom® custom array.

29,366 participants were selected for the UGLI2 release and assessed using the aforementioned array. All genotypes were included in the QC screening, but the QC focused on the autosomes and the X chromosome, for which N=617,715 and N=22,346 markers are available, respectively. A final set of 28,149 samples, 441,596 autosomal markers and 18,450 X chromosome markers passed the QC steps described in qc_report_ugli2_release_1_-v1.pdf.

UGLI2 - Affymetrix cohort - samples that passed QC
Subgroup N
Total 28,149
Male TBA
Female TBA
Age* 8-17 TBA
Age* 18-64 TBA
Age* >64 TBA

Table 3: UGLI2 - Affymetrix cohort information. These are samples that passed QC.

Please note that the array used for UGLI2 differs from the one used for UGLI1. The overlap in SNPs between these two arrays (the Illumina GSA chip for UGLI1 and the Affymetrix/Thermo Fisher FinnGen array for UGLI2) is small: roughly 1,000 to 10,000 SNPs.
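To check the overlap for a specific set of markers, the variant positions of the two arrays can be compared directly. The sketch below is a minimal illustration that assumes the marker lists are available as PLINK .bim files (six whitespace-separated columns: chromosome, variant ID, genetic distance, base-pair position, allele 1, allele 2) and that both arrays use the same genome build. Neither assumption is confirmed by this page, and the toy marker lines are invented for the example. Matching is done on chromosome and position only; alleles and strand are not compared.

```python
from typing import Iterable, Set, Tuple

Position = Tuple[str, int]  # (chromosome, base-pair position)


def positions_from_bim(lines: Iterable[str]) -> Set[Position]:
    """Collect (chromosome, position) pairs from PLINK .bim lines."""
    positions = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        chrom, _variant_id, _cm, bp, _allele1, _allele2 = fields
        positions.add((chrom, int(bp)))
    return positions


def shared_positions(bim_a: Iterable[str], bim_b: Iterable[str]) -> Set[Position]:
    """Return the positions present in both marker lists."""
    return positions_from_bim(bim_a) & positions_from_bim(bim_b)


# Invented toy marker lists standing in for the two arrays.
toy_array_1 = [
    "1\trs0001\t0\t10583\tA\tG",
    "1\trs0002\t0\t20500\tC\tT",
    "X\trs0003\t0\t2700157\tG\tA",
]
toy_array_2 = [
    "1\tAX-0001\t0\t10583\tA\tG",
    "2\tAX-0002\t0\t30999\tT\tC",
    "X\tAX-0003\t0\t2700157\tG\tA",
]

if __name__ == "__main__":
    overlap = shared_positions(toy_array_1, toy_array_2)
    print(f"{len(overlap)} shared positions: {sorted(overlap)}")
```

With real data, the two lists would be replaced by open file handles for the .bim files of each release, since the functions accept any iterable of lines.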

Quality Checks

A UGLI2 - Affymetrix (release 2.0) Quality Control Report is available. It describes in detail the quality control (QC) steps applied to the second release of UGLI, which comprises the genotypes of 29,366 participants assessed using the FinnGen Thermo Fisher Axiom® custom array: qc_report_ugli2_release_1_-v1.pdf

Imputation

A final set of 28,149 samples and 460,136 markers on autosomal and X chromosomes passing all QC steps described above were used for genetic imputation. Genetic imputation was done through the Sanger imputation service using the Haplotype Reference Consortium (http://www.haplotype-reference-consortium.org) panel.

SNP array intensity files

Raw intensity data from the FinnGen Thermo Fisher Axiom® custom array will be made available to the researchers.

UGLI3

TBA

Overlap between studies

Study name N in UGLI1 N in UGLI2
DEEP (DAG1) ~500
DAG3 ~9000
GoNL 143
GWAS4 938

Table 2: A number of participants in UGLI1 also participated in other studies, i.e. DAG1, DAG3, DAG2/GoNL and GWAS4. In the second column the sample sizes that overlap between these studies and UGLI1 can be found. For DAG1 and DAG3 these are approximations.

UGLI-data release

UGLI data is available on the HPC (Linux environment) of the UMCG. The data will not be accessible through the Lifelines workspace. The applicant’s proposal will be reviewed by both Lifelines and the UGLI steering committee (UGLI SC).

Requesting UGLI data: The applicant applies via the regular Lifelines application procedure. This means the applicant submits the proposal together with the dataset order using our online catalogue (https://data-catalogue.lifelines.nl/). UGLI data cannot be selected through the online catalogue. Instead, the applicant can request UGLI data by stating this in the application form (Appendix: Request for Source Data (Not in catalogue)).

Abbreviations

GWAS Genome Wide Association Study
UGLI UMCG Genetics Lifelines Initiative
UGLI SC UGLI steering committee
GSA Global Screening Array
SNP Single-nucleotide polymorphism
HW Hardy-Weinberg Equilibrium
WGS Whole Genome Sequencing
MAF Minor allele frequency
PCA Principal Component Analysis
HPC High Performance Computing
PLINK PLINK is a command line program written in C/C++