====== UGLI ======

UGLI is one of [[cohort|Lifelines]]' [[additional assessments]]. UGLI stands for UMCG Genetics Lifelines Initiative. UGLI aims to facilitate and accelerate the generation and analysis of genetic data, and thereby the scientific output based on the Lifelines genomics data.

===== Background =====

Genome-wide association study (GWAS) data are highly valuable for biobanks such as Lifelines for identifying disease/trait associations, predicting future disease development and personalizing treatment.\\
To facilitate the generation, analysis and study of genetic data in Lifelines, the UGLI consortium was founded. UGLI brings together many groups and PIs within the UMCG, RUG and beyond that are interested in performing such research with Lifelines data. Together they raised the funding for the initial genotyping of 38,030 + 29,366 Lifelines participants, including [[children]], as part of the [[http://glimdna.org|HUGE]] consortium in Rotterdam, on the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0 and the FinnGen Thermo Fisher Axiom® custom array, respectively. Together with 15,400 samples already genotyped on the CytoSNP array ([[GWAS]]), three quality-controlled GWAS datasets with a combined sample size of n~80,000 subjects are available. Genotyping of the remaining Lifelines participants is still ongoing.\\
\\
\\
===== UGLI1 - GSA =====

38,030 Lifelines participants were selected for UGLI1 using the following criteria:
  * availability of isolated DNA samples of adequate volume and concentration at Lifelines
  * Caucasian-ancestry samples

The genotypes of these 38,030 participants were assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0((https://www.illumina.com/content/dam/illumina-marketing/documents/products/datasheets/infinium-commercial-gsa-data-sheet-370-2016-016.pdf)).
All genotyped samples were included in the QC screening, and the QC of genetic markers focused on the autosomes and the X chromosome (N=691,072 markers). A final set of 36,339 samples and 548,029 markers on the autosomes and the X chromosome passed the QC steps described in {{ :qc_report_ugli1_release_2_-v1.pdf |}}.

^ UGLI1 - GSA cohort - samples that passed QC ||
| Subgroup | N |
| Total | 36,339 |
| Male | 15,098 |
| Female | 21,241 |
| Age* [[children|8-17]] | 3,522* |
| Age* 18-64 | 30,416 |
| Age* >64 | 2,401 |

Table 1: UGLI1 - GSA cohort information. These are samples that passed QC. Age is the age at [[1a_visit_1|Baseline assessment first visit]]. *One participant did not visit during [[1a|Baseline]], but did visit during [[2a|2nd screening]]. Since this participant was under 18 years of age at [[2a_visit_1|2nd screening visit 1]], the participant has been added to the children 8-17 group.

{{:ugli_age_distribution.jpg?400|}}

==== Quality Checks ====

A UGLI1 - GSA (release 2.0) Quality Control Report is available. It describes in detail the quality control (QC) steps taken for the first release of UGLI, which comprises the genotypes of 38,030 participants assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0.

{{ :qc_report_ugli1_release_2_-v1.pdf |}}

==== Imputation ====

The final set of 36,339 samples and 548,029 markers on the autosomes and the X chromosome that passed all QC steps described in {{ :qc_report_ugli1_release_2_-v1.pdf |}} was used for genetic imputation. Genetic imputation was done through the Sanger imputation service using the Haplotype Reference Consortium ([[http://www.haplotype-reference-consortium.org]]) panel. The dataset was formatted following the instructions on the Sanger webpage ([[https://www.sanger.ac.uk/science/tools/sanger-imputation-service]]).\\

==== SNP array intensity files ====

Raw intensity data from the GSA will be made available to researchers.
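The UGLI1 counts reported above can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the numbers are copied from this page, while the variable names and the helper ''retained_fraction'' are made up for this example and are not part of any Lifelines tooling.

```python
# Counts copied from the UGLI1 - GSA text and Table 1 on this page.
genotyped_samples = 38_030
qc_passed_samples = 36_339
markers_before_qc = 691_072
markers_after_qc = 548_029

by_sex = {"Male": 15_098, "Female": 21_241}
by_age = {"8-17": 3_522, "18-64": 30_416, ">64": 2_401}


def retained_fraction(after: int, before: int) -> float:
    """Fraction of items that remain after QC (hypothetical helper)."""
    return after / before


# Both breakdowns in Table 1 should add up to the QC-passed total.
assert sum(by_sex.values()) == qc_passed_samples
assert sum(by_age.values()) == qc_passed_samples

sample_retention = retained_fraction(qc_passed_samples, genotyped_samples)
marker_retention = retained_fraction(markers_after_qc, markers_before_qc)

print(f"samples retained after QC: {sample_retention:.1%}")
print(f"markers retained after QC: {marker_retention:.1%}")
```

Both subgroup breakdowns sum to the QC-passed total, so Table 1 is internally consistent.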
\\
\\
===== UGLI2 - Affymetrix =====

As of March 2023, data for an additional 28,149 genotyped participants have been made available. Samples in this release, called UGLI2, were genotyped using the FinnGen Thermo Fisher Axiom® custom array. A total of 29,366 participants were selected for the UGLI2 release and assessed using this array. All genotypes were included in the QC screening, but the QC focused on the autosomes and the X chromosome, for which N=617,715 and N=22,346 markers are available, respectively. A final set of 28,149 samples, 441,596 autosomal markers and 18,450 X-chromosome markers passed the QC steps described in {{ :qc_report_ugli2_release_1_-v1.pdf |}}.

^ UGLI2 - Affymetrix cohort - samples that passed QC ||
| Subgroup | N |
| Total | 28,149 |
| Male | TBA |
| Female | TBA |
| Age* [[children|8-17]] | TBA |
| Age* 18-64 | TBA |
| Age* >64 | TBA |

Table 3: UGLI2 - Affymetrix cohort information. These are samples that passed QC.

Please note that the array used for UGLI2 differs from the one used for UGLI1. The overlap in SNPs between these two arrays (GSA chip from Illumina = UGLI1, FinnGen array from Affymetrix/Thermo Fisher = UGLI2) is small: between 1,000 and 10,000 SNPs.

==== Quality Checks ====

A UGLI2 - Affymetrix (release 2.0) Quality Control Report is available. It describes in detail the quality control (QC) steps taken for the second release of UGLI, which comprises the genotypes of 29,366 participants assessed using the FinnGen Thermo Fisher Axiom® custom array.

{{ :qc_report_ugli2_release_1_-v1.pdf |}}

==== Imputation ====

The final set of 28,149 samples and 460,136 markers on the autosomes and the X chromosome that passed all QC steps described above was used for genetic imputation. Genetic imputation was done through the Sanger imputation service using the Haplotype Reference Consortium ([[http://www.haplotype-reference-consortium.org]]) panel.
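Because UGLI1 and UGLI2 were genotyped on different arrays, it can be useful to check which directly genotyped markers the two releases share before combining them. The following is a minimal sketch, not the procedure used by UGLI. It assumes the marker lists are available as PLINK ''.bim'' text (this page does not state the delivery format) and that both releases use the same genome build. The function names and the toy marker IDs are hypothetical.

```python
from typing import Iterable, Set, Tuple

Position = Tuple[str, int]  # (chromosome code, base-pair position)


def positions_from_bim(lines: Iterable[str]) -> Set[Position]:
    """Collect (chromosome, position) pairs from PLINK .bim lines.

    A .bim line has six whitespace-separated fields: chromosome,
    variant ID, genetic distance, base-pair position, allele 1, allele 2.
    """
    positions: Set[Position] = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        chrom, _variant_id, _cm, bp = fields[:4]
        positions.add((chrom, int(bp)))
    return positions


def shared_positions(bim_a: Iterable[str], bim_b: Iterable[str]) -> Set[Position]:
    """Markers present in both lists, matched by chromosome and position."""
    return positions_from_bim(bim_a) & positions_from_bim(bim_b)


# Toy data only; these are not real UGLI markers.
array_one = [
    "1\tmarkerA\t0\t1000\tA\tG",
    "1\tmarkerB\t0\t2000\tC\tT",
    "X\tmarkerC\t0\t5000\tG\tA",
]
array_two = [
    "1\tAX-1\t0\t1000\tA\tG",
    "2\tAX-2\t0\t3000\tT\tC",
    "X\tAX-3\t0\t5000\tG\tA",
]

overlap = shared_positions(array_one, array_two)
print(f"{len(overlap)} shared positions: {sorted(overlap)}")
```

Matching by position rather than by marker ID is a deliberate choice in this sketch, since the two vendors may name the same variant differently. It does not compare alleles or strand, so shared positions would still need an allele check before merging. With real files, an open file handle can be passed directly, for example ''positions_from_bim(open(path))'' with a hypothetical ''path''.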
==== SNP array intensity files ====

Raw intensity data from the FinnGen Thermo Fisher Axiom® custom array will be made available to researchers.
\\
\\
===== UGLI3 =====

TBA
\\
\\
===== Overlap between studies =====

^ Study name ^ N in UGLI1 ^ N in UGLI2 ^
| DEEP (DAG1) | ~500 | |
| DAG3 | ~9,000 | |
| GoNL | 143 | |
| GWAS4 | 938 | |

Table 2: A number of participants in UGLI1 also participated in other studies, i.e. [[deep|DAG1]], [[dag3|DAG3]], [[dag2|DAG2]]/[[http://www.nlgenome.nl/|GoNL]] and [[gwas|GWAS4]]. The second column gives the number of samples that overlap between each of these studies and UGLI1. For DAG1 and DAG3 these are approximations.
\\
\\
===== UGLI-data release =====

UGLI data is available on the HPC (Linux environment) of the UMCG. The data will not be accessible through the Lifelines workspace. The applicant’s proposal will be reviewed by both Lifelines and the UGLI steering committee (UGLI SC).

Requesting UGLI data: The applicant applies via the regular Lifelines application procedure. This means the applicant submits the proposal together with the dataset order using our online catalogue (https://data-catalogue.lifelines.nl/). UGLI data cannot be selected through the online catalogue. Instead, the applicant requests UGLI data by stating this in the application form (Appendix: Request for Source Data (Not in catalogue)).

===== Abbreviations =====

| GWAS | Genome-Wide Association Study |
| UGLI | UMCG Genetics Lifelines Initiative |
| UGLI SC | UGLI steering committee |
| GSA | Global Screening Array |
| SNP | Single-nucleotide polymorphism |
| HW | Hardy-Weinberg Equilibrium |
| WGS | Whole Genome Sequencing |
| MAF | Minor allele frequency |
| PCA | Principal Component Analysis |
| HPC | High Performance Computing |
| PLINK | PLINK is a command line program written in C/C++ |