ugli
Differences
This shows you the differences between two versions of the page.
| ugli [2020/12/07 09:44] – [Quality Checks] sylvia | ugli [2025/08/13 13:56] (current) – petra_vinke | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== UGLI release 1====== | + | ====== UGLI ====== |
| UGLI is one of [[cohort|Lifelines]]' | UGLI is one of [[cohort|Lifelines]]' | ||
| Line 5: | Line 5: | ||
| ===== Background ===== | ===== Background ===== | ||
| Genome-wide association (GWAS) data is highly valuable for biobanks such as Lifelines in identifying disease/ | Genome-wide association (GWAS) data is highly valuable for biobanks such as Lifelines in identifying disease/ | ||
| - | To facilitate the generation, analysis and study of genetic data in Lifelines, the UGLI consortium was founded. UGLI brings together many groups and PIs within the UMCG, RUG and beyond that are interested in performing such research with Lifelines data. They have brought the funding together which led to the initial genotyping of a total of 38,030 Lifelines participants, | + | To facilitate the generation, analysis and study of genetic data in Lifelines, the UGLI consortium was founded. UGLI brings together many groups and PIs within the UMCG, RUG and beyond that are interested in performing such research with Lifelines data. They have brought the funding together which led to the initial genotyping of a total of 38, |
| - | The UGLI consortium is actively raising funding for the genotyping of additional samples. The genotyped additional samples are generally referred to as UGLI2. With additional funding of new UGLI members, the consortium will increase the number of genotyped Lifelines participants. These efforts will make Lifelines a more interesting partner for national and international collaborations as well as with non-academic partners that work on healthy ageing. | + | \\ |
| + | \\ | ||
| - | ===== Subcohort | + | ===== UGLI1 - GSA ===== |
| - | 38,030 Lifelines participants were selected for UGLI release 1 using the following criteria: | + | 38,030 Lifelines participants were selected for UGLI1 using the following criteria: |
| * availability of isolated DNA-samples of adequate volume and concentration at Lifelines | * availability of isolated DNA-samples of adequate volume and concentration at Lifelines | ||
| * Caucasian-ancestry samples | * Caucasian-ancestry samples | ||
| - | The genotype of 38,030 participants was assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0. In the QC screening all genotyped samples were included, and the focus of the QC of genetic | + | The genotype of 38,030 participants was assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0((https:// |
| markers was on the autosomes and chromosome X (N=691,072 markers). | markers was on the autosomes and chromosome X (N=691,072 markers). | ||
| - | A final set of 36,339 samples and 571,420 markers on autosomal and X chromosomes passed the QC steps described in QC_report_UGLI_R1.pdf. | + | A final set of 36,339 samples and 548,029 markers on autosomal and X chromosomes passed the QC steps described in {{ {{ :: |
| - | ^ UGLI release 1 cohort - samples that passed QC || | + | ^ UGLI1 - GSA cohort - samples that passed QC || |
| | Subgroup | | Subgroup | ||
| | Total | 36, | | Total | 36, | ||
| Line 26: | Line 26: | ||
| | Age* 18-64 | 30, | | Age* 18-64 | 30, | ||
| | Age* > | | Age* > | ||
| - | Table 1: UGLI release 1 cohort information. These are samples that passed QC. Age at [[1a_visit_1|Baseline assessment first visit]]. *One participant did not visit during [[1a|Baseline]], | + | Table 1: UGLI1 - GSA cohort information. These are samples that passed QC. Age at [[1a_visit_1|Baseline assessment first visit]]. *One participant did not visit during [[1a|Baseline]], |
| - | ==== Overlap between studies==== | + | {{:ugli_age_distribution.jpg?400|}} |
| - | ^ Study name ^ N in UGLI1 ^ | + | |
| - | | DAG1 | ~500 | | + | |
| - | | DAG3 | ~9000 | | + | |
| - | | GoNL | 143* | | + | |
| - | | GWAS4 | 938* | | + | |
| - | Table 2: A number of participants in UGLI1 also participated in other studies, i.e. [[deep|DAG1]], [[dag3|DAG3]], | + | |
| - | + | ||
| - | ===== SNP Genotyping Array ===== | + | |
| - | The Infinium Global Screening Array® (GSA) MultiEthnic Disease Version was used for SNP genotyping of the UGLI release 1 cohort. This array contains approximately 1,000,000 SNPs and combines multi-ethnic genome-wide content, curated clinical research variants, and quality control (QC) markers for precision medicine research((https:// | + | ==== Quality Checks ==== |
| - | + | An UGLI1 - GSA (release | |
| - | ===== Quality Checks | + | assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0. {{ :qc_report_ugli1_release_2_-v1.pdf |}} |
| - | An UGLI (release | + | |
| - | assessed using the Infinium Global Screening Array® (GSA) MultiEthnic Disease Version 1.0. In this QC screening all genotyped samples were included, but the focus was on QC of genetic markers on the autosomes and chromosome X (N=691,072 markers).\\ | + | |
| - | \\ | + | |
| - | In brief, the GSA platform-specific data were first translated and corrected for general | + | |
| - | use; namely, strand harmonization and removal of duplicate markers within the | + | |
| - | array. Secondly, low-quality samples and markers were filtered with a two-step | + | |
| - | procedure of call rate thresholding. Further possible genotyping errors were assessed at the | + | |
| - | marker level by detecting variants that deviated significantly from Hardy-Weinberg equilibrium | + | |
| - | (HW) and at the sample level by evaluating heterozygosity. Then sample mix-ups | + | |
| - | were evaluated at two levels: i) concordance of reported sex with sex derived from genotyping data from the X | + | |
| - | and Y chromosomes, | + | |
| - | thus of the expected genome sharing between relatives with the observed sharing from | + | |
| - | genotyped data (genetic kinship). Moreover, to further evaluate sample mix-ups, the | + | |
| - | concordance of genotype calling among a subset of samples with genotype information from a | + | |
| - | different array was compared ([[gwas|GWAS4]]: | + | |
| - | Subsequently, | + | |
| - | HW in unrelated individuals was ascertained. Finally, population stratification was inspected by a principal | + | |
| - | component analysis (PCA), incorporating samples from 1000 Genomes (1000G) and GoNL | + | |
| - | projects. These summarized steps are shown in Figure 1 in {{ :qc_report_ugli_r1.pdf |QC_report_UGLI_R1.pdf}}, where each step is annotated together with the required input and whether the step generates a graphical output or a report. The code and detailed description of the process can be found in: [[https:// | + | |
| - | \\ | + | |
| - | For a more detailed description of the QC steps: {{ : | + | |
| - | + | ||
| - | + | ||
| - | + | ||
| - | + | ||
| - | + | ||
| - | ===== Releasing SNP Genotyping files ===== | + | |
| - | + | ||
| - | ====QCed genotype calls==== | + | |
| - | A final set of 36,339 samples and 571,420 markers on autosomal and X chromosomes passing all QC steps described in {{ : | + | |
| ====Imputation==== | ====Imputation==== | ||
| - | A final set of 36,339 samples and 571,420 markers on autosomal and X | + | A final set of 36,339 samples and 548,029 markers on autosomal and X |
| - | chromosomes passing all QC steps described in {{ :qc_report_ugli_r1.pdf |QC_report_UGLI_R1.pdf}} were used for genetic imputation. Genetic imputation was done through the Sanger imputation service | + | chromosomes passing all QC steps described in {{ :qc_report_ugli1_release_2_-v1.pdf |}} were used for genetic imputation. Genetic imputation was done through the Sanger imputation service |
| using the Haplotype Reference Consortium | using the Haplotype Reference Consortium | ||
| ( [[http:// | ( [[http:// | ||
| following the instructions from the Sanger webpage | following the instructions from the Sanger webpage | ||
| ( [[https:// | ( [[https:// | ||
| - | |||
| - | More details on imputation can be found in {{ : | ||
| ====SNP array intensity files==== | ====SNP array intensity files==== | ||
| Raw intensity data from the GSA will be made available to the researchers. | Raw intensity data from the GSA will be made available to the researchers. | ||
| - | ====Sex mismatches==== | + | \\ |
| - | In UGLI, samples for which biological sex does not match registered sex will be excluded from the main dataset, but not completely neglected (as opposed to [[gwas|GWAS]]). These samples will still undergo quality check analyses and will be made available for analyses to the researchers. | + | \\ |
| - | + | ||
| - | + | ||
| - | + | ||
| + | ===== UGLI2 - Affymetrix ===== | ||
| + | As of March 2023, data of an additional 28,149 genotyped participants has been made available. Samples in this release, called UGLI2, were genotyped using the FinnGen Thermo Fisher Axiom® custom array. | ||
| + | 29,366 participants were selected for the UGLI2 release and assessed using the aforementioned array. All genotypes were included in the QC screening, but the QC focused on the autosomes and chromosome X, for which N=617,715 and N=22,346 markers are available, respectively. A final set of 28,149 samples, 441,596 autosomal markers and 18,450 X-chromosome markers passed the QC steps described in | ||
| + | {{ : | ||
| + | ^ UGLI2 - Affymetrix cohort - samples that passed QC || | ||
| + | | Subgroup | ||
| + | | Total | 28, | ||
| + | | Male | TBA | | ||
| + | | Female | ||
| + | | Age* [[children|8-17]] | ||
| + | | Age* 18-64 | TBA | | ||
| + | | Age* > | ||
| + | Table 3: UGLI2 - Affymetrix cohort information. These are samples that passed QC. | ||
| + | Please note that the array used for UGLI2 differs from the one used in UGLI1. Overlap in SNPs between these two arrays (GSA chip from Illumina=UGLI1 and FinnGen array from Affymetrix/ | ||
| + | ==== Quality Checks ==== | ||
| + | An UGLI2 - Affymetrix (release 2.0) Quality Control Report is available, describing in detail the QC steps that were taken during the quality control (QC) process of the second release of UGLI comprising the genotype of 29,366 participants assessed using the FinnGen Thermo Fisher Axiom® custom array. {{ : | ||
| + | ====Imputation==== | ||
| + | A final set of 28,149 samples and 460,136 markers on autosomal and X chromosomes passing all QC steps described above were used for genetic imputation. Genetic imputation was done through the Sanger imputation service using the Haplotype Reference Consortium (http:// | ||
| + | ====SNP array intensity files==== | ||
| + | Raw intensity data from the FinnGen Thermo Fisher Axiom® custom array will be made available to the researchers. | ||
| + | \\ | ||
| + | \\ | ||
| + | ===== UGLI3 ===== | ||
| + | TBA | ||
| + | \\ | ||
| + | \\ | ||
| + | ===== Overlap between studies===== | ||
| + | ^ Study name ^ N in UGLI1 ^ N in UGLI2 ^ | ||
| + | | DEEP (DAG1) | ||
| + | | DAG3 | ~9000 | | ||
| + | | GoNL | 143 | | ||
| + | | GWAS4 | 938 | | ||
| + | Table 2: A number of participants in UGLI1 also participated in other studies, i.e. [[deep|DAG1]], | ||
| + | \\ | ||
| + | \\ | ||
| =====UGLI-data release===== | =====UGLI-data release===== | ||
| - | UGLI data will be made available on the HPC (Linux environment) of the UMCG and therefore the data is only accessible to researchers who are working at or are affiliated with the University Medical Center Groningen (UMCG). The data will not be accessible through the Lifelines workspace. The applicant’s proposal will be reviewed by both Lifelines and the UGLI steering committee (UGLI SC). | + | UGLI data is available on the HPC (Linux environment) of the UMCG. The data will not be accessible through the Lifelines workspace. The applicant’s proposal will be reviewed by both Lifelines and the UGLI steering committee (UGLI SC). |
| - | UGLI consortium members receive temporary exclusive right to use the data for a period of 3 years, meaning that consortium members hold the first right to use the data should the non-UGLI member | + | Requesting |
| - | + | The applicant | |
| - | A non-UMCG researcher requesting UGLI data has three options: | + | |
| - | * The applicant joins the UGLI consortium by paying €10,000. After joining the consortium you will receive a UMCG account so that you can access the UGLI data. This option has the advantages that you will also gain the right of the 3-year embargo period, you will be able to submit multiple proposals | + | |
| - | * The applicant establishes a partnership with an UGLI consortium member. The applicant and the UGLI consortium partner will discuss the terms of this partnership, | + | |
| - | * The applicant | + | |
| Line 126: | Line 113: | ||
| | PLINK | PLINK is a command line program written in C/C++ | | | PLINK | PLINK is a command line program written in C/C++ | | ||
| + | |||
| + | ===== Publications with UGLI data ===== | ||
| + | |||
| + | * Li et al. 2024 [[https:// | ||
| + | * Keaton et al. 2024 [[https:// | ||
| + | * Qiao et al. 2023 [[https:// | ||
| + | * Warmerdam et al. 2022 [[https:// | ||
| + | * Nolte et al. 2017 [[https:// | ||
