gwas
Differences
This shows you the differences between two versions of the page.
gwas [2020/06/22 15:08] – trynke | gwas [2025/02/05 13:49] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 2: | Line 2: | ||
GWAS is one of Lifelines' | GWAS is one of Lifelines' | ||
- | The name GWAS might be confusing, as the assessment is not a [[https:// | + | The name GWAS might be confusing, as the assessment is not a [[https:// |
Note that a second set of participants is genotyped in the [[UGLI]] project. | Note that a second set of participants is genotyped in the [[UGLI]] project. | ||
Line 18: | Line 18: | ||
| Age category 18-64 | N=13890 | | Age category 18-64 | N=13890 | ||
| Age category 65+ | N=1510 | | Age category 65+ | N=1510 | ||
+ | |||
+ | {{: | ||
===== SNP array ===== | ===== SNP array ===== | ||
Line 23: | Line 25: | ||
The 12-sample HumanCytoSNP-12 BeadChip is a powerful, whole-genome scanning panel designed for efficient, high-throughput analysis of genetic and structural variations that are the most relevant to human disease. Many types and sizes of structural variation in the human genome that affect phenotypes can be detected with the HumanCytoSNP-12 BeadChip, including duplications, | The 12-sample HumanCytoSNP-12 BeadChip is a powerful, whole-genome scanning panel designed for efficient, high-throughput analysis of genetic and structural variations that are the most relevant to human disease. Many types and sizes of structural variation in the human genome that affect phenotypes can be detected with the HumanCytoSNP-12 BeadChip, including duplications, | ||
- | ====Quality checks==== | + | ===== Quality checks |
Quality control of the data is based on SNP filtering on minor allele frequency (MAF) above 0.001, Hardy-Weinberg equilibrium (HWE) P-value >1e-4 and a call rate of 0.95, using PLINK ((Purcell S, Neale B, Todd-Brown K, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; | Quality control of the data is based on SNP filtering on minor allele frequency (MAF) above 0.001, Hardy-Weinberg equilibrium (HWE) P-value >1e-4 and a call rate of 0.95, using PLINK ((Purcell S, Neale B, Todd-Brown K, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; | ||
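The three SNP filters named in the quality checks can be illustrated with a minimal Python sketch. This is not the PLINK implementation: the HWE P-value here uses a 1-degree-of-freedom chi-square approximation, the genotype coding (0/1/2 allele counts, `None` for a missing call) is an assumption, and so is the reading of "call rate of 0.95" as "at least 0.95". The function names `snp_stats` and `passes_qc` are hypothetical.

```python
import math

# Thresholds quoted in the quality checks text
MAF_MIN = 0.001
HWE_P_MIN = 1e-4
CALL_RATE_MIN = 0.95  # assumption: interpreted as "at least 0.95"


def snp_stats(genotypes):
    """Return (call_rate, maf, hwe_p) for one SNP.

    genotypes: list of allele counts (0, 1 or 2), with None for a missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return 0.0, 0.0, 0.0  # nothing called: fails every filter
    call_rate = n / len(genotypes)

    n_hom_ref, n_het, n_hom_alt = called.count(0), called.count(1), called.count(2)
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of the reference allele
    q = 1.0 - p
    maf = min(p, q)
    if maf == 0.0:
        return call_rate, maf, 1.0  # monomorphic SNP: HWE is not testable

    # Chi-square goodness of fit against Hardy-Weinberg proportions (1 df).
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    hwe_p = math.erfc(math.sqrt(chi2 / 2.0))  # survival function of chi-square, 1 df
    return call_rate, maf, hwe_p


def passes_qc(genotypes):
    """True if the SNP passes the call rate, MAF and HWE filters."""
    call_rate, maf, hwe_p = snp_stats(genotypes)
    return call_rate >= CALL_RATE_MIN and maf > MAF_MIN and hwe_p > HWE_P_MIN
```

A SNP with genotype counts in exact Hardy-Weinberg proportions passes, while one with only heterozygous calls fails the HWE filter and one with 10% missing calls fails the call rate filter.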
Line 53: | Line 55: | ||
| | | | ||
+ | ===== Imputation ===== | ||
+ | |||
+ | To obtain more information about the participants' genomes, and thereby increase the number of SNPs, the genotype data generated with the arrays were imputed against reference genomes obtained by whole-genome sequencing. IMPUTE2 (ref) is a program that predicts missing SNPs from known SNPs or haplotypes (combinations of SNPs) by mapping the known genotypes against reference genomes. | ||
+ | |||
+ | Before imputation, the genotypes were pre-phased using SHAPEIT2 (ref) and aligned to the reference panels using Genotype Harmonizer (www.molgenis.org/ | ||
+ | Imputation analysis is performed through Beagle 3.1.0 and data in these formats will be made available. | ||
+ | |||
+ | The samples were imputed using Minimac((Howie B Fuchsberger C Stephens M Marchini J Abecasis GR. Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nat Genet 2012; | ||
+ | * the Genome of The Netherlands (GoNL) release 5((Genome of The Netherlands Consortium. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet 2014; | ||
+ | * the 1000 Genomes phase1 v3((1000 Genomes Project Consortium, Abecasis GR, Altshuler D, et al. A map of human genome variation from population-scale sequencing. Nature | ||
+ | |||
+ | 1000G is a human reference panel that contains genomes collected worldwide. Because only a small group of Dutch people (from the North) is part of the 1000G panel, the genotyped SNP dataset from the array was also imputed against GoNL, a reference panel that contains genomes from individuals with Dutch ancestry (with parents born in the Netherlands). 165 genomes in this panel were collected from Lifelines participants,
et al. Scaling bio-analyses from computational clusters to grids. Proceedings of Fifth International Workshop on Science Gateways (IWSG), 2013, 3–5 June. Zurich, Switzerland, 2013)) imputation pipeline was used to generate and monitor our job scripts on the distributed file system. | ||
+ | |||
+ | In summary:\\ | ||
+ | |||
+ | * 1000G imputed: missing genotypes of participants are imputed (or predicted) based on the 1000G reference panel (taken from a worldwide population) and the genotypes generated with the SNP array\\ | ||
+ | * GoNL imputed: missing genotypes of participants are imputed (or predicted) based on the GoNL reference panel (taken from the Dutch population) and the genotypes generated with the SNP array\\ | ||
+ | * Unimputed: Genotypes determined on the basis of the SNP array, not imputed against genomic reference panels. | ||
+ | |||
+ | ===== Non-Caucasian samples ===== | ||
+ | To prevent false-positive association, | ||
+ | |||
+ | Non-Caucasian samples are identified using: | ||
+ | * The Lifelines phenotype database (self-report)\\ | ||
+ | * Outlier (IBS) analysis\\ | ||
+ | * Population stratification (using Eigenstrat) | ||
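As a rough illustration of the population stratification step, the sketch below flags samples whose principal component (PC) scores lie far from the cohort mean. It is not Eigenstrat itself: the function name `pc_outliers`, the input layout and the default cutoff of 6 standard deviations are assumptions for illustration, and the page does not state which cutoff was used for this dataset.

```python
import statistics


def pc_outliers(pc_scores, n_sd=6.0):
    """Return the set of sample IDs that are outliers on any PC.

    pc_scores: dict mapping sample ID -> list of PC scores (same length for all).
    n_sd: hypothetical cutoff, in standard deviations from the mean.
    """
    if not pc_scores:
        return set()
    sample_ids = list(pc_scores)
    n_pcs = len(pc_scores[sample_ids[0]])
    outliers = set()
    for k in range(n_pcs):
        values = [pc_scores[s][k] for s in sample_ids]
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # no variation on this PC, nothing to flag
        for sample, value in zip(sample_ids, values):
            if abs(value - mean) > n_sd * sd:
                outliers.add(sample)
    return outliers
```

The flagged IDs would then be combined with the self-report and IBS outlier results before exclusion.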
+ | |||
+ | ===== Cryptic relationships ===== | ||
+ | Samples are selected using self-reported family relations. After cleaning of the data, samples are compared with each other to determine their relationship by genetic similarity. If a pair of samples is identified as first-degree relatives, only the sample with the best genotyping quality is included. | ||
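The selection rule for first-degree relatives can be sketched as follows. This is a minimal illustration under stated assumptions: the pairs are taken as already identified, genotyping quality is represented by a single per-sample score (for example the sample call rate, which the page does not specify), and the function name `prune_first_degree` is hypothetical.

```python
def prune_first_degree(pairs, quality):
    """Return the set of sample IDs to exclude.

    pairs: iterable of (sample_a, sample_b) flagged as first-degree relatives.
    quality: dict mapping sample ID -> genotyping quality (higher is better).
    """
    excluded = set()
    for a, b in pairs:
        if a in excluded or b in excluded:
            continue  # this pair is already broken by an earlier exclusion
        # Exclude the lower-quality sample; on a tie, the second one is excluded.
        worse = a if quality[a] < quality[b] else b
        excluded.add(worse)
    return excluded
```

For example, with pairs `("s1", "s2")` and `("s2", "s3")` and `s2` having the lowest quality, only `s2` is excluded and both `s1` and `s3` are kept.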
+ | |||
+ | |||
+ | ===== Releasing SNP genotype data ===== | ||
+ | The following files are available through the Lifelines workspace or HPC: | ||
+ | * files with phenotype data | ||
+ | * files with genotyped and imputed data | ||
+ | * quality control files: | ||
+ | * list of samples excluded | ||
+ | * list of SNPs excluded | ||
+ | * PCA component file | ||
+ | |||
+ | ===== Abbreviations ===== | ||
+ | | GWAS | Genome Wide Association Studies | ||
+ | | SNP | Single-nucleotide polymorphism | ||
+ | | HWE | Hardy-Weinberg Equilibrium | ||
+ | | CNV | Copy Number Variant | ||
+ | | MAF | Minor allele frequency | ||
+ | | PLINK | PLINK is a command line program written in C/C++ | | ||