DEEP/DAG1

Lifelines-DEEP, also known as DAG1, is one of Lifelines' additional assessments performed in collaboration with the UMCG department of genetics (see also: DAG2, DAG3, DAG4 and DAG5). DAG is the abbreviation of DArmGezondheid, or “Gastrointestinal health” in Dutch, a research project in which the microbiome is analysed in faecal samples.
The primary goal of the Lifelines-DEEP project is to gain insight into the relations between the genome, epigenome, transcriptome, microbiome, metabolome, and other biological and phenotypic parameters. Lifelines-DEEP is an example of a ‘next-generation’ population cohort study, in which multiple molecular data levels are combined with observational research methods.¹⁾

Subcohort

From April 2012 to August 2013, all adult participants registered at the Lifelines location in Groningen were invited to participate in Lifelines-DEEP, in addition to the regular Lifelines programme. Inclusion stopped when the target group size of n=~1500 was reached.

Protocol

From the ~1500 DEEP-participants, a variety of additional data and samples were collected, including:

3 additional blood samples, drawn during 1A Visit 2 (n = ~1,400)
exhaled air, collected during 1A Visit 2 (n = ~1,400)
faecal samples (n = ~1250), collected by participants at home and immediately frozen at −20°C. Within 2 weeks after 1A Visit 2, research assistants picked the samples up from the participants' homes and transported them on dry ice to the Lifelines lab. On arrival at the research location, the faecal samples were immediately stored at −80°C.
self-reported gastrointestinal health (the Rome III questionnaire) (n = ~1200)
description of stool samples (the Bristol Stool Scale)

Data availability

Type	N	Available?
Genomics (cytosnp)	+/- 1400	Yes
Methylation	+/- 800	Yes
RNAseq	+/- 1300	Yes
MGS	+/- 1200	Yes*
16S	+/- 1000	Yes*
Metabolomics	+/- 1400	Yes
Cytokines	+/- 1100	Yes
Proteomics	+/- 1100	Yes
WES	+/- 1000	Yes

*Raw data is available; the processed data is not available yet

Genomics

Genotyping of genomic DNA was performed using both the HumanCytoSNP-12 BeadChip and the ImmunoChip, a customised Illumina Infinium array. Genotyping was successful for 1385 samples (CytoSNP) and 1374 samples (ImmunoChip). First, SNP quality control was applied independently for both platforms: SNPs were filtered on a minor allele frequency (MAF) above 0.001, a Hardy-Weinberg equilibrium (HWE) p value >1e−4 and a call rate of 0.98 using Plink. The genotypes from both platforms were then merged into one data set. For genotypes present on both platforms, non-concordant calls were set to missing. After merging, SNPs were filtered again on a MAF of 0.05 and a call rate of 0.98, resulting in a total of 379 885 genotyped SNPs. Next, these data were imputed based on the Genome of the Netherlands (GoNL) reference panel. The merged genotypes were prephased using SHAPEIT2 and aligned to the GoNL reference panel using Genotype Harmonizer in order to resolve strand issues. The imputation was performed using IMPUTE2 V.2.3.0 against the GoNL reference panel. We used a MOLGENIS compute imputation pipeline to generate our scripts and monitor the imputation. Imputation yielded 8 606 371 variants with Info score ≥0.8. In addition, HLA type was established via the Broad SNP2HLA imputation pipeline.
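
The filtering and merging rules above can be sketched in a few lines of Python. This is only an illustration of the thresholds quoted in the text, not the actual Plink-based pipeline: the `SnpStats` structure, the function names and the representation of a genotype call as a string (with `None` for a missing call) are assumptions made for the example. Whether the call-rate threshold is inclusive is not stated in the text, so the sketch assumes it is, and it assumes calls from both platforms are already on the same strand and allele order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical structure for illustration)."""
    snp_id: str
    maf: float        # minor allele frequency
    hwe_p: float      # Hardy-Weinberg equilibrium p value
    call_rate: float  # fraction of samples with a non-missing call


def passes_qc(snp: SnpStats, min_maf: float,
              min_hwe_p: Optional[float] = 1e-4,
              min_call_rate: float = 0.98) -> bool:
    """Return True if the SNP meets the MAF, HWE and call-rate thresholds.

    Pass min_hwe_p=None to skip the HWE check, as in the post-merge filter,
    for which the text lists only MAF and call rate.
    """
    if snp.maf <= min_maf:
        return False
    if min_hwe_p is not None and snp.hwe_p <= min_hwe_p:
        return False
    # Assumption: the call-rate threshold is inclusive.
    return snp.call_rate >= min_call_rate


def merge_call(cytosnp: Optional[str], ichip: Optional[str]) -> Optional[str]:
    """Merge one genotype call typed on both platforms.

    If only one platform has a call, that call is kept (an assumption).
    Non-concordant calls are set to missing (None), as described in the text.
    """
    if cytosnp is None:
        return ichip
    if ichip is None:
        return cytosnp
    return cytosnp if cytosnp == ichip else None


# Per-platform filter (MAF > 0.001) and post-merge filter (MAF 0.05).
example = SnpStats("rs_example", maf=0.02, hwe_p=0.3, call_rate=0.995)
print(passes_qc(example, min_maf=0.001))                  # per-platform QC
print(passes_qc(example, min_maf=0.05, min_hwe_p=None))   # post-merge QC
```

In this example the hypothetical SNP passes the per-platform filter but is dropped by the stricter post-merge MAF filter.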

immunochip
Cytochip
Ex-Seq
ExomeChip
WGS-GoNL
Imputation HRC
Imputation HLA

Methylation

Excerpt from Shah et al. (2015)²⁾. We isolated total DNA from EDTA tubes and profiled genome-wide methylation using the Infinium HumanMethylation450 BeadChip, as previously described. In short, 500 ng of genomic DNA was bisulfite modified and used for hybridisation on Infinium HumanMethylation450 BeadChips, according to the Illumina Infinium HD Methylation protocol.
Details of DNA extraction and methylation profiling are described elsewhere. Probe QC, background correction, color correction, and normalization were performed with a custom pipeline based on the pipeline by Tost and Touleimat. All methylation probes were re-mapped to the human genome (GRCh37/hg19, UCSC Genome Browser), and both poorly mapping probes and probes with a SNP in the single-base extension site (according to GoNL) were removed in the same step. Data were normalized with DASEN.

Transcriptomics / RNAseq

Excerpt from Tigchelaar et al. (2015)¹⁾. Genome-wide gene expression was measured by RNA sequencing. RNA was isolated from whole blood collected in a PAXgene tube using the PAXgene Blood miRNA Kit (Qiagen, California, USA). The RNA samples were quantified and assessed for integrity before sequencing. Total RNA from whole blood was depleted of globin using the GLOBINclear kit (Ambion, Austin, Texas, USA), and subsequently processed for sequencing using the TruSeq V.2 library preparation kit (Illumina Inc, San Diego, California, USA). An Illumina HiSeq2000 was used for paired-end sequencing of 2×50 bp, i.e. fragments of 50 base pairs in length were sequenced in both directions. Ten samples were pooled per lane. Finally, read sets per sample were generated using CASAVA, retaining only reads passing Illumina’s Chastity Filter for further processing. On average, the number of raw reads per individual after QC was 44.3 million. After adapter trimming, the reads were mapped to human genome build 37 using STAR (https://code.google.com/p/rna-star/). Of these reads, 96% were successfully mapped to the genome. Transcription was quantified on the gene and meta-exon level using BEDTools (https://code.google.com/p/bedtools/) and custom scripts, and on the transcript level using FluxCapacitor (http://sammeth.net/confluence/display/FLUX/Home).

Microbiome / 16S analysis

Excerpt from Tigchelaar et al. (2015)¹⁾. Faecal samples were collected in order to study the gut microbiome. Gut microbial composition was assessed by 16S rRNA gene sequencing of the V4 variable region on the Illumina MiSeq platform according to the manufacturer's specifications. Reads were quality filtered and taxonomy was inferred using a closed-reference Operational Taxonomic Unit-picking protocol against a preclustered GreenGenes database, as implemented by QIIME (V.1.7.0 and V.1.8.0). Moreover, faecal aliquots were stored for future analysis of GI-health-related biomarkers.

Metagenome Shotgun analysis

Excerpt from Zhernakova et al. (2016)³⁾. The gut microbiome was analyzed using paired-end metagenomic shotgun sequencing (MGS) on a HiSeq2000, generating an average of 3.0 Gb of data (about 32.3 million reads) per sample. After excluding 44 samples with low read counts, 1,135 participants (474 males and 661 females) remained for further analysis. We tested 207 factors with respect to the microbiomes of these participants: 41 intrinsic factors of various physiological and biomedical measures, 39 self-reported diseases, 44 categories of drugs, 5 categories of smoking status and 78 dietary factors (fig. S1 and table S1).
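
The figures quoted in this excerpt can be cross-checked with some simple arithmetic. The short Python sketch below is only a sanity check on the numbers given in the text. It assumes that "Gb" means gigabases (1e9 bases) and that each excluded sample corresponds to one participant; neither assumption is stated in the excerpt, and the variable names are chosen for the example.

```python
# Factor categories and counts as quoted in the excerpt.
factor_groups = {
    "intrinsic factors": 41,
    "self-reported diseases": 39,
    "drug categories": 44,
    "smoking status categories": 5,
    "dietary factors": 78,
}
total_factors = sum(factor_groups.values())

# Participants remaining after exclusion, by sex.
males, females = 474, 661
participants = males + females

# Assumption: one sample per participant, so the number of samples
# before exclusion is the remaining participants plus the 44 excluded.
excluded_low_read_count = 44
samples_before_exclusion = participants + excluded_low_read_count

# Assumption: "3.0 Gb" means 3.0e9 sequenced bases per sample.
mean_bases_per_sample = 3.0e9
mean_reads_per_sample = 32.3e6
mean_bases_per_read = mean_bases_per_sample / mean_reads_per_sample

print(f"factors tested: {total_factors}")
print(f"participants analysed: {participants}")
print(f"samples before exclusion: {samples_before_exclusion}")
print(f"mean bases per read: {mean_bases_per_read:.1f}")
```

The category counts sum to the 207 factors mentioned in the text, the male and female counts sum to the 1,135 participants, and the implied average read length is roughly 93 bases under the stated assumption.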

Metabolomics

Excerpt from Tigchelaar et al. (2015)¹⁾. We determined metabolites in exhaled air and blood. Metabolites from exhaled air were measured by a combination of gas chromatography and time-of-flight mass spectrometry (GC-TOF-MS), as described previously. In short, the exhaled air sample was introduced into a GC that separates the different compounds in the mixture. Subsequently, the compounds were introduced into the MS to detect and identify the separated volatile organic compounds. The metabolites in plasma were measured using the nuclear magnetic resonance (NMR) method, as described by Kettunen et al.

Proteomics

Excerpt from Zhernakova et al. (2018)⁴⁾: To estimate the effect of host genetics on protein levels, we first performed a local protein quantitative trait loci (cis-pQTL) analysis by testing SNPs located within 250 kb of the genes coding for the 92 proteins. This yielded 129 significant cis-pQTLs for 66 proteins at genome-wide false discovery rate (FDR) 0.05 level (Supplementary Table 2 and Supplementary Fig. 2). We then regressed out the cis-pQTL effects and conducted a trans-pQTL mapping in a genome-wide manner and then separately on disease- and trait-associated SNPs only, which together yielded 85 independent trans-pQTLs for 36 proteins (Supplementary Table 3 and Supplementary Fig. 3). Of these, 19 cis-pQTLs and 74 trans-pQTLs were associated with complex traits and diseases, including 10 cis- and 7 trans-regulated proteins known to be relevant for CVD (Supplementary Tables 2 and 3). In addition, we separately assessed associations to 422 putative CVD-associated SNPs and detected pQTL associations for 14 proteins (Supplementary Table 4 and Supplementary Note 1). These pQTLs could point to driver genes in CVD (Supplementary Note 2); for example, as can be seen in the pleiotropic trans-pQTL effect observed at the KLKB1 gene (Supplementary Fig. 4).
Next, we examined the power of our study by assessing the replication rate of previously reported pQTLs and identified a 95% replication rate for cis effects and an 88% replication rate for trans effects, all with the same allelic direction (Supplementary Table 5). Our data also revealed novel pQTL associations including 36 cis-pQTLs for 25 proteins and 48 trans-pQTLs for 27 proteins (Supplementary Tables 2 and 3).
We found that only 64% of cis-pQTLs had at least one corresponding significant cis-eQTL, and 76% of these had the same allelic direction in blood from the same individuals or in other tissue types from the GTEx project (Supplementary Note 3, Supplementary Tables 2 and 6, and Supplementary Fig. 5). In contrast, none of the 85 trans-pQTLs were detectable at expression level, but this may be due to limited power, as the effect sizes of trans-eQTLs are known to be very modest. Despite this, our data do provide evidence that a large amount of trans-regulation can happen at translation- or protein-level; for example, through regulation of translational rate and protein secretion to blood, through post-translational modification, or through protein–protein interactions (PPIs), and these trans effects are not necessarily detectable at transcription level.
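
To illustrate what "SNPs located within 250 kb of the genes" means in the cis-pQTL analysis described above, the following Python sketch selects the SNPs that fall inside a cis window around a gene. It is a minimal illustration and not the authors' implementation: the `Gene` and `Snp` structures, the example coordinates and the choice to measure the window from the gene boundaries (rather than, for instance, the gene midpoint or transcription start site) are all assumptions, since the excerpt does not specify them.

```python
from dataclasses import dataclass
from typing import List

CIS_WINDOW_BP = 250_000  # 250 kb, as stated in the excerpt


@dataclass(frozen=True)
class Gene:
    """Gene coordinates (hypothetical structure for illustration)."""
    name: str
    chrom: str
    start: int  # first base of the gene
    end: int    # last base of the gene


@dataclass(frozen=True)
class Snp:
    """SNP position (hypothetical structure for illustration)."""
    snp_id: str
    chrom: str
    pos: int


def is_cis(snp: Snp, gene: Gene, window: int = CIS_WINDOW_BP) -> bool:
    """True if the SNP lies on the gene's chromosome within the window.

    Assumption: the window extends from the gene boundaries on both sides.
    """
    if snp.chrom != gene.chrom:
        return False
    return gene.start - window <= snp.pos <= gene.end + window


def cis_snps(snps: List[Snp], gene: Gene,
             window: int = CIS_WINDOW_BP) -> List[Snp]:
    """Return the SNPs to test for a cis effect on the given gene."""
    return [s for s in snps if is_cis(s, gene, window)]


# Made-up coordinates, used only to exercise the functions.
gene = Gene("GENE_A", "4", 1_000_000, 1_030_000)
snps = [
    Snp("snp_inside_gene", "4", 1_010_000),
    Snp("snp_200kb_upstream", "4", 800_000),
    Snp("snp_300kb_downstream", "4", 1_330_000),
    Snp("snp_other_chromosome", "7", 1_010_000),
]
print([s.snp_id for s in cis_snps(snps, gene)])
```

SNPs outside the window, or on another chromosome, would be candidates for the trans-pQTL mapping instead.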

Publications using DEEP-data

Aguirre-Gamboa R et al. (2016) Differential Effects of Environmental and Genetic Factors on T and B Cell Immune Traits. Cell Rep. 2016;17(9):2474-2487.
Bonder MJ et al. (2017) Disease variants alter transcription factor levels and methylation of their binding sites. Nat Genet. 2017;49(1):131-138.
Dekkers KF et al. (2016) Blood lipids influence DNA methylation in circulating cells. Genome Biol. 2016;17(1):138.
Imhann F et al. (2016) Proton pump inhibitors affect the gut microbiome. Gut. 2016;65(5):740-748.
Imhann F et al. (2017) The influence of proton pump inhibitors and other commonly used medication on the gut microbiota. Gut Microbes. 2017;8(4):351-358.
Imhann F et al. (2018) Interplay of host genetics and gut microbiota underlying the onset and clinical presentation of inflammatory bowel disease. Gut. 2018;67(1):108-119.
Kurilshikov A et al. (2019) Gut Microbial Associations to Plasma Metabolites Linked to Cardiovascular Phenotypes and Risk: A Cross-Sectional Study. Circ Res. 124(12):1808-1820.
Mohajeri MH et al. (2018) The role of the microbiome for human health: from basic science to clinical applications. Eur J Nutr. 57(Suppl 1):1-14.
Mujagic Z et al. (2016) A novel biomarker panel for irritable bowel syndrome and the application in the general population. Sci Rep. 2016;6:26420.
Netea MG et al. (2016) Understanding human immune function using the resources from the Human Functional Genomics Project. Nature Medicine 22(8):831-833.
Ricaño-Ponce I et al. (2016) Refined mapping of autoimmune disease associated genetic variants with gene expression suggests an important role for non-coding RNAs. Journal of Autoimmunity 68:62-74.
Slieker RC et al. (2016) Age-related accrual of methylomic variability is linked to fundamental ageing mechanisms. Genome Biol. 2016;17(1):191.
Tigchelaar EF et al. (2016) Gut microbiota composition associated with stool consistency. Gut. 2016;65(3):540-542.
Valles-Colomer M et al. (2019) The neuroactive potential of the human gut microbiota in quality of life and depression. Nature Microbiology 4(4):623-632.
van der Meulen TA et al. (2018) Shared gut, but distinct oral microbiota composition in primary Sjögren’s syndrome and systemic lupus erythematosus. Journal of Autoimmunity 97:77-87.
van Dongen J et al. (2016) Genetic and environmental influences interact with age and sex in shaping the human methylome. Nat Commun. 2016;7:11115.
van Iterson M et al. (2017) Controlling bias and inflation in epigenome- and transcriptome-wide association studies using the empirical null distribution. Genome Biol. 18(1):19.
Vila AV et al. (2018) Gut microbiota composition and functional changes in inflammatory bowel disease and irritable bowel syndrome. Science Translational Medicine 10(472):eaap8914.
Wahl S et al. (2017) Epigenome-wide association study of body mass index, and the adverse outcomes of adiposity. Nature 541(7635):81-86.
Wijmenga C, Zhernakova A. (2018) The importance of cohort studies in the post-GWAS era. Nat Genet. 2018;50(3):322-328.
Zhernakova DV et al. (2017) Identification of context-dependent expression quantitative trait loci in whole blood. Nature Genetics 49(1):139-145.

Requesting access

¹⁾ Tigchelaar EF et al. (2015) Cohort profile: LifeLines DEEP, a prospective, general population cohort study in the northern Netherlands: study design and baseline characteristics. BMJ Open 5(8):e006772.

²⁾ Shah S et al. (2015) Improving Phenotypic Prediction by Combining Genetic and Epigenetic Associations. Am J Hum Genet. 97(1):75-85.

³⁾ Zhernakova A et al. (2016) Population-based metagenomics analysis reveals markers for gut microbiome composition and diversity. Science 352(6285):565-569.

⁴⁾ Zhernakova DV et al. (2018) Individual variations in cardiovascular-disease-related protein levels are driven by genetics and gut microbiome. Nat Genet. 50(11):1524-1532.
