DATA ANALYSIS

 

 

Bioinformatic data analysis is a crucial step after data generation. We have several large servers and a team of experienced bioinformaticians who run state-of-the-art pipelines for our bioinformatics services. We distinguish between analysis of array genotyping data, methylation array data, DNA sequencing data, and RNA sequencing data, and analyses that aid family studies.

 

Genotyping array data analysis

Clusterfile creation - If you wish to run a custom Illumina-based array, a clusterfile must be generated in order to produce accurate genotypes for your data. If you are using one of our standard arrays, this service is not required.

 

Merging datasets - If you have multiple datasets that you want to analyze in one model (e.g., cases genotyped separately from controls), these datasets need to be merged into a single dataset. After merging genotype datasets, extended QC and imputation are recommended to minimize batch effects.

 

Extended QC - After genotyping, we perform a technical QC as standard. The technical QC only checks for major technical errors that could affect your data. During the extended QC, your data is cleaned according to current industry standards, providing you with a clean dataset to use for your analyses. This extended QC includes, but is not limited to, call rate and HWE filters, gender check, genetic ancestry check, familial relationships check and rare variant calling.
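To illustrate the call rate and HWE filters mentioned above, here is a minimal sketch in Python. It is not our production pipeline: the function names, the genotype coding (0, 1 or 2 copies of one allele, None for a missing call) and the thresholds (call rate 0.98, HWE p-value 1e-6) are assumptions chosen for illustration, and the HWE test shown is a simple chi-square goodness-of-fit test rather than an exact test.

```python
import math


def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    if not genotypes:
        raise ValueError("no genotypes given")
    called = sum(1 for g in genotypes if g is not None)
    return called / len(genotypes)


def hwe_p_value(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def variant_passes(genotypes, min_call_rate=0.98, min_hwe_p=1e-6):
    """Apply both filters to one variant; thresholds are illustrative only."""
    if call_rate(genotypes) < min_call_rate:
        return False
    called = [g for g in genotypes if g is not None]
    counts = (called.count(0), called.count(1), called.count(2))
    return hwe_p_value(*counts) >= min_hwe_p
```

For example, a variant with 25, 50 and 25 samples in the three genotype classes passes both filters, whereas a variant at which every sample is heterozygous fails the HWE filter.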

 

Clinical QC - A different version of extended QC. The same QC analysis is performed, with one major difference: we do not remove any samples during the gender check, genetic ancestry check and familial relationships check. Instead, you receive files and our recommendations so that you can perform this assessment yourself. The advantage is that the analysis can be performed in a much shorter time frame than is possible for the extended QC procedure, making it ideal for clinical usage of genetic data.

 

Imputation - Using sequencing-based reference panels (e.g., 1000G, HRC) as a reference population, we impute non-genotyped variants into your dataset. This increases the number of variants in your dataset to approximately 40 million. We are also able to impute to the TOPMed imputation panel; however, special legal limitations apply. Please ask for more information.

 

HLA imputations - Using HLA-TAPAS, we perform specialized imputations on the HLA region. This converts genotype data into HLA types at up to four-digit resolution.

 

SNP extraction - We extract genotypes from (imputed) datasets for use in candidate gene studies and/or polygenic risk scores (PRS). We do not generate the PRS for you during the SNP extraction; we only provide you with the data for doing so.
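As an illustration of how extracted genotypes could be used afterwards, the sketch below computes a polygenic risk score as a weighted sum of allele dosages. This is not part of the SNP extraction service. The function name, the dictionary layout and the variant IDs are hypothetical, and the sketch assumes that the dosages and the weights refer to the same effect allele for each variant.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one sample.

    dosages: dict mapping variant ID to dosage (0 to 2) for this sample.
    weights: dict mapping variant ID to per-allele effect size.
    Variants missing from either dict are skipped.
    """
    shared = dosages.keys() & weights.keys()
    return sum(dosages[v] * weights[v] for v in shared)


# Hypothetical variant IDs and weights, for illustration only.
sample_dosages = {"rs1": 2, "rs2": 1.0, "rs3": 0}
effect_weights = {"rs1": 0.5, "rs2": -0.25, "rs4": 1.0}
score = polygenic_score(sample_dosages, effect_weights)
```

In practice, allele alignment between your dataset and the published weights needs to be checked before computing a score like this.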

 

Genome-wide association analysis - We perform a GWAS on your data. This requires you to share phenotype data with us so that we can perform the analysis. We also perform limited post-GWAS analysis and recommend further analyses and databases that you can use to investigate your associations further.

 

CNV analysis - Using PennCNV, we detect copy number variants (CNVs) in your genotyped dataset.

 

Complete study advice/help - If you need more help with your genetic study, please inquire about the possibility of complete study guidance. Of course, we provide advice during our intake procedure, but for more extensive help we offer the guidance of one of our senior analysts throughout the analysis procedure. This is ideal for PhD students with little experience in genetic analyses and no one in their own research group to provide this kind of guidance. This service is available on request only, and our ability to provide it depends on both the help you need and our analysts’ availability.

 

Methylation array data analysis

Basic quality control - Quality control of the DNA methylation data is performed on study samples and CpG probes. Gender mismatches are identified and samples failing this QC step are flagged for exclusion.

 

Extended quality control - Raw DNA methylation data is background corrected and dye-bias adjusted. Samples and CpG probes that failed the basic QC/preprocessing steps are removed.

 

Cell type proportions - Blood cell type proportions are estimated using Houseman's estimation method. Blood cell counts are essential variables to include in your analysis of blood-based methylation data.

 

Epigenetic age - Predicted Hannum and Horvath ages are computed.

 

Normalization - Methylation data is normalized so that samples can be compared more reliably. The normalization method is chosen based on the type of data/study. Two normalization approaches are implemented at our facility:

-- functional normalization (FN)

-- quantile normalization based on six categories (QN 6C)
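To give an idea of what quantile normalization does, here is a minimal sketch of the generic technique in Python. It is an illustration only: it does not reproduce the six-category variant (QN 6C) or functional normalization, and the function name and the input layout (one list of values per sample, all of equal length) are assumptions.

```python
def quantile_normalize(samples):
    """Give every sample the same value distribution.

    samples: list of equal-length lists, one list of values per sample.
    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position in the list.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices by rising value
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


result = quantile_normalize([[5, 2, 3], [4, 1, 6]])
```

After this step both samples contain the same set of values (1.5, 3.5 and 5.5 in the example), while the ordering of probes within each sample is preserved.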

 

DNA sequencing data analyses

 

RNA sequencing data analyses

 

Analyses for aiding in family studies

Advice on study design

Support with exome sequencing data analysis and filtering

 

 

For more information, please contact dr. Gaby van Dijk.