Research

Project A2

Statistical methods for quantifying the effect of rare variants on phenotypic traits

Lead Partners: A: PopGenTech (SME); 5: UJF Grenoble (University)

State-of-the-Art Problem and its Solution

Current methods for genome-wide surveys are inefficient when rare alleles contribute to a disease. PopGenTech was co-founded by Nobel laureate Dr Brenner to implement a method he invented for interrogating pooled populations of genomes quickly and cheaply by next-generation sequencing (NGS); this method can detect such rare variants. The company's strategy is hypothesis-driven: it seeks to identify the effects of rare variation at candidate genes rather than relying solely on association. In addition, the usual association tests are underpowered for detecting lower-frequency variants, so novel statistical approaches are required. These methods need to exploit prior information to propose candidate genes, and to distinguish the distribution of variants at these candidates from patterns produced by population history and selection.

Objectives

·Identify a robust, computationally efficient method, suited to PopGenTech’s data, for exploiting prior information about loci.

·Implement a Bayesian algorithm to detect sets of candidate rare SNPs based on pooled populations of genomes.

·Enhance the value of PopGenTech’s newly developed molecular biology/biochemical workflows by applying the new approach to their output.
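
A Bayesian detector for rare SNPs in pooled data can be built from a per-site calculation like the minimal sketch below. It is an illustration under simplifying assumptions, not PopGenTech's method: a pool of `n_chrom` chromosomes is sequenced at one site to depth `depth`; given the pool allele frequency, alternative-allele reads are binomial; a reference read is misread as alternative with a fixed probability `error` (a simplified symmetric error model); and the prior on the number of alternative-allele copies in the pool puts mass `1 - prior_variant` on zero and spreads the rest proportionally to 1/j (the neutral site-frequency-spectrum shape). The function names and all parameter values are hypothetical.

```python
import math


def log_binom_pmf(k, n, p):
    """Log-probability of k successes in n Bernoulli(p) trials."""
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def posterior_variant_present(alt_reads, depth, n_chrom, error=0.005, prior_variant=0.01):
    """Posterior probability that at least one chromosome in the pool carries the
    alternative allele (hypothetical model; assumptions stated in the text)."""
    # Prior over j = number of alternative-allele copies in the pool (0..n_chrom):
    # mass 1 - prior_variant on j = 0, the rest spread proportionally to 1/j.
    harmonic = sum(1.0 / j for j in range(1, n_chrom + 1))
    prior = [1.0 - prior_variant] + [prior_variant / (j * harmonic)
                                     for j in range(1, n_chrom + 1)]
    log_post = []
    for j, pr in enumerate(prior):
        freq = j / n_chrom
        # Probability that one read shows the alternative allele,
        # under the simplified symmetric sequencing-error model.
        q = freq * (1.0 - error) + (1.0 - freq) * error
        log_post.append(math.log(pr) + log_binom_pmf(alt_reads, depth, q))
    # Normalise in log space to avoid numerical underflow.
    top = max(log_post)
    post = [math.exp(lp - top) for lp in log_post]
    return 1.0 - post[0] / sum(post)


# Hypothetical pool of 50 diploid individuals (100 chromosomes), depth 200.
print(posterior_variant_present(alt_reads=0, depth=200, n_chrom=100))
print(posterior_variant_present(alt_reads=12, depth=200, n_chrom=100))
```

Working on a grid of allele counts rather than a continuous frequency lets the prior place explicit mass on "absent from the pool", so the output is directly a probability that the variant is present; a realistic version would also model unequal pooling of DNA and allele-specific error rates.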

Applications will be considered from highly motivated and outstanding students with a Master’s degree (or equivalent) in one of the following disciplines: bioinformatics, statistics or machine learning, evolutionary genetics, or theoretical population genetics. Students from related disciplines, such as physics, computer science or mathematics, are also welcome to apply. Preference will be given to applicants with a genuine interest in interdisciplinary PhD education.

Methodology

The early-stage researcher (ESR) will develop machine-learning approaches to identify candidate sets of genes with a higher prior probability of affecting disease phenotypes or quantitative traits. These approaches perform variable selection on predefined groups of variables (for example, all variants within a gene) in linear regression or classification models. Simulation modelling will be used to evaluate how well these approaches select gene sets.
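
Variable selection on predefined groups in linear regression can be illustrated with the group lasso, one standard technique of this kind; the project's own approach may differ. The minimal sketch below simulates standardized genotypes for four hypothetical genes of three SNPs each, makes two genes causal for a quantitative trait, and fits a group lasso by proximal gradient descent, so that each gene is either kept as a whole or set exactly to zero. The function names, the penalty weight `lam`, the minor allele frequency `maf`, the effect sizes and the random seed are all assumptions chosen for illustration.

```python
import math
import random


def standardize(col):
    """Centre a column and scale it to unit variance."""
    n = len(col)
    mu = sum(col) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
    return [(v - mu) / sd for v in col]


def simulate(n=200, group_sizes=(3, 3, 3, 3), causal=(0, 2),
             effect=0.6, maf=0.2, seed=1):
    """Simulate standardized genotype columns and a centred quantitative trait."""
    rng = random.Random(seed)
    p = sum(group_sizes)
    # Genotype = number of minor alleles (0, 1 or 2), each allele drawn with prob. maf.
    cols = [standardize([(rng.random() < maf) + (rng.random() < maf) for _ in range(n)])
            for _ in range(p)]
    groups, start = [], 0
    for size in group_sizes:  # consecutive SNPs belong to the same hypothetical gene
        groups.append(list(range(start, start + size)))
        start += size
    beta = [0.0] * p
    for g in causal:
        for j in groups[g]:
            beta[j] = effect
    y = [sum(beta[j] * cols[j][i] for j in range(p)) + rng.gauss(0.0, 1.0)
         for i in range(n)]
    mean_y = sum(y) / n
    return cols, [v - mean_y for v in y], groups


def group_lasso(cols, y, groups, lam, n_iter=200):
    """Minimise (1/2n)||y - X b||^2 + lam * sum_g sqrt(|g|) ||b_g||_2
    by proximal gradient descent (X is stored column by column)."""
    n, p = len(y), len(cols)

    def xt_times(r):  # X^T r / n
        return [sum(cols[j][i] * r[i] for i in range(n)) / n for j in range(p)]

    def x_times(b):  # X b
        return [sum(cols[j][i] * b[j] for j in range(p)) for i in range(n)]

    # Step size 1/L, with L the largest eigenvalue of X^T X / n (power iteration).
    v = [1.0 / math.sqrt(p)] * p
    L = 1.0
    for _ in range(50):
        w = xt_times(x_times(v))
        L = math.sqrt(sum(t * t for t in w))
        v = [t / L for t in w]
    step = 1.0 / L

    beta = [0.0] * p
    for _ in range(n_iter):
        fitted = x_times(beta)
        grad = xt_times([fitted[i] - y[i] for i in range(n)])  # gradient of the loss
        z = [beta[j] - step * grad[j] for j in range(p)]
        for g in groups:  # block soft-thresholding: shrink or zero a whole gene
            norm = math.sqrt(sum(z[j] ** 2 for j in g))
            thresh = step * lam * math.sqrt(len(g))
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            for j in g:
                beta[j] = scale * z[j]
    return beta


cols, y, groups = simulate()
beta = group_lasso(cols, y, groups, lam=0.2)
selected = [g for g, idx in enumerate(groups) if any(abs(beta[j]) > 1e-8 for j in idx)]
print("selected genes:", selected)
```

Repeating the simulation over many seeds, effect sizes and allele frequencies, and recording how often the causal genes (and only those) are selected, is one simple form of the simulation-based evaluation this project calls for; in practice `lam` would also be tuned, for example by cross-validation.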

ESR training by research

The single ESR on this project will be based at UJF. The selected candidate will work in the TIMC-IMAG lab and will interact with the computational and mathematical biology group. The PhD will include an early 6-month placement with Population Genetics Technologies (PopGenTech) in Cambridge (UK), where the student will gain insight into the experimental design, analysis and bioinformatics processing of NGS genomic data.

