Research

Project A1

Quantitative genetic analysis using genome-wide data

Lead Partners: 2: Era7 (SME); 5: UJF-LECA (University)


State-of-the-Art, Problem and its Solution

The expression of complex traits, such as multifactorial diseases and phenotypic traits involved in local adaptation, is determined by both genetic and environmental factors. For example, nutrition and lifestyle play important roles in the development of polygenic multifactorial diseases such as obesity and diabetes. A powerful approach to identifying genetic variants and predicting disease risk is therefore to combine genetic and environmental data. Such an approach could also help detect spurious correlations generated by demographic history and geography. Implementing this idea requires new statistical methods and computer algorithms that exploit next-generation sequencing (NGS) data. The huge volume of data produced by NGS technologies calls for new approaches based on cloud infrastructure and for new tools able to extract more information and knowledge from genomic data.


The project will focus on two main activities:

  1. The design of algorithms, the development of software applications and the generation of data sets aimed at extracting more specific and local genomic information from genome-wide data sets.
  2. The identification of genomic regions involved in local adaptation and the evaluation of the ability of a species to adapt to changing environmental conditions.

Objectives

  1. Design algorithms and computer methods able to extract relevant genomic information related to type II diabetes, obesity and metabolic syndrome from human genome-wide and exome-wide data. ESR1
  2. Generalize these methods to make them applicable to any species of interest and to different selection scenarios (e.g. local adaptation or artificial selection). This generalization will include the design and implementation of APIs to interact with third-party applications designed to carry out complex analyses of genomic data. ESR1
  3. Develop statistical genetics methods to identify genomic regions subject to selection mediated by environmental factors. ESR2
  4. Integrate the general solution obtained in the second objective with the developments obtained in the third objective and carry out genome scans of freely available human databases to identify new candidate genomic regions associated with complex diseases. ESR1, ESR2


Methodology

Extraction of specific and local genomic information from genome-wide data sets


The algorithms and methods to be developed will build on the technologies used at Era7 Bioinformatics, especially IaaS (Infrastructure as a Service) cloud computing on Amazon Web Services and the Scala and Java programming languages.

Genome Scan methods


The genome scan methods will be based on the modelling of chromosomes of admixed populations as composed of segments that have been copied from chromosomes of donor populations. The ESRs will extend this model of neutral variants to make it applicable to the detection of selected genes. The underlying rationale is that chromosomal segments containing locally selected genes experience a rapid increase in frequency over a short enough time that recombination does not substantially break down the segment. The test for positive selection will therefore involve finding long chromosomal segments exhibiting large frequency differences among local populations; these will be associated with environmental factors by extending the approach of the Gaggiotti lab (B.4.1 & B.7).
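
The selection criterion described above (long segments with large frequency differences among local populations) can be illustrated with a minimal sketch. It is not the project's method: the Segment class, the candidate_segments function, the population names and both thresholds are hypothetical, and the segment frequencies are assumed to have been estimated already by an upstream copying model. Python is used only for brevity. The association with environmental factors is not covered here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    """A chromosomal segment with its frequency in each local population."""
    chrom: str
    start: int               # start position in base pairs (inclusive)
    end: int                 # end position in base pairs (exclusive)
    freqs: Dict[str, float]  # population name -> segment frequency (0..1)

    @property
    def length(self) -> int:
        return self.end - self.start

    @property
    def freq_range(self) -> float:
        # Largest frequency difference between any two populations.
        values = list(self.freqs.values())
        return max(values) - min(values)


def candidate_segments(
    segments: List[Segment],
    min_length: int,
    min_freq_range: float,
) -> List[Segment]:
    """Keep segments that are both long and strongly differentiated.

    Both thresholds are illustrative assumptions; a real scan would
    calibrate them against a neutral model.
    """
    return [
        s for s in segments
        if s.length >= min_length and s.freq_range >= min_freq_range
    ]


# Toy input: three segments scored in two hypothetical populations.
segments = [
    Segment("chr1", 100_000, 600_000, {"north": 0.85, "south": 0.10}),
    Segment("chr1", 700_000, 720_000, {"north": 0.90, "south": 0.05}),
    Segment("chr2", 50_000, 900_000, {"north": 0.40, "south": 0.35}),
]

hits = candidate_segments(segments, min_length=200_000, min_freq_range=0.5)
for s in hits:
    print(s.chrom, s.start, s.end, s.length, round(s.freq_range, 2))
```

In this toy input only the first segment passes both filters: the second is strongly differentiated but short, and the third is long but has similar frequencies in both populations.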

ESRs training by research

  • ESR1 (Era7) will be trained by Era7 Bioinformatics in mathematical applications to bioinformatics, next-generation sequencing problems, and the use of Amazon Web Services and the Scala programming language. ESR1 will develop applications and solutions that ESR2 will use to achieve the overall objectives of the project.
  • ESR2 (Grenoble) will be trained in population genetics and genomics, statistical genetics and bioinformatics. ESR2 will apply the developments carried out by ESR1 to the identification of variants associated with risk factors and will extend them to the study of local adaptation in non-model species. This will be achieved by developing a genome scan method that incorporates both genetic and environmental data.
  • Both ESRs will collaborate on bioinformatics and statistical developments and their applications to existing data sets.