High-throughput measurement techniques allow genome-wide studies of biological function. Gene expression, gene regulation, protein content, protein interaction, and metabolic profiles can be measured and combined with sequence information. The major challenge is to extract meaningful findings from large, noisy, high-dimensional, heterogeneous, and incomplete data sets. We develop new computational data analysis methods for taking benefit from prior knowledge and previous experiments in biomedical research.
|
One of the major challenges in molecular medicine is to translate inferences made on the molecular level, for example about metabolites, in model organisms into inferences about humans. Metabolomics is the study of the set of all metabolites found in a sample tissue. Metabolite concentrations are affected strongly by diseases and drugs, and hence they complement the genomic, proteomic, and transcriptomic measurements in an excellent way, in studies of the biological state of an organism. We have developed new computational modeling methods for mapping observed metabolomics data between model organisms and humans.
|
|
|
Large repositories of genome-wide measurement data pose the research question of how to systematically relate different data sets. Re-usage of data sets increases the statistical power of novel studies and opens up the possibility to put biological results in the context of previous studies. To complement keyword search functionalities provided by most repositories for retrieval of similarly annotated studies, we developed machine learning methods that relate gene expression studies through their actual measurement data, along with visualization tools that allow exploring and interpreting the results. In the REx project (Retrieval of Relevant Experiments), relevance is defined by a model of biology that is both data- and knowledge-driven. Collaborators
Representative Publications
|
|
|
A living cell is an extremely complex system, and hence integration of information from multiple sources is needed for revealing the true potential of the modern high-throughput measurement methods. Much of the data integration literature focuses either on well-targeted combinations of sources, such as using sequence-based regulators for explaining gene expression, or on well-focused prediction tasks such as predicting molecular interactions from several data sources. We have focused on knowledge discovery types of problems where the goal is to discover what is relevant in massive data sets by aiming to discover connections between different data sources, for instance chemoinformatic structure of drugs and cellular expression response profiles.
|
|