Bayesian Two-Way Analysis of High-Dimensional Collinear Metabolomics Data


Tommi Suvitaival. Bayesian two-way analysis of high-dimensional collinear metabolomics data. Master's thesis, Helsinki University of Technology, Department of Information and Computer Science, October 2009.


Two-way experimental designs are common in bioinformatics. This thesis proposes a new Bayesian model for the analysis of two-way data. The method remains applicable when the sample size is small and the number of features is high.

The data set is assumed to be divided into populations according to covariates, which in a typical biological experiment are the health status, gender, medical treatment and age of the individual. The proposed method is designed to estimate the effects of these covariates relative to the baseline level of a control group.

The method is based on the assumption that the features of the data form groups that are highly collinear. This allows the use of latent-variable-based dimensionality reduction, which makes inference possible even for data sets with small sample sizes.
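The intuition behind this dimensionality reduction can be illustrated with a toy sketch. This is not the model of the thesis: here the group membership of each feature is assumed to be known in advance, and plain averaging within a group stands in for the inferred latent variables. All names and numbers (`group_sizes`, `noise_sd`, `reduce_by_group`) are hypothetical, and the data are synthetic.

```python
import random
import statistics

random.seed(0)

n_samples = 8          # small sample size (toy value)
group_sizes = [5, 5]   # two hypothetical groups of collinear features
noise_sd = 0.5         # feature-level noise (toy value)

# Each sample has one latent value per feature group; every feature in a
# group is a noisy copy of that group's latent value, so the features
# within a group are highly collinear.
latent = [[random.gauss(0.0, 1.0) for _ in group_sizes]
          for _ in range(n_samples)]
data = [
    [z + random.gauss(0.0, noise_sd)
     for z, size in zip(row, group_sizes)
     for _ in range(size)]
    for row in latent
]


def reduce_by_group(sample, sizes):
    """Average the features within each group, giving one value per group."""
    reduced, start = [], 0
    for size in sizes:
        reduced.append(statistics.mean(sample[start:start + size]))
        start += size
    return reduced


reduced = [reduce_by_group(row, group_sizes) for row in data]
print(len(data[0]), "features ->", len(reduced[0]), "group summaries")
```

With only eight samples, estimating an effect for each of the ten features separately would be noisy. Working with the two group-level summaries pools the information of the collinear features.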

The method treats the data in a fully Bayesian way, producing an estimate of the joint distribution of the model and the data as well as marginal posterior distributions of all model parameters. This makes it possible to evaluate the significance and uncertainty of the results and to compare the model with other models. Inference is carried out with Gibbs sampling.
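As a reminder of how Gibbs sampling yields marginal posterior distributions, the following minimal sketch samples the mean and precision of a single normal distribution. It is far simpler than the model of the thesis, and it is only an illustration: the conjugate priors, their hyperparameters (`m0`, `p0`, `a0`, `b0`), the function name `gibbs_normal` and the toy measurements are all assumptions made here.

```python
import random
import statistics


def gibbs_normal(data, n_iter=2000, burn_in=500, seed=0,
                 m0=0.0, p0=1e-3, a0=1e-3, b0=1e-3):
    """Gibbs sampler for x_i ~ Normal(mu, 1/tau).

    Assumed priors: mu ~ Normal(m0, 1/p0), tau ~ Gamma(shape=a0, rate=b0).
    Returns the post-burn-in draws of mu and tau.
    """
    rng = random.Random(seed)
    n = len(data)
    total = sum(data)
    mu, tau = statistics.mean(data), 1.0  # initial state
    mu_draws, tau_draws = [], []
    for it in range(n_iter):
        # Draw mu from its full conditional given tau (normal).
        precision = p0 + n * tau
        mean = (p0 * m0 + tau * total) / precision
        mu = rng.gauss(mean, precision ** -0.5)
        # Draw tau from its full conditional given mu (gamma).
        shape = a0 + 0.5 * n
        rate = b0 + 0.5 * sum((x - mu) ** 2 for x in data)
        tau = rng.gammavariate(shape, 1.0 / rate)  # second argument is a scale
        if it >= burn_in:
            mu_draws.append(mu)
            tau_draws.append(tau)
    return mu_draws, tau_draws


toy = [4.9, 5.3, 5.1, 4.7, 5.4, 5.0]  # synthetic measurements
mu_draws, tau_draws = gibbs_normal(toy)
print("posterior mean of mu:", statistics.mean(mu_draws))
print("posterior sd of mu:  ", statistics.stdev(mu_draws))
```

The retained draws approximate the marginal posterior of each parameter, so uncertainty statements such as posterior standard deviations or credible intervals can be read directly from them.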

The performance of the new method is demonstrated with a metabolomic data set by comparing lipidomic profiles from children who remain healthy to those who will later develop type 1 diabetes. In two separate studies, the effect of the disease and gender, and the effect of the disease and time, are estimated.


ANOVA, Bayesian modelling, factor analysis, hierarchical model, metabolomics, small sample-size

Suggested BibTeX entry:

    address = {Department of Information and Computer Science},
    author = {Tommi Suvitaival},
    month = {October},
    school = {Helsinki University of Technology},
    title = {Bayesian Two-Way Analysis of High-Dimensional Collinear Metabolomics Data},
    year = {2009},

PDF (959 kB)