Research of the Probabilistic Machine Learning Group

We develop new methods for probabilistic modeling, Bayesian inference and machine learning. Our current focuses are:

  • Agile Probabilistic AI
  • Collaborative decision making and design with AI
  • Multi-agent RL-based user modeling
  • Bayesian Deep Learning
  • Privacy-preserving and Fair AI
  • Simulator-based Inference
  • Applications

These topics match the research programs of FCAI, to which we contribute.

    Agile Probabilistic AI

    We develop and improve workflows for probabilistic modeling. Our core research is on designing computational diagnostics, devising methods, and putting together AI tools that are interactive and generally applicable to iterative Bayesian model building workflows (see also https://fcai.fi/agile-probabilistic).
    Our tools are modular, support explainability, and are developed independently of domain-specific applications. Our research lends itself to a wide range of modeling scenarios, such as Bayesian hierarchical models, decision making, causal models, Gaussian processes, time series analysis, and other probabilistic machine learning methods.
    We actively contribute to popular probabilistic programming frameworks such as Stan and PyMC, and make our work widely available as part of free and modular open source software, including: projpred, loo, ArviZ, Bambi, bayesplot, priorsense, kulprit, cmdstanr and VAI-Lab.

    Representative Publications

    Collaborative decision making and design with AI

    Most machine learning systems operate alongside humans, augmenting our skills and assisting us in tasks such as decision-making, modeling, or design. For an AI agent to assist human users efficiently, it needs to learn a user model that takes into account their goals, plans, preferences and biases. These factors are latent and non-stationary, and it is challenging to infer them while requiring minimal effort from the user. To best elicit expert knowledge, we develop probabilistic strategies (e.g., active learning, Bayesian experimental design, Bayesian optimization) and associated inference techniques. This research contributes directly to the FCAI Interactive AI Virtual Laboratories.

    Representative Publications

    Multi-agent RL-based user modeling

    To create collaborative AI systems we need models of the users these systems will interact with. We develop user models that capture human decision-making well enough to be useful. Such models allow us to train AI agents that can understand and collaborate with human agents. These models are formulated using multi-agent reinforcement learning theory and learned from observed behaviour. The multi-agent formulation allows us to capture strategic behavior in humans (in which people actively try to influence another agent's inferences about themselves) by creating user models that model the user's beliefs about the AI system they are interacting with. Further, we base our user models on grounded cognitive theories, which allows us to minimize the amount of human interaction data needed to train them, and to cope with biases and cognitive limitations that influence human behavior.
    Our research further tackles methods needed to make these user models practically usable. A user model's parameters can be non-stationary, yet they must be inferred from a limited amount of interaction data. We develop inference algorithms and surrogates based on amortization techniques that can help scale complex user models to practical application scenarios. See also https://github.com/AaltoPML/atom-team.

    Representative Publications

    Bayesian Deep Learning

    Our goal is to develop principled Bayesian deep learning methods. These include:

    Representative Publications

    Privacy-preserving and Fair AI

    Figure: In privacy-preserving learning, our aim is to learn important characteristics about the distribution of the data while protecting the anonymity of data subjects.

    Machine learning models can unintentionally capture sensitive information underlying the training data. This can lead to disclosure of sensitive personal information (a privacy concern) as well as unfair decisions in automated processes (a fairness concern).
    Privacy: We combine differential privacy, the prevailing formalism for anonymity, with Bayesian inference to develop powerful techniques for probabilistic modeling with rigorous privacy guarantees. We study the theoretical foundations and provide practical implementations for modern statistical inference under differential privacy, such as the software package Twinify, for creating and releasing anonymised synthetic twins of sensitive data sets.
    Fairness: Since fairness is a contextual problem, we use a combination of probabilistic user modeling techniques to develop practical tools and intelligent agents which help human modelers develop fairer models within a given context.
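
As a rough illustration of what a differential privacy guarantee involves, the sketch below releases the mean of a bounded attribute using the standard Laplace mechanism. It is not taken from Twinify or from the group's methods. The function name `private_mean`, the bounds, the toy data and the privacy budget are all hypothetical choices for the example, and it assumes neighbouring data sets differ by replacing one record (so the data set size is treated as public).

```python
import random


def private_mean(values, lower, upper, epsilon, rng):
    """Release an epsilon-differentially private mean via the Laplace mechanism.

    Assumes the "replace one record" notion of neighbouring data sets,
    with the number of records treated as public.
    """
    # Clip each value so that a single record has bounded influence.
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)

    # Replacing one record changes the clipped mean by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon

    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and the same scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_mean + noise


# Toy data: 1000 hypothetical ages between 18 and 90.
rng = random.Random(0)
ages = [rng.randint(18, 90) for _ in range(1000)]
exact = sum(ages) / len(ages)
released = private_mean(ages, lower=0, upper=100, epsilon=1.0, rng=rng)
print(f"exact mean: {exact:.2f}, private mean: {released:.2f}")
```

A smaller `epsilon` gives a stronger guarantee but a larger noise scale, which is the basic privacy-utility trade-off that more elaborate private inference methods also have to manage.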

    Representative Publications

    Simulator-based Inference

    Simulator-based inference (SBI) methods, also known as likelihood-free inference methods, such as approximate Bayesian computation (ABC), have gained popularity across scientific fields because they enable statistical inference with complex, implicit models that exist only as computer simulators.
    We focus on developing SBI methods that tackle challenging problems such as computationally costly and misspecified simulators. For instance, we improve the robustness and sample efficiency of SBI methods by (a) eliciting knowledge from domain experts, and (b) exploiting structural and temporal information in the data. We address these problems in fields including cognitive science, population genetics, neuroscience, radio propagation, and epidemiology.
    We develop the Engine for Likelihood-free Inference (ELFI), an open-source Python software library that is designed to be modular and extensible. It provides a wide range of simulator-based algorithms and convenient syntax for defining inference.
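
To make the idea of ABC concrete, here is a minimal sketch of its simplest variant, rejection ABC, written with the Python standard library only. It does not use ELFI or reproduce its API. The toy Gaussian simulator, the uniform prior, the sample-mean summary statistic and the tolerance `epsilon` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Toy stand-in for a black-box simulator: n draws from Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic used to compare simulated and observed data."""
    return statistics.fmean(data)


def abc_rejection(observed, n_samples, epsilon, rng):
    """Collect approximate posterior samples of mu by rejection ABC."""
    s_obs = summary(observed)
    accepted = []
    while len(accepted) < n_samples:
        mu = rng.uniform(-5.0, 5.0)  # draw a candidate from the prior
        s_sim = summary(simulate(mu, len(observed), rng))
        # Keep the candidate only if its simulated data resemble the observations.
        if abs(s_sim - s_obs) <= epsilon:
            accepted.append(mu)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # pretend the true parameter is unknown
posterior = abc_rejection(observed, n_samples=200, epsilon=0.1, rng=rng)
posterior_mean = statistics.fmean(posterior)
print(f"approximate posterior mean of mu: {posterior_mean:.2f}")
```

No likelihood is ever evaluated: the simulator is only run forward. With the settings above only a small fraction of prior draws is accepted, which illustrates why sample efficiency becomes a central concern when each simulation is expensive.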

    Representative Publications

    Applications

    We work closely with researchers from other fields and industry partners to pursue breakthroughs in challenging real-world applications. Our contributions tackle central problems in healthcare, drug design, cognitive science, distributed computing, materials science, telecommunications, and robotics and autonomous systems. Aligned with the FCAI Virtual Labs initiative, our recent efforts mainly split into the following strands:
    1. Health, genomics, and personalized medicine: We work with datasets from nationwide biobanks and electronic health records to support and improve the diagnosis and prevention of diseases. We are working with collaborators from the INTERVENE and FinRegistry projects.
    2. Drug discovery: In collaboration with pharma companies, we aim to speed up and reduce failure rates in the drug discovery pipeline. We combine ML models with expert knowledge to improve molecule generation, molecular property prediction/optimization, and molecular dynamics simulation. We are also part of the EU-wide AIDD consortium.
    3. Brain imaging: Our work on identifying features in MEG and EEG data with Bayesian methods helps to diagnose cognitive impairments such as dementia, which we investigate with our partners in the AI-Mind consortium.

    Representative Publications