
Proactive Information Retrieval and Multimodal Interfaces

Our active research topics are:

  • Inferring interest from gaze patterns
  • Eye-movement enhanced image retrieval
  • Contextual information interfaces
  • Inferring cognitive state from multimodal measurements

Introduction

Proactive systems anticipate the user's intentions and actions, and use these predictions to provide more natural and efficient user interfaces. Successful proactive systems are based on generalization from past experience in similar contexts, using multimodal observations. Generalization requires suitably powerful stochastic models and a collection of data about the relevant past history from which to learn them. Evidence about users' intentions is collected both from explicit actions and from implicit signals such as gaze patterns and other biofeedback.


Inferring interest from gaze patterns

An important part of a proactive system is the ability to infer the interests of the user. We study how to extract relevant information from implicit gaze patterns, measured with modern eye-tracking equipment. During complex tasks such as reading, attention lies approximately at the location of the reader's gaze; eye movements should therefore contain information, although very noisy, about the reader's interests.

  • Intention from gaze patterns. We have used discriminative hidden Markov models to detect different processing states in the tasks of simple word search, question answering, and finding the most interesting topic. The model detects, for example, switches between reading and scanning the text, which in turn helps in predicting the intention of the user (see the sketch after this list).

  • Implicit queries from gaze patterns. In textual information retrieval tasks, gaze can help in predicting which text snippets are relevant. Furthermore, we have used gaze to infer implicit queries the user has not formulated or cannot formulate explicitly. Eye movements collected while the user is reading retrieval results are informative of what the user was after, leading to an improved estimate of the query.
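
To illustrate the hidden Markov model idea from the first item above, the following is a minimal sketch of Viterbi decoding with a two-state HMM over fixation durations, labelling each fixation as reading or scanning. The two states, the Gaussian emission model, and all parameter values are illustrative assumptions, not the discriminative models or trained parameters from our publications.

    import numpy as np

    # Hypothetical two-state HMM over fixation durations (ms): state 0 = "scanning"
    # (short fixations), state 1 = "reading" (longer, more regular fixations).
    # All parameter values are illustrative, not estimates from real gaze data.
    STATES = ["scanning", "reading"]
    log_pi = np.log([0.5, 0.5])            # initial state probabilities
    log_A = np.log([[0.8, 0.2],            # state transition probabilities
                    [0.2, 0.8]])
    means = np.array([120.0, 250.0])       # emission means (ms) per state
    stds = np.array([40.0, 60.0])          # emission standard deviations (ms)

    def log_emission(duration_ms):
        """Log-likelihood of one fixation duration under each state's Gaussian."""
        return (-0.5 * ((duration_ms - means) / stds) ** 2
                - np.log(stds * np.sqrt(2.0 * np.pi)))

    def viterbi(durations):
        """Most likely reading/scanning state sequence for a fixation sequence."""
        T, S = len(durations), len(STATES)
        delta = np.full((T, S), -np.inf)
        backptr = np.zeros((T, S), dtype=int)
        delta[0] = log_pi + log_emission(durations[0])
        for t in range(1, T):
            scores = delta[t - 1][:, None] + log_A   # (previous state, next state)
            backptr[t] = np.argmax(scores, axis=0)
            delta[t] = scores[backptr[t], np.arange(S)] + log_emission(durations[t])
        path = [int(np.argmax(delta[-1]))]
        for t in range(T - 1, 0, -1):
            path.append(backptr[t, path[-1]])
        return [STATES[s] for s in reversed(path)]

    # Example: a short fixation sequence that switches from scanning to reading and back.
    print(viterbi(np.array([110.0, 95.0, 240.0, 260.0, 255.0, 130.0])))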


Eye-movement enhanced image retrieval

In image retrieval tasks, we use eye movements as implicit relevance feedback for images or for parts of images. Gaze-based relevance inference can complement or replace explicit feedback in content-based image search. Our work has demonstrated that gaze provides useful information also for media types that are less structured than text.
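
As a rough sketch of how gaze could act as implicit relevance feedback, the snippet below performs a Rocchio-style query update in which per-image relevance weights come from normalised fixation time, with explicit judgements overriding the gaze-based weights where available. The feature space, the fixation-time proxy, and the weighting constants are assumptions for illustration, not the models used in our experiments.

    import numpy as np

    def gaze_relevance_feedback(query_vec, image_feats, gaze_scores,
                                explicit_rel=None, alpha=1.0, beta=0.75):
        """Rocchio-style query update where the relevance weights come from gaze.

        query_vec    -- current query in image-feature space, shape (d,)
        image_feats  -- features of the displayed result images, shape (n, d)
        gaze_scores  -- assumed per-image implicit relevance in [0, 1], e.g.
                        normalised total fixation time on each image
        explicit_rel -- optional explicit 0/1 judgements (NaN = no judgement)
                        that override the gaze-based weights
        """
        weights = np.asarray(gaze_scores, dtype=float)
        if explicit_rel is not None:
            explicit_rel = np.asarray(explicit_rel, dtype=float)
            judged = ~np.isnan(explicit_rel)
            weights[judged] = explicit_rel[judged]    # explicit feedback wins over gaze
        weights = weights / (weights.sum() + 1e-12)   # normalise to a distribution
        centroid = weights @ image_feats              # gaze-weighted centroid of relevant images
        return alpha * query_vec + beta * centroid    # moved query vector

    # Example with random image features: the second of three results was looked at most.
    rng = np.random.default_rng(0)
    image_feats = rng.normal(size=(3, 8))
    updated_query = gaze_relevance_feedback(rng.normal(size=8), image_feats,
                                            gaze_scores=[0.1, 0.8, 0.2])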

  • Gaze-controlled search interfaces. We have developed novel gaze-based interfaces for image retrieval. The purpose is both (i) to create interfaces that can be used without explicit control devices, and (ii) to obtain information from gaze. While gaze is informative of the user's interests in any setting, purpose-built interfaces can provide more information than standard explicit feedback.

  • Inferring relevant image regions from gaze patterns. To get beyond the standard simplified binary image-level relevance assumption, we have developed a model for predicting which image regions are relevant. A novel Bayesian mixture model learns task-specific regions from gaze data collected from users performing different tasks.
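
A simplified version of the region idea can be sketched with a standard variational Gaussian mixture over fixation coordinates: heavily fixated areas emerge as high-weight components that can be read as candidate relevant regions. The sketch below uses scikit-learn's generic BayesianGaussianMixture on synthetic fixations and is not the task-specific model described above.

    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    # Synthetic fixation coordinates (x, y) in image pixels recorded while a user
    # performs a task. A variational Gaussian mixture is fitted so that heavily
    # fixated areas emerge as high-weight components.
    rng = np.random.default_rng(1)
    fixations = np.vstack([
        rng.normal([200, 150], 20, size=(60, 2)),        # a strong cluster on one object
        rng.normal([450, 300], 30, size=(25, 2)),        # a weaker secondary cluster
        rng.uniform([0, 0], [640, 480], size=(15, 2)),   # stray fixations elsewhere
    ])

    gmm = BayesianGaussianMixture(n_components=5, covariance_type="full",
                                  random_state=0).fit(fixations)

    # Components with non-negligible weight are candidate "relevant regions".
    for mean, weight in zip(gmm.means_, gmm.weights_):
        if weight > 0.1:
            print(f"candidate region at ({mean[0]:.0f}, {mean[1]:.0f}), weight {weight:.2f}")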


Contextual information interfaces

Contextual information interfaces provide access to information that is relevant in the current context. They use sensory signals, such as gaze patterns, to track the user's foci of interest, to determine what is relevant in the current context, and to predict what kind of information the user needs at the present time. The information is retrieved from databases and presented in a non-intrusive manner. The main challenges are the extraction of context from visual and sensory data, and the construction of adaptive machine learning models that can utilize heterogeneous context cues to predict relevance. Novel statistical machine learning methods are used for multimodal contextual information retrieval.

  • Real-world information retrieval. We are interested in ways of efficiently retrieving information relevant to the current real-world context and presenting it in augmented reality on a wearable or hand-held display. The relevance of the surrounding objects, and the implicit information need concerning them, are inferred from gaze and speech. Retrieved textual annotations are overlaid on the view and become part of the context the user can attend to. As a pilot application scenario, we have implemented a guide that displays relevant information to a visitor in a university department.

  • Predicting relevant objects from gaze. To predict which real-world objects the user would like to know more about, we use gaze patterns to infer their relevance. Gaze can be measured with wearable eye trackers, and statistical models are required to extract an informative signal from the noisy gaze trajectories.
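
As a minimal illustration of the second item, the snippet below scores objects in the current view with a logistic regression classifier over aggregated gaze features (total fixation time, number of fixations, mean fixation duration). The feature set, the training data, and the object names are hypothetical; the actual models operate on far noisier wearable-tracker trajectories.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Hypothetical per-object gaze features aggregated over a viewing episode:
    # [total fixation time (s), number of fixations, mean fixation duration (ms)].
    # The labels mark whether the user subsequently asked for more information
    # about the object. Features, data, and object names are illustrative only.
    X_train = np.array([[0.3, 2, 110], [2.5, 9, 240], [0.1, 1, 90],
                        [1.8, 7, 220], [0.4, 3, 130], [3.1, 11, 260]])
    y_train = np.array([0, 1, 0, 1, 0, 1])

    model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train)

    # Score the objects currently in view; a high probability would trigger
    # retrieval and augmentation of annotations for that object.
    objects_in_view = {"painting": [2.2, 8, 230], "doorway": [0.2, 2, 100]}
    for name, feats in objects_in_view.items():
        p_relevant = model.predict_proba([feats])[0, 1]
        print(f"{name}: P(relevant) = {p_relevant:.2f}")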


Inferring cognitive state from multimodal measurements

Being aware of the user's emotional and mental state would open up new opportunities for implementing proactivity. To this end, and as an interesting problem in its own right, we are working on statistical models for inferring the user's cognitive state from physiological signals measured by several unobtrusive sensors, such as an accelerometer attached to the nape of the neck or an eye tracker embedded in the monitor.

  • Inferring cognitive state from physiological measurements. We are developing statistical machine learning models to infer the emotional and mental state of users based on physiological measurements such as heart rate, body movement, brain activity (through a one-channel EEG), and eye movements.
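
A minimal sketch of such a model, assuming hand-picked features per time window (heart rate, accelerometer variance, EEG band power, blink rate) and synthetic labels for two cognitive states, is shown below. The features, the data, and the choice of a random forest classifier are illustrative assumptions rather than the group's published setup.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # One row per time window: [mean heart rate (bpm), accelerometer variance,
    # EEG alpha-band power, blink rate (1/min)]. Both the feature set and the
    # synthetic "relaxed" vs. "high mental load" labels are illustrative
    # assumptions, not measured data or a published feature set.
    rng = np.random.default_rng(2)
    n = 200
    relaxed = np.column_stack([rng.normal(65, 5, n), rng.normal(0.2, 0.05, n),
                               rng.normal(12, 2, n), rng.normal(18, 3, n)])
    loaded = np.column_stack([rng.normal(80, 6, n), rng.normal(0.6, 0.1, n),
                              rng.normal(7, 2, n), rng.normal(10, 3, n)])
    X = np.vstack([relaxed, loaded])
    y = np.array([0] * n + [1] * n)   # 0 = relaxed, 1 = high mental load

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())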


Full publication list of the research group