MLG 2008 banner

Invited speakers

Fan Chung
University of California, San Diego

Four graph partitioning algorithms

We will discuss four partitioning algorithms using eigenvectors, random walks, PageRank and their variations. In particular, we will examine local partitioning algorithms, which find a cut near a specified starting vertex, with a running time that depends on the size of the small side of the cut, rather than on the size of the input graph (which can be prohibitively large). Three of the four partitioning algorithms are local algorithms and are particularly appropriate for applications for massive data sets.

Thorsten Joachims
Cornell University

Structured Output Prediction with Structural SVMs

This talk explores large-margin approaches to predicting graph-based objects like trees, clusterings, or alignments. Such problems arise, for example, when a natural language parser needs to predict the correct parse tree for a given sentence, when one needs to determine the co-reference relationships of noun-phrases in a document, or when predicting the alignment between two proteins. In particular, the talk will show how structural SVMs can learn such complex prediction rules, using the problems of supervised clustering, protein sequence alignment, and diversification in search engines as application examples. Furthermore, the talk will present new cutting-plane algorithms that allows training of structural SVMs in time linear in the number of training examples.

Mohammad Mahdian
Yahoo! Research

Influence and Correlation in Social Networks

In many online social systems, social ties between users play an important role in dictating users' behavior. One of the ways this can happen is through social influence, the phenomenon that the actions of a user can induce his/her friends to behave in a similar way. In systems where social influence exists, ideas, modes of behavior, or new technologies can diffuse through the network like an epidemic. Therefore, identifying and understanding social influence is of tremendous interest from both an analysis (e.g., predicting the future of the system) and a design (e.g., designing viral marketing strategies) point of view.

In this talk, I will give a general overview of models for diffusion in social network, and then discuss the problem of identifying social influence in the data. This is a difficult task in general, since there are many other factors such as homophily or unobserved confounding variables that can induce statistical correlation between the actions of friends in a social network. Thus, distinguishing influence from those other factors is essentially the problem of distinguishing correlation from causality, a notoriously hard problem. Despite this, I will show how in an environment where the time stamp of the actions are observable, we can design simple statistical tests that distinguish between models of social influence and those that replicate the aforementioned sources of social correlation. I will sketch the proof of a theoretical justification of one of the tests, and present simulation results on randomly generated data and real tagging data from Flickr. The results exhibit that while there is significant social correlation in tagging behavior on this system, this correlation cannot be attributed to social influence.

Hannu Toivonen
University of Helsinki

Biomine search engine for probabilistic graphs

Biomine is a search engine prototype under development. It can be used to find biological entities that are (indirectly) related to given query entities, as well as to display and evaluate the relations. Biomine is based on an integrated index to a number of public biological databases. The representation is a probabilistic graph, where nodes correspond to biological entities (typically a record in a biological database) and edges to their relationships (typically a cross-reference between database records). Edges are annotated with probabilities that reflect the strength or the reliability of the relation. I will discuss research problems and challenges for search in such graphs.