(Aside image: a summer view of the sea on the southern coast of Finland.)

Our recent theoretical results on variational Bayesian learning are algorithmic improvements to variational inference from 2010 and 2011. They cover two topics, a Riemannian conjugate gradient method and a transformation of latent variables, which are described in more detail here.

Our major earlier theoretical results include a study of the effect of the posterior approximation in (Ilin and Valpola, 2005), a more accurate linearization for learning nonlinear models, and the treatment of partially observed values. The latter two topics are described below.

More accurate linearization for nonlinear models

Variational Bayesian learning of nonlinear models fundamentally reduces to evaluating statistics of the data predicted by the model, where the model is a function of the parameters of the variational approximation of the posterior distribution. This is equivalent to evaluating statistics of a nonlinear transformation of the approximating probability distribution. In our earlier work on nonlinear models, we used a Taylor series approximation to linearize the nonlinearity. Unfortunately, this approximation breaks down when the variance of the approximating distribution grows, which leads to algorithmic instability.
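As an illustration, the minimal sketch below (not the exact scheme used in our papers) propagates a Gaussian posterior approximation through tanh with a first-order Taylor linearization; the helper taylor_moments is hypothetical. The variance estimate depends entirely on the derivative at the mean, which is why it degrades as the variance of the approximating distribution grows.

```python
import numpy as np

def taylor_moments(f, df, mu, var):
    """First-order Taylor approximation of the mean and variance of f(x)
    when x ~ N(mu, var): linearize f around the mean mu."""
    mean = f(mu)                # E[f(x)] ~ f(mu)
    variance = df(mu)**2 * var  # Var[f(x)] ~ f'(mu)^2 * var
    return mean, variance

# Example: propagate N(0.5, var) through tanh. The local linearization is
# reasonable for small variances but misrepresents the nonlinearity when
# the distribution is wide.
for var in (0.01, 1.0, 25.0):
    m, v = taylor_moments(np.tanh, lambda x: 1.0 - np.tanh(x)**2, 0.5, var)
    print(f"var={var:5.2f}: mean~{m:.3f}, variance~{v:.3f}")
```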

To handle this problem, a new linearization method based on a more global Gauss-Hermite quadrature approximation was proposed in (Honkela and Valpola, 2005). The new method yields significantly more accurate estimates of the model cost while remaining computationally almost as efficient.
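The following sketch shows the general idea of evaluating the same statistics with Gauss-Hermite quadrature; it is a minimal illustration rather than the exact method of (Honkela and Valpola, 2005), and the helper gauss_hermite_moments is hypothetical.

```python
import numpy as np

def gauss_hermite_moments(f, mu, var, n_points=11):
    """Approximate E[f(x)] and Var[f(x)] for x ~ N(mu, var) with
    Gauss-Hermite quadrature. The nodes and weights target the weight
    function exp(-t^2), so substitute x = mu + sqrt(2*var)*t and
    rescale the weighted sum by 1/sqrt(pi)."""
    t, w = np.polynomial.hermite.hermgauss(n_points)
    x = mu + np.sqrt(2.0 * var) * t
    fx = f(x)
    mean = np.sum(w * fx) / np.sqrt(np.pi)
    second_moment = np.sum(w * fx**2) / np.sqrt(np.pi)
    return mean, second_moment - mean**2

# Unlike the local Taylor linearization, the quadrature estimates stay
# accurate even when the variance of the approximation is large.
for var in (0.01, 1.0, 25.0):
    m, v = gauss_hermite_moments(np.tanh, 0.5, var)
    print(f"var={var:5.2f}: mean~{m:.3f}, variance~{v:.3f}")
```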

Partially observed values

It is well known that Bayesian methods provide well-founded and straightforward means for handling missing values in data. The same applies to values that lie somewhere between observed and missing. So-called coarse data means that we only know that a data point belongs to a certain subset of all possibilities. So-called soft or fuzzy data generalises this further by giving weights to the possibilities. For instance, a sensor reading known only to exceed the measurement range is coarse, while an annotator who assigns 70% confidence to one alternative and 30% to another provides soft data. In (Raiko, 2004), different ways of handling soft data are studied in the context of variational Bayesian learning.
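As a rough illustration of the idea (not the specific treatment studied in (Raiko, 2004)), the sketch below evaluates likelihood terms for a simple Gaussian observation model: a fully observed value contributes a density, a coarse value the probability of its subset, and a soft value, in one possible formulation, a weighted combination of subset probabilities.

```python
from scipy.stats import norm

mu, sigma = 0.0, 1.0  # illustrative Gaussian observation model

# Fully observed value: the usual density term.
lik_observed = norm.pdf(0.3, mu, sigma)

# Coarse value: we only know x fell in the interval [0, 1], so the
# likelihood integrates the density over that subset.
lik_coarse = norm.cdf(1.0, mu, sigma) - norm.cdf(0.0, mu, sigma)

# Soft value: weights over the possibilities, here 70% on [0, 1] and
# 30% on [1, 2]; one option is a weighted sum of subset likelihoods.
lik_soft = (0.7 * lik_coarse
            + 0.3 * (norm.cdf(2.0, mu, sigma) - norm.cdf(1.0, mu, sigma)))

print(lik_observed, lik_coarse, lik_soft)
```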

References

A. Honkela and H. Valpola, "Unsupervised variational Bayesian learning of nonlinear models," in L. Saul, Y. Weiss, and L. Bottou (Eds.), Advances in Neural Information Processing Systems 17, MIT Press, 2005, pp. 593-600.

A. Ilin and H. Valpola, "On the effect of the form of the posterior approximation in variational learning of ICA models," Neural Processing Letters, 22(2):183-204, 2005.

T. Raiko, "Partially observed values," in Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IJCNN 2004), Budapest, Hungary, July 2004, pp. 2825-2830.
