Jaakko Peltonen, Janne Sinkkonen and Samuel Kaski.
Sequential Information Bottleneck for Finite Data.
In: Proceedings of the 21st International Conference on Machine Learning, Banff, Canada, 2004. Accepted for publication.
The sequential information bottleneck (sIB) algorithm clusters
co-occurrence data such as text documents vs. words. We introduce a
variant that models sparse co-occurrence data by a generative
process. This turns the sIB objective function, mutual
information, into a Bayes factor, while leaving it unchanged
asymptotically for non-sparse data. Experimental performance of
the new algorithm is comparable to the original sIB for large data
sets, and better for smaller, sparse sets.
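
As a rough illustration of the mutual information objective mentioned above, the following sketch computes the empirical mutual information between cluster labels and words from a small document-by-word count matrix. It is not the authors' implementation and does not include the Bayes factor variant; the function names (`mutual_information`, `cluster_counts`) and the toy counts are assumptions made only for this example.

```python
import math


def mutual_information(counts):
    """Empirical mutual information (in nats) of a contingency table.

    counts[c][w] is the number of co-occurrences of cluster c with word w.
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("contingency table is empty")
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for c, row in enumerate(counts):
        for w, n in enumerate(row):
            if n > 0:  # zero cells contribute nothing to the sum
                p_cw = n / total
                mi += p_cw * math.log(n * total / (row_sums[c] * col_sums[w]))
    return mi


def cluster_counts(doc_word_counts, assignment, n_clusters):
    """Sum the word counts of documents assigned to the same cluster."""
    n_words = len(doc_word_counts[0])
    table = [[0] * n_words for _ in range(n_clusters)]
    for doc, c in zip(doc_word_counts, assignment):
        for w, n in enumerate(doc):
            table[c][w] += n
    return table


# Toy data (hypothetical): 4 documents, 4 words.
docs = [
    [4, 1, 0, 0],
    [3, 2, 0, 0],
    [0, 0, 5, 1],
    [0, 1, 2, 3],
]

# Two candidate partitions of the documents into 2 clusters.
coherent = cluster_counts(docs, [0, 0, 1, 1], 2)
mixed = cluster_counts(docs, [0, 1, 0, 1], 2)

print(mutual_information(coherent), mutual_information(mixed))
```

In this toy case the partition that groups documents with similar word usage yields the higher mutual information, which is the kind of comparison a sequential procedure would make when deciding where to reassign a single document. With counts this small the empirical estimate is noisy, which is the sparse-data setting the abstract addresses.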