In Keronen et al. (2013), we propose using a Gaussian-Bernoulli restricted Boltzmann machine (GRBM; Cho et al., 2011) to extract features from the cross-correlation coefficients of the stereo channels of speech. By simply plugging the GRBM into the existing speech recognition pipeline (Keronen et al., 2012), we were able to improve the performance of keyword recognition in noisy environments.
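The paragraph above leaves the details of both stages to the papers. As a rough illustration only, the sketch below computes frame-wise normalized cross-correlation coefficients between two channels over a small range of lags, then trains a Gaussian-Bernoulli RBM on them and uses its hidden activation probabilities as features. This is not the setup of Keronen et al. (2013). The helper names (`cross_correlation_features`, `GaussianBernoulliRBM`), the framing parameters, the use of a single broadband channel pair, the fixed unit variance of the visible units, and the plain CD-1 training are all assumptions. In particular, the improved learning rules of Cho et al. (2011) are not reproduced here.

```python
import numpy as np


def cross_correlation_features(left, right, frame_len=400, hop=160, max_lag=20):
    """Hypothetical front end: per-frame normalized cross-correlation
    coefficients between two channels at lags -max_lag..max_lag."""
    n_frames = 1 + (len(left) - frame_len) // hop
    centre = frame_len - 1  # index of lag 0 in np.correlate's "full" output
    feats = np.empty((n_frames, 2 * max_lag + 1))
    for t in range(n_frames):
        seg_l = left[t * hop : t * hop + frame_len]
        seg_r = right[t * hop : t * hop + frame_len]
        seg_l = seg_l - seg_l.mean()
        seg_r = seg_r - seg_r.mean()
        # Dividing by the channel energies bounds every coefficient to [-1, 1].
        denom = np.sqrt(np.dot(seg_l, seg_l) * np.dot(seg_r, seg_r)) + 1e-12
        full = np.correlate(seg_l, seg_r, mode="full")
        feats[t] = full[centre - max_lag : centre + max_lag + 1] / denom
    return feats


class GaussianBernoulliRBM:
    """Minimal GRBM with unit-variance Gaussian visibles (assumes standardized
    inputs) and binary hidden units, trained with plain CD-1."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b = np.zeros(n_visible)  # visible biases (Gaussian means)
        self.c = np.zeros(n_hidden)   # hidden biases

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i W_ij), with sigma_i = 1
        return 1.0 / (1.0 + np.exp(-(v @ self.W + self.c)))

    def sample_visible(self, h):
        # p(v | h) = N(b + W h, I)
        mean = self.b + h @ self.W.T
        return mean + self.rng.standard_normal(mean.shape)

    def cd1_step(self, v0, lr):
        ph0 = self.hidden_probs(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        v1 = self.sample_visible(h0)
        ph1 = self.hidden_probs(v1)
        n = v0.shape[0]
        # Positive (data) statistics minus negative (one-step model) statistics.
        self.W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)

    def fit(self, data, epochs=10, batch_size=32, lr=0.01):
        for _ in range(epochs):
            order = self.rng.permutation(len(data))
            for start in range(0, len(data), batch_size):
                self.cd1_step(data[order[start : start + batch_size]], lr)
        return self

    def transform(self, v):
        # The extracted features are the hidden activation probabilities.
        return self.hidden_probs(v)


# Synthetic stand-in for a stereo recording: the right channel is a delayed,
# noisy copy of the left one (purely illustrative data).
rng = np.random.default_rng(1)
left = rng.standard_normal(16000)
right = np.roll(left, 3) + 0.5 * rng.standard_normal(16000)

raw = cross_correlation_features(left, right)
X = (raw - raw.mean(axis=0)) / (raw.std(axis=0) + 1e-8)  # unit-variance assumption
rbm = GaussianBernoulliRBM(X.shape[1], 16).fit(X, epochs=5)
features = rbm.transform(X)
```

Fixing the visible variance to one keeps the update rules short; learning the variances as well, as a full GRBM would, requires standardizing less aggressively and adds extra gradient terms.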

References


Keronen, S., Cho, K., Raiko, T., Ilin, A., and Palomäki, K.
Gaussian-Bernoulli restricted Boltzmann machines and automatic feature extraction for noise robust missing data mask estimation.
In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013). May 2013 (to appear).

Cho, K., Ilin, A., and Raiko, T.
Improved Learning of Gaussian-Bernoulli Restricted Boltzmann Machines.
In Proceedings of the International Conference on Artificial Neural Networks (ICANN 2011). Espoo, Finland. June 2011.

Keronen, S., Kallasjoki, H., Remes, U., Brown, G. J., Gemmeke, J. F., and Palomäki, K.
Mask estimation and imputation methods for missing data speech recognition in a multisource reverberant environment.
Computer Speech and Language, 27(3), 2013.