Software

The speech group has developed a C/C++ system for large-vocabulary continuous speech recognition. The system is language-independent, but it is particularly useful for languages such as Finnish, Estonian, or Turkish, in which words consist of several morphemes. To test the system, contact Mikko Kurimo or try the www demo.

Tools for language modeling

The VariKN language modeling toolkit can be used to create long-span n-gram language models. A direct link to the code is available here.

Morfessor can be used to decompose words into statistical morphemes.

Maximum entropy language models: the SRILM extension can be used to train and apply maximum entropy (MaxEnt) language models within the SRILM toolkit.

Speech data

Isolated Finnish words spoken by 59 speakers, about 260 words each, collected at Helsinki University of Technology in 1999. A direct link to the data is available here.