Unsupervised Morpheme Analysis -- Morpho Challenge 2009

This is the page of the past Morpho Challenge 2009. The current challenge is Morpho Challenge 2010.

Part of the EU Network of Excellence PASCAL Challenge Program and organized in collaboration with CLEF. Participation is open to all.

The objective of the Challenge is to design a statistical machine learning algorithm that discovers which morphemes (smallest individually meaningful units of language) words consist of. Ideally, these are basic vocabulary units suitable for different tasks, such as text understanding, machine translation, information retrieval, and statistical language modeling.
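
To make the task concrete, the following is a minimal sketch of what a morpheme analysis of a word list can look like. It is not the method of the organizers or of any participant: the function name `segment`, the parameters `min_stem` and `min_count`, and the heuristic itself (split a word when its stem also occurs as a word and its suffix recurs on several stems) are assumptions chosen only for illustration.

```python
from collections import Counter


def segment(words, min_stem=3, min_count=2):
    """Toy segmentation: split a word into (stem, suffix) when the stem is
    itself a word in the list and the suffix recurs on several stems.
    This is an illustrative heuristic, not a Morpho Challenge baseline."""
    vocab = set(words)

    # Count how often each candidate suffix follows a stem that is a word.
    suffix_counts = Counter()
    for word in vocab:
        for i in range(min_stem, len(word)):
            if word[:i] in vocab:
                suffix_counts[word[i:]] += 1

    # Analyse each word: take the shortest attested stem whose suffix is
    # frequent enough, otherwise leave the word unsplit.
    analyses = {}
    for word in sorted(vocab):
        analysis = (word,)
        for i in range(min_stem, len(word)):
            stem, suffix = word[:i], word[i:]
            if stem in vocab and suffix_counts[suffix] >= min_count:
                analysis = (stem, suffix)
                break
        analyses[word] = analysis
    return analyses


words = ["walk", "walks", "walked", "walking",
         "talk", "talks", "talked", "talking"]
for word, analysis in segment(words).items():
    print(word, "->", " + ".join(analysis))
```

On this small English list the sketch splits "walking" into "walk" and "ing" and leaves "walk" whole. A heuristic this simple would not cope with richer morphology such as that of Finnish, which is the kind of difficulty the Challenge targets.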

The scientific goals are:

To learn about the phenomena underlying word construction in natural languages
To discover approaches suitable for a wide range of languages
To advance machine learning methodology

Morpho Challenge 2009 is a follow-up to our previous Morpho Challenges 2005, 2007 and 2008. The task of Morpho Challenge 2009 is similar to that of Morpho Challenge 2008, where the aim was to find the morpheme analysis of the word forms in the data. For this challenge, new Machine Translation tasks (from Finnish to English and from German to English) are added to evaluate the performance of the morpheme analysis.

Participation in the previous challenges is by no means a prerequisite for participation in Morpho Challenge 2009. Everyone is welcome and we hope to attract many participating teams. The results will be presented in a workshop. Please read the rules and see the schedule. The datasets are available for download. Submit your analyses (result files) by emailing them to the organizers, or by indicating a location where the organizers can download your files. Remember also to describe your algorithm in a paper. Please read the formatting instructions in the rules.

If you plan to participate in Morpho Challenge, please contact the organizers using the email address given in contact and ask to be added to our mailing list. We will use this mailing list to provide news about the tasks, data and evaluations.

The results from the evaluation runs are now on the Results page.
The workshop was held on September 30, 2009.

Mikko Kurimo, Sami Virpioja and Ville Turunen
Adaptive Informatics Research Centre, Helsinki University of Technology
The organizers
