Laboratory of Computer and Information Science / Neural Networks Research Centre CIS Lab Helsinki University of Technology

Morpho Challenge 2010 - Semi-supervised and Unsupervised Analysis

Part of the EU Network of Excellence PASCAL2 Challenge Program. Participation is open to all.

The Challenge results, the workshop program, and the slides are now available.

The objective of the Challenge is to design a statistical machine learning algorithm that discovers which morphemes (smallest individually meaningful units of language) words consist of. Ideally, these are basic vocabulary units suitable for different tasks, such as text understanding, machine translation, information retrieval, and statistical language modeling.
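
As a rough illustration of the kind of output the task asks for, the sketch below splits a word into morphs by dynamic programming over a tiny morph lexicon. The lexicon, its counts, and the function name `segment` are invented for this example and are not part of the Challenge data or evaluation. A real entry would have to learn its morphs from the data instead of being handed a lexicon.

```python
import math

# Hypothetical morph counts, invented purely for illustration.
MORPH_COUNTS = {"un": 5, "break": 8, "able": 6, "s": 20,
                "walk": 7, "ed": 9, "ing": 10}


def segment(word, counts=MORPH_COUNTS):
    """Return the lowest-cost split of `word` into known morphs, or None.

    The cost of a morph is its negative log relative frequency, so the
    chosen split maximizes the product of unigram morph probabilities.
    """
    total = sum(counts.values())
    # best[i] holds (cost, morphs) for the best segmentation of word[:i],
    # or None if word[:i] cannot be built from known morphs.
    best = [(0.0, [])] + [None] * len(word)
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is None or piece not in counts:
                continue
            cost = best[start][0] - math.log(counts[piece] / total)
            if best[end] is None or cost < best[end][0]:
                best[end] = (cost, best[start][1] + [piece])
    return best[-1][1] if best[-1] is not None else None


if __name__ == "__main__":
    for w in ["unbreakable", "walked", "walks", "xyz"]:
        print(w, "->", segment(w))
```

With this toy lexicon, `segment("unbreakable")` returns `["un", "break", "able"]`, and a word that cannot be covered by known morphs, such as `"xyz"`, returns `None`.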

The scientific goals are:

Morpho Challenge 2010 is a follow-up to our previous Morpho Challenges in 2005, 2007, 2008 and 2009. The task in 2010 is similar to that of 2009, where the aim was to find the morpheme analysis of the word forms in the data. As a new task, we offer the option of semi-supervised learning using the available linguistic gold standard morpheme analyses.

Participation in the previous challenges is by no means a prerequisite for participation in Morpho Challenge 2010. Everyone is welcome and we hope to attract many participating teams. The results will be presented at a workshop. Please read the rules and see the schedule. The datasets are available for download. Submit your analyses (result files) by emailing them to the organizers, or by indicating a location where the organizers can download your files. Remember also to describe your algorithm in a paper. Please read the formatting instructions in the rules.

If you plan to participate in Morpho Challenge, please contact the organizers using the email address given in the contact information and ask to be added to our mailing list. We will use this mailing list to provide news about the tasks, data and evaluations.

We are looking forward to an interesting challenge!

Mikko Kurimo, Krista Lagus, Sami Virpioja and Ville Turunen
Adaptive Informatics Research Centre, Aalto University School of Science and Technology (previously known as Helsinki University of Technology)
The organizers





Page maintained by the webmaster, last updated Monday, 26-Sep-2011 17:23:46 EEST