Allomorfessor: Towards Unsupervised Morpheme Analysis (2008)
AUTHORS:
Kohonen Oskar,
Virpioja Sami,
Klami Mikaela
BOOKTITLE:
In Working Notes of the CLEF 2008 Workshop
URL:
http://www.clef-campaign.org/2008/working_notes/kohonen-paperCLEF2008.pdf
PDF:
pdf/kohonen-paperCLEF2008final.pdf
@inproceedings{ okohonen+virpioja+klami_2008,
  author = "Kohonen, Oskar and Virpioja, Sami and Klami, Mikaela",
  responsibleperson = "Kohonen,Oskar",
  title = "Allomorfessor: Towards Unsupervised Morpheme Analysis",
  url = "http://www.clef-campaign.org/2008/working_notes/kohonen-paperCLEF2008.pdf",
  booktitle = "In Working Notes of the CLEF 2008 Workshop",
  address = "Aarhus, Denmark",
  editors = "Alessandro Nardi and Carol Peters",
  flags = "AIRC COG public",
  year = "2008",
  date = "17-19 September",
  pdf = "kohonen-paperCLEF2008final.pdf",
  impactfactor = "B3",
  abstract = "Many modern natural language processing applications would benefit from automatic morphological analysis of words, especially when dealing with morphologically rich languages. Consequently, there has been an increasing amount of research on the task of unsupervised segmentation of word forms into smaller useful units, i.e. morphs or morphemes. The linguistic phenomenon of allomorphy, where one morpheme has several different surface forms, places limits on the quality of morpheme analysis achievable by segmentation alone. We extend the morphological segmentation method, Morfessor Baseline, to model allomorphy. Our unsupervised method discovers common base forms for allomorphs from an unannotated corpus. We evaluate the method by participating in the Morpho Challenge 2008 competition, where automatic morphological analyses of corpora in English, German, Turkish and Finnish are compared against a linguistic gold standard. Our method achieves high precision, but low recall, and therefore low F-measure scores. We conclude that our method currently undersegments, but that the main approach is promising."
}