Experiments on Distributional Categorization of Lexical Items with Self-Organizing Maps

Gilles Bernard, CSAR Research Group, A.I. Laboratory, Paris 8 University, 2 rue de la Liberté, 93526 Saint-Denis cedex 02
Email: featjym@cicrp.jussieu.fr


Abstract:

With experimental theoretical linguistics as the ultimate goal, we explore the experimental possibilities offered by corpus statistics and neuromimetic techniques in order to reassess the distributionalist question: how does the grammatical distribution of lexical items (position in the 'superficial' constituent structure, morphology, and grammatical modifiers) relate to their semantic and grammatical properties? The simple experimental protocol presented here applies to any corpus and extends easily to other languages, without any precategorization of lexical items. Its stages are: (a) extract information about the grammatical distribution of lexical items from the corpus (here, a 600,000-word corpus of French), using lists of grammatical items and some crude morphology; (b) classify the grammatical contexts in order to produce input vectors; (c) retrieve the relationships between classes of lexical items with Self-Organizing Maps. Two series of experiments are presented: grammatical classifications of lexical items, and semantic classifications of nouns. Their results validate the methodology and argue for a unified theoretical approach to semantic and grammatical classes.
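To make the three stages more concrete, here is a minimal sketch in Python of a heavily simplified version of the pipeline. It is not the author's implementation, and it rests on several assumptions. The list of grammatical items (`GRAMMATICAL`) is a small hypothetical sample. A "context" is taken to be simply the grammatical item immediately to the left of a lexical item. Stage (b), the classification of contexts, is skipped: raw context counts are normalized directly into input vectors. Stage (c) uses a plain Kohonen map with a Gaussian neighborhood and linearly decaying learning rate and radius. The grid size, epoch count, toy corpus, and function names are all illustrative choices.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical sample of French grammatical items; the paper's actual lists are not given here.
GRAMMATICAL = {"le", "la", "les", "un", "une", "de", "il", "elle", "ne", "pas", "se"}

def context_vectors(tokens, grammatical=GRAMMATICAL):
    """Stage (a), simplified: for each lexical (non-grammatical) token, count which
    grammatical item immediately precedes it, then L2-normalize the counts."""
    features = sorted(grammatical)
    index = {g: i for i, g in enumerate(features)}
    counts = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        if word not in grammatical and prev in grammatical:
            counts[word][prev] += 1
    vectors = {}
    for word, ctr in counts.items():
        v = [0.0] * len(features)
        for g, c in ctr.items():
            v[index[g]] = float(c)
        norm = math.sqrt(sum(x * x for x in v))  # > 0: every word here has at least one context
        vectors[word] = [x / norm for x in v]
    return vectors

def best_unit(grid, x):
    """Return (row, col) of the unit whose weight vector is closest to x (squared Euclidean)."""
    best, best_d = None, float("inf")
    for i, row in enumerate(grid):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (i, j), d
    return best

def train_som(vectors, rows=3, cols=3, epochs=200, seed=0):
    """Stage (c), simplified: online Kohonen training on a rows x cols grid."""
    rng = random.Random(seed)
    dim = len(next(iter(vectors.values())))
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
    data = list(vectors.values())
    radius0 = max(rows, cols) / 2.0
    for t in range(epochs):
        frac = t / epochs
        lr = 0.5 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks, floored at 0.5
        for x in rng.sample(data, len(data)):      # visit inputs in random order
            bi, bj = best_unit(grid, x)
            for i in range(rows):
                for j in range(cols):
                    d2 = (i - bi) ** 2 + (j - bj) ** 2
                    h = math.exp(-d2 / (2 * radius * radius))  # Gaussian neighborhood
                    w = grid[i][j]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # pull unit toward the input
    return grid

# Toy corpus (hypothetical), already tokenized.
DEMO_TOKENS = ("le chat dort . la maison brille . il mange . elle chante . "
               "un chien dort . une table tombe . il dort . elle mange").split()

if __name__ == "__main__":
    vecs = context_vectors(DEMO_TOKENS)
    som = train_som(vecs)
    for w in sorted(vecs):
        print(w, best_unit(som, vecs[w]))
```

In this toy setting, nouns are characterized only by determiners and verbs only by subject pronouns, so their input vectors are orthogonal. The map is therefore free to place them in different regions, which is the kind of grammatical grouping the abstract describes at a much larger scale.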


WSOM'97