Lessons Learned in Text Document Classification

Dieter Merkl, Institut fuer Softwaretechnik, Technische Universitaet Wien
Email: dieter@ifs.tuwien.ac.at


Abstract:

Text archives may be regarded as an almost optimal application arena for unsupervised neural networks. This is because many of the operations computers have to perform on text documents are classification tasks based on noisy patterns. As a natural result, an ever-increasing number of research reports concerned with this type of application have appeared in the literature. In this paper we argue in favor of paying more attention to the fact that text archives lend themselves naturally to a hierarchical structure. We take advantage of this fact by using a hierarchically organized network built up from independent self-organizing maps, which makes it possible to establish a genuine document taxonomy.
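
The hierarchical arrangement described above can be illustrated with a minimal sketch. It assumes that documents are already represented as fixed-length numeric vectors (for example, term frequencies) and that the hierarchy has two levels, where each top-level unit owns one independent map trained only on the documents assigned to that unit. The class SOM, the functions build_hierarchy and classify, the grid sizes, and the training schedule are hypothetical choices made for illustration; they are not taken from the paper.

```python
import random


class SOM:
    """A tiny self-organizing map on a rows x cols grid (illustrative only)."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.units = [(r, c) for r in range(rows) for c in range(cols)]
        # One randomly initialized weight vector per grid unit.
        self.weights = {u: [rng.random() for _ in range(dim)] for u in self.units}

    def winner(self, x):
        # Best-matching unit: smallest squared Euclidean distance to x.
        return min(
            self.units,
            key=lambda u: sum((w - v) ** 2 for w, v in zip(self.weights[u], x)),
        )

    def train(self, data, epochs=50, lr0=0.5, radius0=1.0):
        for epoch in range(epochs):
            # Learning rate and neighborhood radius shrink linearly over time.
            frac = 1.0 - epoch / epochs
            lr, radius = lr0 * frac, radius0 * frac
            for x in data:
                br, bc = self.winner(x)
                for (r, c) in self.units:
                    # Units within the grid (Manhattan) radius of the winner adapt.
                    if abs(r - br) + abs(c - bc) <= radius:
                        w = self.weights[(r, c)]
                        for i, v in enumerate(x):
                            w[i] += lr * (v - w[i])


def build_hierarchy(docs, top_shape=(1, 2), child_shape=(1, 2)):
    """Train a top-level map, then one independent map per occupied top unit."""
    dim = len(docs[0])
    top = SOM(top_shape[0], top_shape[1], dim, seed=0)
    top.train(docs)

    # Group the documents by their winning unit on the top-level map.
    groups = {}
    for d in docs:
        groups.setdefault(top.winner(d), []).append(d)

    # Each child map sees only the documents of its parent unit.
    children = {}
    for unit, members in groups.items():
        child = SOM(child_shape[0], child_shape[1], dim, seed=1)
        child.train(members)
        children[unit] = child
    return top, children


def classify(top, children, doc):
    """Return (top-level unit, second-level unit or None) for a document."""
    unit = top.winner(doc)
    child = children.get(unit)
    return unit, (child.winner(doc) if child is not None else None)


# Toy term-frequency vectors standing in for two loose topics (made-up data).
docs = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
top, children = build_hierarchy(docs)
for d in docs:
    print(classify(top, children, d))
```

Each pair printed by the loop can be read as a path through the taxonomy: the first element selects a branch on the top-level map, and the second locates the document within that branch. Because the child maps are trained independently, each one only has to model the documents of its own branch.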


WSOM'97