40474 Split Compounds from GermaNet Available
We are happy to announce the availability of 40474 German nominal compounds from GermaNet release 8.0 that have been split into their constituent parts, i.e., modifier and head. This dataset has been constructed semi-automatically and all compound splits have been manually post-corrected.
The list of split compounds is freely available for download at
For many applications, it is helpful to know the parts of a compound, since its semantic interpretation is usually based on the meanings of those parts. What makes compound splitting a challenging task for German is that compounding, a very productive word formation process in German, is not always simple string concatenation: it often involves an intervening linking element or the elision of word-final characters in the modifier constituent of the compound.
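To illustrate why plain string concatenation is not enough, here is a minimal toy sketch of lexicon-based splitting that also tries to strip a linking element and to restore an elided final "-e". It is only an illustration, not the semi-automatic procedure used to build the GermaNet dataset. The tiny lexicon, the list of linking elements, the elided endings, and the function name `split_compound` are all hypothetical assumptions chosen for the example.

```python
from typing import AbstractSet, Optional, Tuple

# Hypothetical toy lexicon of simplex nouns (lowercased); a real splitter
# would need a large lexicon, e.g. the noun inventory of a wordnet.
LEXICON = frozenset({"arbeit", "zimmer", "haus", "tür", "schule", "buch"})

# A small, non-exhaustive set of German linking elements (Fugenelemente);
# "" covers plain concatenation.
LINKING_ELEMENTS = ["", "s", "es", "n", "en", "er", "e"]

# Word-final characters that may be elided from the modifier,
# e.g. the final -e of "Schule" in "Schulbuch".
ELIDED_ENDINGS = ["", "e"]


def split_compound(
    word: str, lexicon: AbstractSet[str] = LEXICON
) -> Optional[Tuple[str, str, str]]:
    """Return (modifier, linking_element, head) for a two-part compound, or None."""
    w = word.lower()
    # Try every split point; the right-hand part is the candidate head.
    for i in range(1, len(w)):
        prefix, head = w[:i], w[i:]
        if head not in lexicon:
            continue
        # Strip a possible linking element from the end of the modifier part.
        for link in LINKING_ELEMENTS:
            if not prefix.endswith(link):
                continue
            stem = prefix[: len(prefix) - len(link)]
            if not stem:
                continue
            # Undo a possible elision by restoring a dropped final character.
            for ending in ELIDED_ENDINGS:
                modifier = stem + ending
                if modifier in lexicon:
                    return modifier, link, head
    return None


if __name__ == "__main__":
    for compound in ["Haustür", "Arbeitszimmer", "Schulbuch"]:
        print(compound, "->", split_compound(compound))
```

Here "Haustür" splits by simple concatenation, "Arbeitszimmer" needs the linking element "s" removed, and "Schulbuch" needs the elided "e" of "Schule" restored. The sketch returns the first analysis it finds; a realistic splitter would have to rank competing analyses of ambiguous compounds, which is one reason manual post-correction is valuable.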
For more information about GermaNet, please consult the project website: http://www.sfs.uni-tuebingen.de/GermaNet/