morphdb.hu – Hungarian lexical database and morphological grammar

morphdb.hu is an open-source morphological database of Hungarian, consisting of a lexicon and a morphological grammar that are based on well-founded theoretical decisions.

morphdb.hu is written in the formal representation format of hunlex, an offline resource compiler that offers a linguistically motivated morphological description language and allows principled, flexible maintenance and extension of resources.
With the help of hunlex and hunmorph, morphdb.hu thus provides primary language resources for spell-checking, stemming, morphological analysis and numerous other annotation tasks.

 

morphdb.hu is an attempt to reuse and unify various existing morphological resources representing several decades of labour by linguists.

The base lexicon is a result of compiling the words of three existing dictionaries:

Each of these dictionaries contained morphological information about its entries, but coded in different ways and based on rather different concepts of morphological information. The main difficulty in merging these resources was to harmonize and revise their information content, as relevant to our morphological grammar, with minimal information loss.

The database in numbers

The table below shows the number of entries for each POS category. Based on this morphological description, hunmorph is capable of analyzing more than 4,000,000 word forms.

POS tag   POS category    Number of entries
NOUN      noun                        88026
ADJ       adjective                   17514
VERB      verb                        12549
ADV       adverb                       1932
UTT-INT   interjection                  498
CONJ      conjunction                   258
NUM       numeral                       209
DET       determiner                    164
POSTP     postposition                  146
PREV      preverb                       132
ONO       onomatopoeic                   96
PUNCT     punctuation                    28
PREP      preposition                    14
ART       article                         2
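
As a quick sanity check on the table above, the per-category counts can be summed and turned into shares. The following is only a minimal sketch: the counts are copied by hand from the table, and the names POS_COUNTS, total_entries and share are hypothetical helpers introduced for illustration, not part of morphdb.hu, hunlex or hunmorph.

```python
# Entry counts copied by hand from the table above:
# (POS tag, POS category, number of entries).
POS_COUNTS = [
    ("NOUN", "noun", 88026),
    ("ADJ", "adjective", 17514),
    ("VERB", "verb", 12549),
    ("ADV", "adverb", 1932),
    ("UTT-INT", "interjection", 498),
    ("CONJ", "conjunction", 258),
    ("NUM", "numeral", 209),
    ("DET", "determiner", 164),
    ("POSTP", "postposition", 146),
    ("PREV", "preverb", 132),
    ("ONO", "onomatopoeic", 96),
    ("PUNCT", "punctuation", 28),
    ("PREP", "preposition", 14),
    ("ART", "article", 2),
]


def total_entries(rows):
    """Sum the entry counts over all POS categories."""
    return sum(count for _tag, _category, count in rows)


def share(rows, tag):
    """Return the fraction of all entries that carry the given POS tag."""
    total = total_entries(rows)
    for row_tag, _category, count in rows:
        if row_tag == tag:
            return count / total
    raise KeyError(tag)


if __name__ == "__main__":
    total = total_entries(POS_COUNTS)
    print(f"total entries: {total}")
    # One line per category: tag, category name, count, share of the lexicon.
    for tag, category, count in POS_COUNTS:
        print(f"{tag:<9}{category:<15}{count:>7}{count / total:>8.1%}")
```

Under these assumptions the counts add up to 121568 entries, of which nouns, adjectives and verbs together account for roughly 97%.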

 

Download

cvs -d :pserver:anonymous:anonymous@cvs.mokk.bme.hu:/local/cvs co lexicons/morphdb.hu

Documentation

The documentation is included in the resources above. We have also presented morphdb.hu in two articles:

[in English]
Viktor Trón, Péter Halácsy, Péter Rebrus, András Rung, Eszter Simon, and Péter Vajda (2006) morphdb.hu: Hungarian lexical database and morphological grammar

[in Hungarian]
Viktor Trón, Péter Halácsy, Péter Rebrus, András Rung, Eszter Simon, and Péter Vajda (2005) morphdb.hu: magyar morfológiai nyelvtan és szótári adatbázis

The annotation system

The system of KR codes used by morphdb and other projects is documented here

Mailing list

Feel free to ask any questions concerning the database: hunmorph mailing list

 

Contributors

Development of morphdb.hu began during the szószablya project at the Media Research Center of the Budapest University of Technology and Economics.

The developers are: Viktor Trón (MOKK), Péter Halácsy (MOKK), Péter Rebrus (Research Institute for Linguistics, HAS), András Rung (MOKK), Eszter Simon (Department of Cognitive Science, BUTE), Péter Vajda (Research Institute for Linguistics, HAS), Dániel Szeredi and Dániel Varga (MOKK). Many other people helped us as well. Special thanks go to András Kornai and László Németh.

The Magyarispell dictionary was developed by László Németh. The electronic version of László Elekfi’s Dictionary of Hungarian Inflections was made available to us by the Research Institute for Linguistics.

Further thanks go to Hungarian Telecom for their financial and infrastructural support.
