hunmorph – morphological analyzer

Hunmorph is an open-source tool and programming library for spell checking, stemming and morphological analysis of agglutinative languages, German and other languages.

Our research group has been working on a Hungarian morphological analyzer since 2003. We first extended the codebase of MySpell, a reimplementation of the well-known Ispell spell checker, which yielded a generic word analysis library. At that point the development of the library forked: the extended MySpell, called HunSpell, is now part of the multilingual office suite, while hunmorph is the program tuned for morphological analysis.

The hunmorph framework is built from three components:

  • the ocamorph runtime analyzer is a language-independent affix-stripping implementation
  • is a lexical database and morphological grammar, which can be used by ocamorph (details can be found at
  • hunlex is an off-line resource management component, which complements the efficiency of our runtime layer with a high-level description language and a configurable precompiler.
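To give a rough feel for what affix stripping means, here is a toy sketch in POSIX shell. It is not ocamorph’s algorithm and it does not read aff/dic files: the two-word lexicon, the suffix table, the tag names and the function name toy_analyze are all made up for illustration.

```shell
# Toy affix stripping (illustration only, not how ocamorph works internally).
# The lexicon, suffixes and tags below are hypothetical.
TOY_LEXICON="ablak asztal"
TOY_SUFFIXES="ot:ACC ok:PLUR"

toy_analyze() {
    ta_word=$1
    for ta_entry in $TOY_SUFFIXES; do
        ta_suffix=${ta_entry%%:*}   # text before the colon
        ta_tag=${ta_entry#*:}       # text after the colon
        case $ta_word in
            *"$ta_suffix")
                # Strip the suffix and look the remainder up in the lexicon.
                ta_stem=${ta_word%"$ta_suffix"}
                for ta_lemma in $TOY_LEXICON; do
                    if [ "$ta_stem" = "$ta_lemma" ]; then
                        echo "$ta_word $ta_lemma+$ta_tag"
                        return 0
                    fi
                done
                ;;
        esac
    done
    echo "$ta_word UNKNOWN"
    return 1
}

toy_analyze ablakot     # prints: ablakot ablak+ACC
toy_analyze asztalok    # prints: asztalok asztal+PLUR
```

A real analyzer has to handle stacked affixes, stem alternations and ambiguity, which is why the grammar lives in separate resource files rather than in code.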

The ocamorph analyzer uses language resources that are not human-readable, the so-called aff/dic files (the same format used by MySpell). The aff/dic files are produced by the hunlex lexicon compiler from the lexicon and grammar resources. They are platform independent, so they are published with this distribution: unless you want to modify the lexicon or the grammar, you don’t need hunlex to create them.



1. Download ocamorph source files from the public CVS

cvs -d co ocamorph

2. Download the precompiled Hungarian language resources from

If you want to modify the resources, you also need

3. the source

cvs -d co lexicons/

4. and the lexicon compiler

cvs -d co hunlex

Compile ocamorph


To compile ocamorph on Linux, OS X or Cygwin you need the OCaml compiler, version 3.08.02 or newer.
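One way to check the version requirement before building is sketched below. It assumes that ocamlc -version prints a bare version string; the helper name version_ge is ours, not part of any tool. Fields are compared numerically, so 3.08.02 and 3.8.2 count as equal.

```shell
# version_ge A B: succeed if dotted version A >= B (hypothetical helper).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, "."); m = split(b, y, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            # "+ 0" forces a numeric reading, so "08" counts as 8.
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

if command -v ocamlc >/dev/null 2>&1; then
    if version_ge "$(ocamlc -version)" 3.08.02; then
        echo "OCaml compiler is new enough"
    else
        echo "OCaml compiler is older than 3.08.02"
    fi
else
    echo "ocamlc not found on PATH"
fi
```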

cd ocamorph
make

This compiles the ocamorph executable. You can install it with

sudo make install

If you want to install somewhere else, use:

mkdir YOUR_DIR



Test your ocamorph

ocamorph --help
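If the command is not found, the install directory is probably not on your PATH. A small sketch for checking this follows; the helper name check_on_path is hypothetical.

```shell
# Report whether a command can be found on PATH (helper name is hypothetical).
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

check_on_path ocamorph ||
    echo "hint: add the directory that contains the ocamorph executable to PATH"
```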

Build binary language resource

To run ocamorph you’ll need the language-specific aff/dic resource files. If you don’t want to modify the resources, you can use the precompiled aff/dic files found in lexicons/ To run ocamorph, type

echo "ablakot" | ocamorph --aff lexicons/ --dic lexicons/

and you get

> ablakot ablak/NOUN>

As you can see, ocamorph reads from stdin and writes to stdout.
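Because ocamorph is a plain stdin/stdout filter, a whole word list can be analyzed in one run. The sketch below assumes a file with one word per line. The function name analyze_words and the OCAMORPH override variable are our own inventions; only the --aff and --dic options come from the command above.

```shell
# analyze_words AFF DIC WORDLIST: feed a one-word-per-line file to ocamorph.
# OCAMORPH is a hypothetical override for the command name or path.
analyze_words() {
    if [ ! -r "$3" ]; then
        echo "cannot read word list: $3" >&2
        return 1
    fi
    "${OCAMORPH:-ocamorph}" --aff "$1" --dic "$2" < "$3"
}

# Usage (file names are placeholders):
#   analyze_words path/to/file.aff path/to/file.dic words.txt > analyses.txt
```

Running one process over the whole list pays the start-up cost only once, instead of once per word.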

The warm-up time of ocamorph can be very long: it builds a trie from the lexicon and minimizes it. Ocamorph can save the minimized trie to a binary (platform-dependent) file. To build it, type:

echo "ablakot" | ocamorph --aff lexicons/ --dic lexicons/ --bin morphdb_hu.bin

After this you can use the binary resource:

echo "ablakot" | ocamorph --bin morphdb_hu.bin

Running ocamorph with the bin file is much faster, but the bin file has to be recreated on every platform and whenever you recompile ocamorph. If you’d like to modify the lexicon or the grammar, please refer to lexicons/
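A small wrapper can rebuild the bin file only when needed. The sketch below rebuilds it when it is missing or older than the aff/dic files. The helper name ensure_bin and the OCAMORPH override variable are hypothetical. Note that a timestamp check cannot detect a recompiled ocamorph or a change of platform, so delete the bin file by hand in those cases.

```shell
# ensure_bin AFF DIC BIN: (re)build BIN if it is missing or older than AFF/DIC.
# Hypothetical helper; OCAMORPH optionally overrides the command name.
ensure_bin() {
    if [ ! -f "$3" ] || [ "$1" -nt "$3" ] || [ "$2" -nt "$3" ]; then
        echo "building $3" >&2
        # Same build command as above; the analysis output is discarded.
        echo "ablakot" |
            "${OCAMORPH:-ocamorph}" --aff "$1" --dic "$2" --bin "$3" >/dev/null
    fi
}

# Usage (paths are placeholders):
#   ensure_bin path/to/file.aff path/to/file.dic morphdb_hu.bin
#   echo "ablakot" | ocamorph --bin morphdb_hu.bin
```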


Please cite

  1. the paper about hunmorph: Hunmorph: Open source word analysis
  2. and about Hungarian lexical database and morphological grammar
