Média Oktató és Kutató Központ

Tools

huntag

Sorry, this entry is only available in Hungarian.

Hungarian Webcorpus

With over 1.48 billion words unfiltered (589 million words fully filtered), this is by far the largest Hungarian-language corpus, and unlike the Hungarian National Corpus (125 million words), it is available in its entirety under a permissive Open Content license. The Hungarian Webcorpus was created in the winter of 2003 as part of the WordSword project [...]

hunpos – HMM part-of-speech tagger

Hunpos is an open-source reimplementation of TnT, the well-known part-of-speech tagger by Thorsten Brants. The project has moved to Google Code: http://code.google.com/p/hunpos/

Features: free and open source, even for commercial use. For languages with more complex morphologies, HMM tagging could be quite competitive with the current generation of learning algorithms applying, e.g., SVM and [...]
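To illustrate what HMM tagging involves, here is a minimal Viterbi decoder. It is a simplified sketch, not hunpos code: TnT uses a second-order (trigram) model with smoothing and suffix analysis for unknown words, whereas this sketch assumes a first-order (bigram) HMM with hand-set toy probabilities. The names `viterbi`, `TAGS`, `START`, `TRANS` and `EMIT` are hypothetical.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most probable tag sequence for a non-empty list of words under a bigram HMM.

    start_p[t]    : P(t at sentence start)
    trans_p[s][t] : P(t | previous tag s)
    emit_p[t][w]  : P(w | t); missing entries fall back to `floor`
    """
    def logp(p):
        return math.log(p if p > 0 else floor)

    # best[i][t] = (log-prob of the best path ending in tag t at word i, previous tag)
    best = [{t: (logp(start_p.get(t, 0)) + logp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Pick the predecessor tag that maximizes path score * transition prob.
            prev, score = max(
                ((s, best[i - 1][s][0] + logp(trans_p[s].get(t, 0))) for s in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + logp(emit_p[t].get(words[i], 0)), prev)
        best.append(column)
    # Trace back from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy model: "a" = the, "kutya" = dog, "ugat" = barks.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.2, "NOUN": 0.2, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"a": 0.9},
    "NOUN": {"kutya": 0.5, "macska": 0.5},
    "VERB": {"ugat": 0.6, "alszik": 0.4},
}

print(viterbi(["a", "kutya", "ugat"], TAGS, START, TRANS, EMIT))
# ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numeric underflow on long sentences; a real tagger would estimate these tables from an annotated corpus rather than set them by hand.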

hunpars – syntactic parser for Hungarian

Hunpars is a syntactic parser for Hungarian. It takes a text file of sentences as input and outputs the syntactic tree of each sentence both in a simple bracketed notation and as Graphviz dot files.

Software requirements: Python 2.4 or newer, the Hunmorph morphological analyzer, and Graphviz (http://www.graphviz.org).

Usage with the hunmorph morphological analyzer (recommended): in this case, the words of the sentences are morphologically analyzed with hunmorph, [...]
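The exact bracketing format hunpars emits is not given here, so the sketch below assumes a Penn Treebank-style notation such as `(S (NP a kutya) (VP ugat))` and shows how such a tree maps onto a Graphviz dot file. The functions `parse_bracketed` and `to_dot` are hypothetical helpers, not part of hunpars.

```python
def parse_bracketed(text):
    """Parse a bracketed tree like "(S (NP a kutya) (VP ugat))" into nested
    (label, children) tuples; leaves are plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":          # a leaf word
            leaf = tokens[pos]
            pos += 1
            return leaf
        pos += 1                        # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            children.append(node())
        pos += 1                        # skip ")"
        return (label, children)

    tree = node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree


def to_dot(tree):
    """Render a parsed tree as a Graphviz dot digraph (one node per constituent or word)."""
    lines = ["digraph tree {"]
    counter = 0

    def visit(t):
        nonlocal counter
        my_id = f"n{counter}"
        counter += 1
        label = (t if isinstance(t, str) else t[0]).replace('"', '\\"')
        lines.append(f'  {my_id} [label="{label}"];')
        if not isinstance(t, str):
            for child in t[1]:
                lines.append(f"  {my_id} -> {visit(child)};")
        return my_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


print(to_dot(parse_bracketed("(S (NP a kutya) (VP ugat))")))
```

Saved to a file such as `tree.dot`, the output can be rendered with the Graphviz command-line tool, e.g. `dot -Tpng tree.dot -o tree.png`.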

hunmorph – morphological analyzer

Hunmorph is an open-source tool and programming library for spell-checking, stemming and morphological analysis of agglutinative languages, German, and other languages. Our research group has been working on a Hungarian morphological analyzer since 2003. We first extended the codebase of MySpell, a reimplementation of the well-known Ispell spellchecker, yielding a generic word analysis [...]
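Ispell-family spellcheckers such as MySpell pair dictionary stems with flags, and each flag licenses suffix rules that strip some characters from the stem and add a suffix. The sketch below mimics that idea in reverse (undoing a suffix to find a flagged stem) to show how one mechanism serves spell-checking, stemming and analysis at once. It is an illustration only, not hunmorph's file format or API; `SuffixRule`, `LEXICON`, `RULES`, `analyze` and `is_correct` are hypothetical, and the two-word lexicon is toy data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SuffixRule:
    flag: str   # rule applies only to stems carrying this flag
    strip: str  # characters removed from the stem before the suffix is added
    add: str    # suffix as it appears on the surface form
    tag: str    # label reported when the rule matches

# Hypothetical toy data, not hunmorph's real Hungarian resources.
LEXICON = {"ház": {"N"}, "kutya": {"A"}}
RULES = [
    SuffixRule("N", "", "ak", "PLUR"),    # ház -> házak
    SuffixRule("N", "", "ban", "INE"),    # ház -> házban
    SuffixRule("A", "a", "ák", "PLUR"),   # kutya -> kutyák
    SuffixRule("A", "a", "ában", "INE"),  # kutya -> kutyában
]

def analyze(word):
    """Return every (stem, tag) analysis of `word` licensed by LEXICON and RULES."""
    analyses = []
    if word in LEXICON:
        analyses.append((word, "BASE"))
    for rule in RULES:
        if word.endswith(rule.add) and len(word) > len(rule.add):
            # Undo the rule: remove the surface suffix, restore stripped characters.
            stem = word[: -len(rule.add)] + rule.strip
            if rule.flag in LEXICON.get(stem, set()):
                analyses.append((stem, rule.tag))
    return analyses

def is_correct(word):
    """Spell-check: a word is accepted if it has at least one analysis."""
    return bool(analyze(word))

print(analyze("kutyában"))  # [('kutya', 'INE')]
```

Stemming falls out of the same function: the stems are the first elements of the returned pairs.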

Hunglish Corpus

The Hunglish Corpus is a free, sentence-aligned Hungarian-English parallel corpus of about 54.2 million words in 2.07 million sentences. The corpus can be downloaded from our FTP server. If you have any questions, don't hesitate to ask on the hunglish-corpus mailing list (the main language of the list is Hungarian, [...]

morphdb.hu – Hungarian lexical database and morphological grammar

morphdb.hu is an open-source morphological database of Hungarian, consisting of a lexicon and a morphological grammar based on well-founded theoretical decisions. morphdb.hu is written in the formal representation format of hunlex, an offline resource compiler that offers a linguistically motivated morphological description language and allows principled, flexible maintenance and extension of resources. [...]

hunalign – sentence aligner

hunalign aligns bilingual text at the sentence level. Its input is tokenized, sentence-segmented text in two languages. In the simplest case, its output is a sequence of bilingual sentence pairs (bisentences). When a dictionary is available, hunalign combines the information it provides with Gale-Church sentence-length information. In the absence of a dictionary, [...]
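The sentence-length component mentioned above can be sketched as the classic Gale-Church dynamic program: each "bead" groups 0-2 source sentences with 0-2 target sentences, and its cost combines a prior for the bead type with how well the character lengths match. This is a length-only sketch, not hunalign's implementation (which also uses dictionary evidence); the priors and the variance 6.8 follow the values reported by Gale and Church (1993), while `bead_cost` and `align` are hypothetical names.

```python
import math

# Bead priors and length-ratio parameters as reported by Gale and Church (1993).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
MEAN = 1.0      # expected target characters per source character
VARIANCE = 6.8  # variance of that ratio

def bead_cost(len1, len2, bead):
    """-log P(bead) - log P(delta | bead) for len1 source vs len2 target characters.
    Averaging both lengths (a common variant) avoids dividing by zero for 1-0 beads."""
    avg = (len1 + len2 / MEAN) / 2
    delta = 0.0 if avg == 0 else (len1 * MEAN - len2) / math.sqrt(avg * VARIANCE)
    # Two-tailed normal probability 2 * (1 - Phi(|delta|)) equals erfc(|delta| / sqrt(2)).
    p_delta = max(math.erfc(abs(delta) / math.sqrt(2)), 1e-300)
    return -math.log(PRIORS[bead]) - math.log(p_delta)

def align(src_lens, tgt_lens):
    """Align two lists of sentence lengths (in characters); return a list of
    (source_indices, target_indices) beads with minimal total cost."""
    n, m = len(src_lens), len(tgt_lens)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for a, b in PRIORS:
                if i < a or j < b or cost[i - a][j - b] == INF:
                    continue
                c = cost[i - a][j - b] + bead_cost(
                    sum(src_lens[i - a:i]), sum(tgt_lens[j - b:j]), (a, b))
                if c < cost[i][j]:
                    cost[i][j] = c
                    back[i][j] = (a, b)
    # Follow back-pointers from the end to recover the bead sequence.
    beads, i, j = [], n, m
    while i > 0 or j > 0:
        a, b = back[i][j]
        beads.append((list(range(i - a, i)), list(range(j - b, j))))
        i, j = i - a, j - b
    return beads[::-1]

print(align([10, 12, 40], [23, 41]))  # [([0, 1], [0]), ([2], [1])]
```

Here the two short source sentences (10 and 12 characters) are merged into a 2-1 bead against the 23-character target sentence, because that is cheaper than forcing a poorly matching 1-1 pairing.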

huntoken

huntoken – rule-based tokenizer and sentence boundary detector for Hungarian (and English) texts.
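As a rough illustration of rule-based sentence boundary detection, the sketch below treats ".", "!" or "?" followed by whitespace and an uppercase letter as a candidate boundary, then vetoes periods after known abbreviations and after bare numbers (Hungarian writes ordinals with a period, e.g. "2."). It is not huntoken's rule set; `ABBREVIATIONS`, `CANDIDATE` and `split_sentences` are hypothetical, and the abbreviation list is a tiny toy sample.

```python
import re

# Hypothetical toy list of Hungarian abbreviations (lowercase, without the period).
ABBREVIATIONS = {"dr", "pl", "kb", "ill", "ifj", "id"}

# Candidate boundary: ".", "!" or "?" followed by whitespace and an uppercase letter.
CANDIDATE = re.compile(r"([.!?])\s+(?=[A-ZÁÉÍÓÖŐÚÜŰ])")

def split_sentences(text):
    """Split `text` into sentences, skipping periods after known abbreviations
    and after bare numbers (treated as Hungarian ordinals such as "2.")."""
    sentences, start = [], 0
    for match in CANDIDATE.finditer(text):
        end = match.end(1)  # position just after the punctuation mark
        words = text[start:end].split()
        last = words[-1].rstrip(".!?").lower() if words else ""
        if match.group(1) == "." and (last in ABBREVIATIONS or last.isdigit()):
            continue  # e.g. "Dr. Kovács": not a sentence boundary
        sentences.append(text[start:end].strip())
        start = match.end()
    rest = text[start:].strip()
    if rest:
        sentences.append(rest)
    return sentences

print(split_sentences("Dr. Kovács Péter megérkezett. Holnap indul."))
# ['Dr. Kovács Péter megérkezett.', 'Holnap indul.']
```

Listing exceptions separately from the candidate pattern keeps the rules easy to extend: new abbreviations only need to be added to the set.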
