Média Oktató és Kutató Központ | Category Archive

Tools

huntag

Huntag is a sequential tagger for NLP using Maximum Entropy Learning and Hidden Markov Models. It can perform any kind of supervised sequential sentence tagging task. It has been used for NP chunking, Named Entity Recognition, and clause chunking. The flexibility of Huntag comes from the fact that it will generate any kind of [...]

Hungarian Webcorpus

With over 1.48 billion words unfiltered (589m words fully filtered), this is by far the largest Hungarian language corpus, and unlike the Hungarian National Corpus (125m words), it is available in its entirety under a permissive Open Content license. The Hungarian webcorpus was crawled in the winter of 2003 as part of the WordSword project [...]

hunpos – HMM part-of-speech tagger

Hunpos is an open source reimplementation of TnT, the well-known part-of-speech tagger by Thorsten Brants. The project has moved to Google Code: http://code.google.com/p/hunpos/ Features: free and open source, even for commercial use. For languages with more complex morphologies, HMM tagging could be quite competitive with the current generation of learning algorithms applying e.g. SVM and [...]
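
To illustrate what HMM tagging involves, here is a minimal sketch of Viterbi decoding in Python. It is not Hunpos code: it assumes a simplified first-order (bigram) model, and the tag set, the probability tables and the function name `viterbi` are all made up for the example.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a bigram HMM."""
    floor = 1e-12  # crude floor so unseen events do not produce log(0)

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t] = (log-probability of the best path ending in tag t, previous tag)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            # pick the predecessor tag that maximises the path score into t
            score, prev = max((best[-1][p][0] + lp(trans_p[p].get(t, 0)), p)
                              for p in tags)
            column[t] = (score + lp(emit_p[t].get(word, 0)), prev)
        best.append(column)

    # follow the back-pointers from the best final tag
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))

# Toy model (hypothetical numbers, for illustration only)
TAGS = ["DET", "NOUN", "VERB"]
START_P = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS_P = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.1, "VERB": 0.8},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT_P = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

print(viterbi("the dog barks".split(), TAGS, START_P, TRANS_P, EMIT_P))
# → ['DET', 'NOUN', 'VERB']
```

A real tagger would estimate these tables from an annotated corpus and would need a more careful treatment of unknown words than the fixed floor used here.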

hunpars – syntactic parser for Hungarian

Hunpars is a syntactic parser for Hungarian. It takes a text file of sentences as input and outputs the syntactic tree of each sentence in a simple bracketed notation and in Graphviz dot files. Software requirements: Python 2.4 or later, the Hunmorph morphological analyzer, and Graphviz (http://www.graphviz.org). Usage with the hunmorph morphological analyzer (recommended): in this case the words of the sentences are morphologically analyzed with hunmorph, [...]

hunmorph – morphological analyzer

Hunmorph is an open source tool and programming library for spell-checking, stemming and morphological analysis of agglutinative, German and other languages. Mailing list Our research group has been working on a Hungarian morphological analyzer since 2003. First we extended the codebase of MySpell, a reimplementation of the well-known Ispell spellchecker, yielding a generic word analysis [...]

Hunglish Corpus

The Hunglish Corpus is a free sentence-aligned Hungarian-English parallel corpus of about 120 million words in 4 million sentence pairs. This is the Version 2.0 release of the Corpus, approximately doubling the size of the original 1.0 release from 2005. The Corpus can be downloaded from our [...]

morphdb.hu – Hungarian lexical database and morphological grammar

morphdb.hu is an open source morphological database of Hungarian, consisting of a lexicon and morphological grammar that are based on well-founded theoretical decisions. morphdb.hu is described in the formal representation form of hunlex, an offline resource compiler which offers a linguistically motivated morphological description language and allows for principled, flexible maintenance and extension of resources. [...]

hunalign – sentence aligner

hunalign aligns bilingual text on the sentence level. Its input is tokenized and sentence-segmented text in two languages. In the simplest case, its output is a sequence of bilingual sentence pairs (bisentences). In the presence of a dictionary, hunalign uses it, combining this information with Gale-Church sentence-length information. In the [...]
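
The idea of aligning by sentence length can be sketched in a few lines of Python. This is only an illustration, not hunalign's algorithm: it swaps the Gale-Church statistical length model for a plain normalised length difference, uses no dictionary, and allows only 1-1 matches and skips. The function name `align_by_length` and the `skip_penalty` parameter are assumptions made for the example.

```python
def align_by_length(src, tgt, skip_penalty=1.0):
    """Pair up sentences from two lists by dynamic programming over lengths."""

    def cost(a, b):
        # normalised difference of character lengths, in [0, 1]
        return abs(len(a) - len(b)) / max(len(a) + len(b), 1)

    n, m = len(src), len(tgt)
    inf = float("inf")
    dist = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    dist[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            candidates = []
            if i and j:  # 1-1 match
                candidates.append((dist[i - 1][j - 1] + cost(src[i - 1], tgt[j - 1]),
                                   (i - 1, j - 1)))
            if i:        # source sentence left unaligned
                candidates.append((dist[i - 1][j] + skip_penalty, (i - 1, j)))
            if j:        # target sentence left unaligned
                candidates.append((dist[i][j - 1] + skip_penalty, (i, j - 1)))
            if candidates:
                dist[i][j], back[i][j] = min(candidates)

    # walk the back-pointers and keep only the 1-1 matches
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if (pi, pj) == (i - 1, j - 1):
            pairs.append((src[i - 1], tgt[j - 1]))
        i, j = pi, pj
    pairs.reverse()
    return pairs

hu = ["Jó reggelt.", "Hogy vagy ma?", "Köszönöm."]
en = ["Good morning.", "How are you today?", "Thank you."]
for pair in align_by_length(hu, en):
    print(pair)
```

With these toy inputs every sentence is matched to its counterpart, because any path that skips a sentence pays the skip penalty twice.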

huntoken

huntoken is a rule-based tokenizer and sentence boundary detector for Hungarian (and English) texts.

Language