Huntag – a sequential tagger for NLP using Maximum Entropy Learning and Hidden Markov Models
Introduction Huntag can perform any kind of supervised sequential sentence tagging task. It has been used for NP chunking, Named Entity Recognition, and clause chunking. The flexibility of Huntag comes from the fact that it will generate any kind of [...]
Hungarian Webcorpus
With over 1.48 billion words unfiltered (589m words fully filtered), this is by far the largest Hungarian language corpus, and unlike the Hungarian National Corpus (125m words), it is available in its entirety under a permissive Open Content license. The Hungarian webcorpus was crawled in the winter of 2003 as part of the WordSword project [...]
hunpos – HMM part-of-speech tagger
Hunpos is an open source reimplementation of TnT, the well-known part-of-speech tagger by Thorsten Brants. The project has moved to Google Code: http://code.google.com/p/hunpos/ Features Free and open source, even for commercial use. For languages with more complex morphologies, HMM tagging could be quite competitive with the current generation of learning algorithms applying e.g. SVM and [...]
hunpars – syntactic parser for Hungarian
Hunpars is a syntactic parser for Hungarian. It takes a text file of sentences as input and outputs the syntactic tree of each sentence, both in a simple bracketed notation and as files in the Graphviz dot language. Software requirements: Python 2.4 or later, the Hunmorph morphological analyzer, Graphviz (http://www.graphviz.org). Usage with the hunmorph morphological analyzer (recommended): In this case the words of the sentences are morphologically analyzed with hunmorph, [...]
hunmorph – morphological analyzer
Hunmorph is an open source tool and programming library for spell-checking, stemming and morphological analysis of agglutinative, German and other languages. Mailing list Our research group has been working on a Hungarian morphological analyzer since 2003. First we extended the codebase of MySpell, a reimplementation of the well-known Ispell spellchecker, yielding a generic word analysis [...]
Hunglish Corpus
Hunglish Corpus Version 2.0 The Hunglish Corpus is a free sentence-aligned Hungarian-English parallel corpus of about 120 million words in 4 million sentence pairs. This is the Version 2.0 release of the Corpus, approximately doubling the size of the original 1.0 release from 2005. Download Search The Corpus can be downloaded from our [...]
morphdb.hu – Hungarian lexical database and morphological grammar
morphdb.hu is an open source morphological database of Hungarian, consisting of a lexicon and morphological grammar that are based on well-founded theoretical decisions. morphdb.hu is described in the formal representation form of hunlex, an offline resource compiler which offers a linguistically motivated morphological description language and allows for principled, flexible maintenance and extension of resources. [...]
hunalign – sentence aligner
The hunalign sentence aligner Introduction hunalign aligns bilingual text on the sentence level. Its input is tokenized and sentence-segmented text in two languages. In the simplest case, its output is a sequence of bilingual sentence pairs (bisentences). In the presence of a dictionary, hunalign uses it, combining this information with Gale-Church sentence-length information. In the [...]
huntoken
huntoken – rule-based tokenizer and sentence boundary detector for Hungarian (and English) texts.