dc.contributor.author Sarveswaran, K
dc.contributor.author Dias, G
dc.contributor.author Butt, M
dc.date.accessioned 2023-04-28T08:01:30Z
dc.date.available 2023-04-28T08:01:30Z
dc.date.issued 2021
dc.identifier.citation Sarveswaran, K., Dias, G., & Butt, M. (2021). ThamizhiMorph: A morphological parser for the Tamil language. Machine Translation, 35(1), 37–70. https://doi.org/10.1007/s10590-021-09261-5 en_US
dc.identifier.issn 1573-0573 en_US
dc.identifier.uri http://dl.lib.uom.lk/handle/123/20996
dc.description.abstract This paper presents an open source and extendable Morphological Analyser cum Generator (MAG) for Tamil named ThamizhiMorph. Tamil is a low-resource language in terms of NLP tools and applications. In addition, most of the available tools are neither open nor extendable. A morphological analyser is a key resource for the storage and retrieval of morphophonological and morphosyntactic information, especially for morphologically rich languages, and is also useful for developing applications within Machine Translation. This paper describes how ThamizhiMorph is designed using a Finite-State Transducer (FST) and implemented using Foma. We discuss our design decisions based on the peculiarities of Tamil and its nominal and verbal paradigms. We specify a high-level meta-language to efficiently characterise the language’s inflectional morphology. We evaluate ThamizhiMorph using text from a Tamil textbook and the Tamil Universal Dependency treebank version 2.5. The evaluation and error analysis attest to a very high level of performance, with the identified errors being mostly due to out-of-vocabulary items, which are easily fixable. In order to foster further development, we have made our scripts, the FST models, lexicons, Meta-Morphological rules, lists of generated verbs and nouns, and test data sets freely available for others to use and extend. en_US
dc.language.iso en_US en_US
dc.subject Morphological analyser en_US
dc.subject Morphological generator en_US
dc.subject Finite-State transducer en_US
dc.subject Tamil language en_US
dc.subject Low-resource language en_US
dc.subject Morphologically rich language en_US
dc.title ThamizhiMorph: A morphological parser for the Tamil language en_US
dc.type Article-Full-text en_US
dc.identifier.year 2021 en_US
dc.identifier.journal Machine Translation en_US
dc.identifier.volume 35 en_US
dc.identifier.database Springer Link en_US
dc.identifier.pgnos 37–70 en_US
dc.identifier.doi 10.1007/s10590-021-09261-5 en_US

