ThamizhiMorph: A morphological parser for the Tamil language

dc.contributor.author: Sarveswaran, K
dc.contributor.author: Dias, G
dc.contributor.author: Butt, M
dc.date.accessioned: 2023-04-28T08:01:30Z
dc.date.available: 2023-04-28T08:01:30Z
dc.date.issued: 2021
dc.description.abstract: This paper presents an open source and extendable Morphological Analyser cum Generator (MAG) for Tamil named ThamizhiMorph. Tamil is a low-resource language in terms of NLP tools and applications. In addition, most of the available tools are neither open nor extendable. A morphological analyser is a key resource for the storage and retrieval of morphophonological and morphosyntactic information, especially for morphologically rich languages, and is also useful for developing applications within Machine Translation. This paper describes how ThamizhiMorph is designed using a Finite-State Transducer (FST) and implemented using Foma. We discuss our design decisions based on the peculiarities of Tamil and its nominal and verbal paradigms. We specify a high-level meta-language to efficiently characterise the language’s inflectional morphology. We evaluate ThamizhiMorph using text from a Tamil textbook and the Tamil Universal Dependency treebank version 2.5. The evaluation and error analysis attest to a very high performance level, with the identified errors being mostly due to out-of-vocabulary items, which are easily fixable. In order to foster further development, we have made our scripts, the FST models, lexicons, Meta-Morphological rules, lists of generated verbs and nouns, and test data sets freely available for others to use and extend. [en_US]
dc.identifier.citation: Sarveswaran, K., Dias, G., & Butt, M. (2021). ThamizhiMorph: A morphological parser for the Tamil language. Machine Translation, 35(1), 37–70. https://doi.org/10.1007/s10590-021-09261-5 [en_US]
dc.identifier.database: Springer Link [en_US]
dc.identifier.doi: 10.1007/s10590-021-09261-5 [en_US]
dc.identifier.issn: 1573-0573 [en_US]
dc.identifier.journal: Machine Translation [en_US]
dc.identifier.pgnos: 37–70 [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/20996
dc.identifier.volume: 35 [en_US]
dc.identifier.year: 2021 [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Morphological analyser [en_US]
dc.subject: Morphological generator [en_US]
dc.subject: Finite-State transducer [en_US]
dc.subject: Tamil language [en_US]
dc.subject: Low-resource language [en_US]
dc.subject: Morphologically rich language [en_US]
dc.title: ThamizhiMorph: A morphological parser for the Tamil language [en_US]
dc.type: Article-Full-text [en_US]
