Abstract:
A morphological analyzer decomposes a word into its lemma and a set of morphosyntactic tags. It is a crucial tool for natural language processing tasks, especially for morphologically rich languages such as Sinhala. We present SinMorphy, the first comprehensive morphological analyzer and synthesizer for the Sinhala language. SinMorphy is a rule-based system with a comprehensive vocabulary of Sinhala words, so it accurately handles the great majority of contemporary Sinhala text. It also synthesizes the surface form of a word given a lemma and a set of tags. The system is based on a finite-state transducer and is written in the Foma and Lexc languages. It handles all types of words, including nouns and verbs (both simple and compound), adjectives, adverbs, and particles. It also includes a guesser to analyze out-of-vocabulary words. It correctly analyzes 81.3% of the 20,000 most common Sinhala words and 85.2% of a random test set of 1,000 words.
Citation:
K. Kumarasinghe, G. Dias and I. Herath, "SinMorphy: A Morphological Analyzer for the Sinhala Language," 2021 Moratuwa Engineering Research Conference (MERCon), 2021, pp. 681-686, doi: 10.1109/MERCon52712.2021.9525636.