Institutional-Repository, University of Moratuwa.  

Comprehensive Part-Of-Speech Tag Set and SVM Based POS Tagger for Sinhala

dc.contributor.author Fernando, S
dc.contributor.author Ranathunga, S
dc.contributor.author Jayasena, S
dc.contributor.author Dias, G
dc.date.accessioned 2017-01-16T04:02:13Z
dc.date.available 2017-01-16T04:02:13Z
dc.identifier.uri http://dl.lib.mrt.ac.lk/handle/123/12227
dc.description.abstract This paper presents a new comprehensive multi-level Part-Of-Speech tag set and a Support Vector Machine based Part-Of-Speech tagger for the Sinhala language. The currently available tag set for Sinhala has two limitations: it has no tags for some word classes, and it lacks tags to capture inflection-based grammatical variations of words. The new tag set presented in this paper overcomes both of these limitations. The accuracy of available Sinhala Part-Of-Speech taggers, which are based on Hidden Markov Models, still falls far behind the state of the art. Our Support Vector Machine based tagger achieved an overall accuracy of 84.68%, with 59.86% accuracy for unknown words and 87.12% for known words, when 10% of the words in the test set are unknown. en_US
dc.relation.uri http://www.aclweb.org/anthology/W/W16/W16-37.pdf#page=185 en_US
dc.source.uri http://www.aclweb.org/anthology/W/W16/W16-37.pdf#page=185 en_US
dc.title Comprehensive Part-Of-Speech Tag Set and SVM Based POS Tagger for Sinhala en_US
dc.identifier.year 2016 en_US
dc.identifier.journal WSSANLP 2016 en_US
dc.identifier.pgnos 173 en_US
dc.identifier.email gihan@uom.lk en_US
dc.identifier.email sanath@cse.mrt.ac.lk en_US

