Comprehensive Part-Of-Speech Tag Set and SVM Based POS Tagger for Sinhala

Fernando, S; Ranathunga, S; Jayasena, S; Dias, G

Abstract

This paper presents a new comprehensive multi-level Part-Of-Speech tag set and a Support Vector Machine based Part-Of-Speech tagger for the Sinhala language. The currently available tag set for Sinhala has two limitations: it has no tags for some word classes, and it has no tags that capture inflection-based grammatical variations of words. The new tag set presented in this paper overcomes both limitations. The accuracy of existing Sinhala Part-Of-Speech taggers, which are based on Hidden Markov Models, still falls far behind the state of the art. Our Support Vector Machine based tagger achieved an overall accuracy of 84.68%, with 59.86% accuracy on unknown words and 87.12% on known words, on a test set in which 10% of the words are unknown.
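The abstract reports accuracy separately for known and unknown words. The sketch below shows one common way to compute such a split. It assumes two things: a token counts as "unknown" when it never occurs in the training data, and accuracy is measured per token. The function name `split_accuracy` and the placeholder tags are hypothetical. This is not the authors' evaluation code.

```python
from typing import Sequence, Set, Tuple


def split_accuracy(
    tokens: Sequence[str],
    gold: Sequence[str],
    predicted: Sequence[str],
    train_vocab: Set[str],
) -> Tuple[float, float, float]:
    """Return (overall, known, unknown) per-token tagging accuracy in percent.

    Hypothetical helper: a token is treated as 'unknown' if it is absent
    from the training vocabulary (an assumption, not the paper's definition).
    """
    if not (len(tokens) == len(gold) == len(predicted)):
        raise ValueError("tokens, gold and predicted must have equal length")

    correct = {"known": 0, "unknown": 0}
    total = {"known": 0, "unknown": 0}
    for word, gold_tag, pred_tag in zip(tokens, gold, predicted):
        bucket = "known" if word in train_vocab else "unknown"
        total[bucket] += 1
        if gold_tag == pred_tag:
            correct[bucket] += 1

    def pct(num: int, den: int) -> float:
        # Guard against an empty bucket (e.g. a test set with no unknown words).
        return 100.0 * num / den if den else 0.0

    overall = pct(correct["known"] + correct["unknown"],
                  total["known"] + total["unknown"])
    return (overall,
            pct(correct["known"], total["known"]),
            pct(correct["unknown"], total["unknown"]))
```

Here is a toy example with placeholder tags `N` and `V`. Suppose there are three known tokens, one of which is mistagged, and one unknown token that is tagged correctly. Then `split_accuracy(["a", "b", "c", "d"], ["N", "V", "N", "N"], ["N", "V", "V", "N"], {"a", "b", "c"})` gives an overall accuracy of 75.0, about 66.67 for known words, and 100.0 for unknown words. Because the counts are per token, the overall figure is the average of the known and unknown accuracies, weighted by each group's share of the test tokens.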

URI

http://dl.lib.mrt.ac.lk/handle/123/12227

Collections

Articles authored by UoM staff
