Abstract:
This paper presents a domain-specific Tamil Named Entity Recognizer for the history domain. The system uses a manually annotated corpus of 23k tokens, tagged with 36 tags related to the history domain. The NER model for Tamil is based on Conditional Random Fields (CRF) and uses features derived from both the domain of interest and the language. Hyperparameter tuning with a random search algorithm is applied to find the best hyperparameters for the model. Tamil is a low-resource, morphologically rich language, which makes the task challenging. Despite that, the system achieved fair results, with micro-averaged Precision, Recall and F1-score of 87.9%, 67.1% and 76.1%, respectively.
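As a reading aid for the reported scores, the following minimal sketch shows how micro-averaged precision, recall and F1 are conventionally computed by pooling counts over all tags, and checks that the reported precision and recall give the reported F1. It assumes the standard definitions. The function names, tag names and counts are hypothetical and are not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_average(counts):
    """Pool (tp, fp, fn) counts over all tags, then compute the metrics once.

    counts: dict mapping tag name -> (true positives, false positives,
    false negatives). Returns (precision, recall, f1).
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Hypothetical per-tag counts, for illustration only (not the paper's data).
example_counts = {"PERSON": (8, 1, 2), "PLACE": (2, 1, 3)}
p, r, f = micro_average(example_counts)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")

# Consistency check on the figures reported in the abstract.
print(f"{f1_score(0.879, 0.671):.3f}")  # prints 0.761
```

The last line confirms that a precision of 87.9% and a recall of 67.1% yield an F1-score of about 76.1%, matching the abstract.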
Citation:
R. Murugathas and U. Thayasivam, "Domain specific Named Entity Recognition in Tamil," 2022 Moratuwa Engineering Research Conference (MERCon), 2022, pp. 1-6, doi: 10.1109/MERCon55799.2022.9906295.