SinhaLearn: NLP, CNN, and OCR based data driven approach for enhancing Sinhala proficiency of grade 5 scholarship students

Abstract

The Sinhala language, deeply rooted in Pali and Sanskrit, holds immense cultural significance for the Sinhalese community in Sri Lanka. However, its intricate morphology and its diglossic nature, in which the written and spoken forms diverge, pose significant challenges for learners. In response, we present "SinhaLearn", an automated system designed to enhance the Sinhala language proficiency of students preparing for the Grade 5 Scholarship examination. The system combines Natural Language Processing (NLP) algorithms, including Part-of-Speech tagging, with Optical Character Recognition (OCR) and Convolutional Neural Networks (CNNs). Its key functionalities are Sinhala handwriting capture and recognition, real-time detection and correction of spelling and grammar errors in Sinhala present-tense sentences, simplified explanations of complex Sinhala vocabulary aligned with the Grade 5 curriculum, and automated assessment of handwritten responses with instant feedback and scoring. Informed by a comprehensive literature review, our research blends rule-based methods with hybrid components to address critical gaps in the field. Whereas existing systems often concentrate on a single functionality, such as spelling analysis, grammar analysis, dictionary functions, or content display, SinhaLearn integrates these aspects into one holistic solution. This comprehensive approach sets a new benchmark for automated systems for Sinhala language proficiency.
