Bilingual lexical induction for Sinhala-English using cross lingual embedding spaces

dc.contributor.author: Liyanage, A
dc.contributor.author: Ranathunga, S
dc.contributor.author: Jayasena, S
dc.contributor.editor: Adhikariwatte, W
dc.contributor.editor: Rathnayake, M
dc.contributor.editor: Hemachandra, K
dc.date.accessioned: 2022-10-19T05:39:24Z
dc.date.available: 2022-10-19T05:39:24Z
dc.date.issued: 2021-07
dc.description.abstract: Bilingual lexicons are an important resource in Natural Language Processing (NLP). Such resources are scarce for low-resource languages (LRLs) such as Sinhala, and research on Bilingual Lexical Induction (BLI) in low-resource settings is limited. This paper presents the first implementation of BLI for the Sinhala-English language pair. Following the recently introduced VecMap model, we map the word vectors of both Sinhala and English into a shared vector space and measure the cross-lingual (CL) similarity between words. The closest English word to a given Sinhala word in this CL vector space is taken as its corresponding word. No detailed evaluation currently exists of how the size and nature of the dataset used to create the word vectors, the type of evaluation dictionary, or the technique used to create the word vectors affects BLI. This paper presents a comprehensive analysis of how these factors affect BLI for Sinhala and English and shows that BLI results depend heavily on them. [en_US]
dc.identifier.citation: A. Liyanage, S. Ranathunga and S. Jayasena, "Bilingual Lexical Induction for Sinhala-English using Cross Lingual Embedding Spaces," 2021 Moratuwa Engineering Research Conference (MERCon), 2021, pp. 579-584, doi: 10.1109/MERCon52712.2021.9525667. [en_US]
dc.identifier.conference: Moratuwa Engineering Research Conference 2021 [en_US]
dc.identifier.department: Engineering Research Unit, University of Moratuwa [en_US]
dc.identifier.doi: 10.1109/MERCon52712.2021.9525667 [en_US]
dc.identifier.faculty: Engineering [en_US]
dc.identifier.pgnos: pp. 579-584 [en_US]
dc.identifier.place: Moratuwa, Sri Lanka [en_US]
dc.identifier.proceeding: Proceedings of Moratuwa Engineering Research Conference 2021 [en_US]
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/19131
dc.identifier.year: 2021 [en_US]
dc.language.iso: en [en_US]
dc.publisher: IEEE [en_US]
dc.relation.uri: https://ieeexplore.ieee.org/document/9525667 [en_US]
dc.subject: Sinhala [en_US]
dc.subject: Embedding Models [en_US]
dc.subject: Mapped Embedding Spaces [en_US]
dc.subject: Bilingual Lexicon Induction [en_US]
dc.title: Bilingual lexical induction for Sinhala-English using cross lingual embedding spaces [en_US]
dc.type: Conference-Full-text [en_US]
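
The retrieval step described in the abstract (taking the closest English word to a given Sinhala word in the shared space) can be illustrated with a minimal sketch. It assumes the embeddings have already been mapped into a common space; learning that mapping is what VecMap does and is not shown here. The function names, the romanized Sinhala keys, and the three-dimensional toy vectors are all hypothetical and hand-made for illustration. They are not taken from the paper, and cosine similarity is one common choice of similarity measure, not necessarily the one the authors used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_translation(src_word, src_emb, tgt_emb):
    """Return the target word whose vector is closest to src_word's vector.

    Both embedding dictionaries are assumed to live in the same
    (already mapped) cross-lingual vector space.
    """
    query = src_emb[src_word]
    return max(tgt_emb, key=lambda w: cosine(query, tgt_emb[w]))


# Toy, hand-made vectors (hypothetical). Keys are romanized Sinhala:
# "gasa" (tree) and "wathura" (water).
sinhala = {
    "gasa": [0.9, 0.1, 0.0],
    "wathura": [0.1, 0.9, 0.1],
}
english = {
    "tree": [0.8, 0.2, 0.1],
    "water": [0.0, 1.0, 0.2],
    "house": [0.1, 0.1, 0.9],
}

for word in sinhala:
    print(word, "->", induce_translation(word, sinhala, english))
```

With real embeddings the target vocabulary is large, so the exhaustive comparison above would normally be replaced by a vectorized or approximate nearest-neighbor search.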