Adapter-based fine-tuning of pre-trained multilingual language models for code-mixed and code-switched text classification

dc.contributor.author: Rathnayake, H
dc.contributor.author: Sumanapala, J
dc.contributor.author: Rukshani, R
dc.contributor.author: Ranathunga, S
dc.date.accessioned: 2023-06-20T04:29:00Z
dc.date.available: 2023-06-20T04:29:00Z
dc.date.issued: 2022
dc.description.abstract: Code-mixing and code-switching (CMCS) are frequent features of online conversations. Classifying such text is challenging when one of the languages involved is low-resourced. Fine-tuning pre-trained multilingual language models (PMLMs) is a promising avenue for code-mixed text classification. In this paper, we explore adapter-based fine-tuning of PMLMs for CMCS text classification. We introduce sequential and parallel stacking of adapters, continuous fine-tuning of adapters, and training adapters without freezing the original model as novel techniques for single-task CMCS text classification. We also present a newly annotated dataset for the classification of Sinhala–English code-mixed and code-switched text, where Sinhala is a low-resourced language. Our dataset of 10,000 user comments has been manually annotated for five classification tasks: sentiment analysis, humor detection, hate speech detection, language identification, and aspect identification, making it the first publicly available Sinhala–English CMCS dataset and the one with the largest number of task annotation types. In addition to this dataset, we also evaluated our proposed techniques on Kannada–English and Hindi–English datasets. These experiments confirm that our adapter-based fine-tuning techniques outperform, or are on par with, basic fine-tuning of PMLMs.
dc.identifier.citation: Rathnayake, H., Sumanapala, J., Rukshani, R., & Ranathunga, S. (2022). Adapter-based fine-tuning of pre-trained multilingual language models for code-mixed and code-switched text classification. Knowledge and Information Systems, 64(7), 1937–1966. https://doi.org/10.1007/s10115-022-01698-1
dc.identifier.database: Springer Link
dc.identifier.doi: https://doi.org/10.1007/s10115-022-01698-1
dc.identifier.issn: 0219-3116
dc.identifier.issue: 7
dc.identifier.journal: Knowledge and Information Systems
dc.identifier.pgnos: 1937-1966
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/21126
dc.identifier.volume: 64
dc.identifier.year: 2022
dc.language.iso: en_US
dc.publisher: Springer
dc.subject: Code-switching
dc.subject: Code-mixing
dc.subject: Text classification
dc.subject: Low-resource languages
dc.subject: Sinhala
dc.subject: XLM-R
dc.subject: Adapter
dc.title: Adapter-based fine-tuning of pre-trained multilingual language models for code-mixed and code-switched text classification
dc.type: Article-Full-text
