Adapter-based fine-tuning of pre-trained multilingual language models for code-mixed and code-switched text classification

dc.contributor.author: Rathnayake, H
dc.contributor.author: Sumanapala, J
dc.contributor.author: Rukshani, R
dc.contributor.author: Ranathunga, S
dc.date.accessioned: 2023-06-20T04:29:00Z
dc.date.available: 2023-06-20T04:29:00Z
dc.date.issued: 2022
dc.description.abstract: Code-mixing and code-switching (CMCS) are frequent features of online conversations. Classifying such text is challenging when one of the languages involved is low-resourced. Fine-tuning pre-trained multilingual language models (PMLMs) is a promising avenue for code-mixed text classification. In this paper, we explore adapter-based fine-tuning of PMLMs for CMCS text classification. We introduce sequential and parallel stacking of adapters, continuous fine-tuning of adapters, and training adapters without freezing the original model as novel techniques for single-task CMCS text classification. We also present a newly annotated dataset for the classification of Sinhala–English code-mixed and code-switched text, where Sinhala is a low-resourced language. Our dataset of 10,000 user comments has been manually annotated for five classification tasks: sentiment analysis, humor detection, hate speech detection, language identification, and aspect identification, making it the first publicly available Sinhala–English CMCS dataset and the one with the largest number of task annotation types. In addition to this dataset, we also evaluated our proposed techniques on Kannada–English and Hindi–English datasets. These experiments confirm that our adapter-based fine-tuning techniques outperform, or are on par with, basic fine-tuning of PMLMs.
dc.identifier.citation: Rathnayake, H., Sumanapala, J., Rukshani, R., & Ranathunga, S. (2022). Adapter-based fine-tuning of pre-trained multilingual language models for code-mixed and code-switched text classification. Knowledge and Information Systems, 64(7), 1937–1966. https://doi.org/10.1007/s10115-022-01698-1
dc.identifier.database: Springer Link
dc.identifier.doi: https://doi.org/10.1007/s10115-022-01698-1
dc.identifier.issn: 0219-3116
dc.identifier.issue: 7
dc.identifier.journal: Knowledge and Information Systems
dc.identifier.pgnos: 1937-1966
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/21126
dc.identifier.volume: 64
dc.identifier.year: 2022
dc.language.iso: en_US
dc.publisher: Springer
dc.subject: Code-switching
dc.subject: Code-mixing
dc.subject: Text classification
dc.subject: Low-resource languages
dc.subject: Sinhala
dc.subject: XLM-R
dc.subject: Adapter
dc.title: Adapter-based fine-tuning of pre-trained multilingual language models for code-mixed and code-switched text classification
dc.type: Article-Full-text
