Duplicate bug report detection using pre - trained language models

dc.contributor.advisor: Ranathunga S
dc.contributor.author: Sewwandi KAU
dc.date.accept: 2022
dc.date.accessioned: 2022
dc.date.available: 2022
dc.date.issued: 2022
dc.description.abstract: Software testing and defect reporting are important parts of software development and maintenance. Defects are identified and reported in a bug tracking system such as JIRA or Bugzilla. Reported defects are then triaged by an expert who understands the repository, the system, and the developers, and who assigns each defect to a developer to fix. During defect reporting, the same bug may be reported more than once, so identifying duplicate bug reports is a crucial task. Manual labeling of duplicate defects is time-consuming, may incorrectly identify defects as duplicate bug reports, and increases the cost of software maintenance. Automated duplicate bug report detection is therefore valuable. This research proposes a duplicate bug report classification methodology that uses the pre-trained language models BERT and XLNet with a Multi-Layer Perceptron (MLP) as the deep learning classifier. We tested it on publicly available bug report datasets from Eclipse, NetBeans, and OpenOffice. The selected models outperformed previously proposed systems for the same task, and the approach using BERT embeddings gave the best results. Further experiments showed that BERT is capable of domain adaptation: even when BERT was fine-tuned on different bug report datasets, it could still detect duplicate bugs in an unseen dataset. Finally, a multi-stage classification was carried out using a Convolutional Neural Network (CNN) model and a BERT model on the Eclipse and NetBeans datasets and on a combined Eclipse and NetBeans dataset. The approach using the combined dataset outperformed the baseline approach.
dc.identifier.accno: TH4977
dc.identifier.citation: Sewwandi, K.A.U. (2022). Duplicate bug report detection using pre - trained language models [Master's theses, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21592
dc.identifier.degree: MSc In Computer Science and Engineering
dc.identifier.department: Department of Computer Science and Engineering
dc.identifier.faculty: Engineering
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/21592
dc.language.iso: en
dc.subject: DUPLICATE BUG DETECTION
dc.subject: BERT
dc.subject: XLNET
dc.subject: MLP
dc.subject: CNN
dc.subject: DOMAIN ADAPTATION
dc.subject: MULTI-STAGE CLASSIFICATION
dc.subject: COMPUTER SCIENCE & ENGINEERING - Dissertation
dc.subject: COMPUTER SCIENCE - Dissertation
dc.subject: INFORMATION TECHNOLOGY - Dissertation
dc.title: Duplicate bug report detection using pre - trained language models
dc.type: Thesis-Abstract
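
The abstract describes feeding embeddings from a pre-trained language model into a Multi-Layer Perceptron that labels a pair of bug reports as duplicate or not. The sketch below illustrates only the shape of such a pipeline and is not the thesis implementation. Everything in it is an assumption made for illustration: `embed` is a hypothetical hashing stand-in for a BERT or XLNet embedding, the pair features (both vectors, their absolute difference, and their element-wise product) are one common choice rather than the author's, and the MLP weights are random and untrained.

```python
import math
import random
import zlib

DIM = 32  # toy embedding size (assumption); BERT-base vectors have 768 dimensions


def embed(text, dim=DIM):
    """Hypothetical stand-in for a BERT/XLNet embedding of a bug report.

    Hashes each token into a fixed-size, L2-normalised bag-of-words vector.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def pair_features(report_a, report_b):
    """Combine two report embeddings into one classifier input (assumed design)."""
    u, v = embed(report_a), embed(report_b)
    diff = [abs(a - b) for a, b in zip(u, v)]
    prod = [a * b for a, b in zip(u, v)]
    return u + v + diff + prod


class MLP:
    """One-hidden-layer perceptron with a sigmoid output; weights are untrained."""

    def __init__(self, n_in, n_hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def predict_proba(self, x):
        # ReLU hidden layer followed by a sigmoid over a single logit.
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        logit = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-logit))


features = pair_features(
    "Editor crashes when opening large XML file",
    "Crash in editor on opening a big XML file",
)
model = MLP(n_in=len(features))
probability = model.predict_proba(features)
print(f"duplicate probability (untrained weights): {probability:.3f}")
```

With real data, `embed` would be replaced by vectors from a fine-tuned language model and the MLP would be trained on labelled duplicate and non-duplicate pairs; the printed probability here carries no meaning beyond showing the data flow.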

Files

Original bundle

Name: TH4977-1.pdf
Size: 143.66 KB
Format: Adobe Portable Document Format
Description: Pre-Text

Name: TH4977-2.pdf
Size: 135.41 KB
Format: Adobe Portable Document Format
Description: Post-Text

Name: TH4977.pdf
Size: 927.17 KB
Format: Adobe Portable Document Format
Description: Full-theses