Duplicate detection in multi-domain community question answering

Kariyawasam KKR


dc.contributor.advisor	Ranathunga S
dc.contributor.author	Kariyawasam KKR
dc.date.accessioned	2020
dc.date.available	2020
dc.date.issued	2020
dc.identifier.uri	http://dl.lib.uom.lk/handle/123/16779
dc.description.abstract	Community-based question answering forums are very popular. People turn to community forums for opinions in fields such as electronics, medicine and automobiles. Finding a good opinion for free is easy and useful, but it is hard to choose the correct one when there are thousands of reviews. There have been several efforts to automate the activities of community-based question answering systems, such as selecting the most relevant answers to a question (question-comment similarity) and identifying already-posted questions that are similar to a new question (question-question similarity). However, fewer attempts have been made to automate duplicate detection in community question answering systems. At the moment, it is the community itself that manually detects duplicates, and the existing automation attempts focus mostly on individual domains. The objective of this research is to implement a mechanism that effectively identifies duplicate questions in a data set consisting of question-answer sets from multiple domains. The solution we propose has two focus areas: classification and retrieval. A neural network composed of two parallel LSTM layers (to represent the query and the candidate question), an attention layer and a gradient reversal layer (based on domain) is proposed as the question pair classifier. Trained on individual domains (without gradient reversal), it achieved better accuracy than the latest baseline research on this dataset for 9 out of 12 domains. For retrieval, the approach was to retrieve 20 candidates using BM25 and re-rank them using the already-trained classifiers. This places the duplicate in the top 10 with better MAP than BM25 for 6 out of 12 domains. Another important observation is that the common model built with all the data combined achieved better MAP than the individual models for 7 out of 12 domains in the retrieval case.	en_US
dc.language.iso	en	en_US
dc.subject	COMPUTER SCIENCE AND ENGINEERING-Dissertations	en_US
dc.subject	COMPUTER SCIENCE-Dissertations	en_US
dc.subject	MULTI-DOMAIN DATA	en_US
dc.subject	SIAMESE NEURAL NETWORKS	en_US
dc.subject	DOMAIN ADAPTATION	en_US
dc.subject	INFORMATION RETRIEVAL-Question Pair Classification	en_US
dc.subject	INFORMATION RETRIEVAL-Duplicate Question Retrieval	en_US
dc.title	Duplicate detection in multi-domain community question answering	en_US
dc.type	Thesis-Abstract	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.degree	MSc in Computer Science	en_US
dc.identifier.department	Department of Computer Science & Engineering	en_US
dc.date.accept	2020
dc.identifier.accno	TH4254	en_US
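
The retrieval stage described in the abstract (BM25 candidates, classifier re-ranking, MAP evaluation) can be illustrated with a minimal sketch. This is not the thesis's implementation. The BM25 parameters (k1=1.5, b=0.75), the IDF variant, the whitespace tokenizer, and the names BM25, rerank, jaccard and average_precision are all assumptions made for illustration. In particular, jaccard is only a hypothetical stand-in for the trained question pair classifier, whose code is not part of this record.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


class BM25:
    """Plain BM25 over a small in-memory corpus (illustrative only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(c.values()) for c in self.docs]
        self.avgdl = sum(self.lengths) / len(docs)
        n = len(docs)
        # Document frequency: number of documents containing each term.
        df = Counter(t for c in self.docs for t in c)
        # Non-negative IDF variant (assumed, not taken from the thesis).
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1.0)
                    for t, f in df.items()}

    def score(self, query, i):
        tf = self.docs[i]
        norm = self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avgdl)
        return sum(self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                   for t in tokenize(query) if t in tf)

    def top_k(self, query, k=20):
        # Indices of the k highest-scoring documents, best first.
        order = sorted(range(len(self.docs)),
                       key=lambda i: self.score(query, i), reverse=True)
        return order[:k]


def jaccard(query, candidate):
    # Hypothetical placeholder for the trained pair classifier's score.
    a, b = set(tokenize(query)), set(tokenize(candidate))
    return len(a & b) / len(a | b) if a | b else 0.0


def rerank(query, candidates, docs, pair_score=jaccard, keep=10):
    # Re-order the BM25 candidates by pair score and keep the top `keep`.
    ordered = sorted(candidates,
                     key=lambda i: pair_score(query, docs[i]), reverse=True)
    return ordered[:keep]


def average_precision(ranked, relevant):
    # Average precision for one query; MAP is its mean over all queries.
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0
```

In use, a query would be passed to top_k with k=20, the result to rerank with keep=10, and the final list scored with average_precision against the known duplicates. Swapping jaccard for a real classifier only requires a function that maps a (query, candidate) pair to a score.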