Pre-trained language model - based semi - supervised learning approach for content - based email categorization
dc.contributor.advisor | Silva A T P | |
dc.contributor.author | Kankanamge ND | |
dc.date.accept | 2020 | |
dc.date.accessioned | 2020 | |
dc.date.available | 2020 | |
dc.date.issued | 2020 | |
dc.description.abstract | Recent developments in neuroscience have revolutionized modern trends in artificial intelligence. Artificial neural networks (ANNs), which are artificial models of the human brain, have started to dominate the field of artificial intelligence. ANNs are mainly used for data classification and prediction, with applications in health, education, entertainment, and business. Email classification has been a problem for many large organizations because it requires human interaction, and many artificial intelligence-based solutions have been proposed. For content-based email filtering, many recent researchers have found that ANN-based approaches become much more useful than conventional natural language modelling methods as the volume of data increases. One reason is that ANNs can capture hidden styles of writing that conventional natural language processing does not capture. However, conventional ANNs suffer from a lack of labeled data for training. This is the major drawback of the conventional ANN approach: generating labeled data needs human interaction, which makes it a costly process. It has kept ANN solutions from providing a generic approach to email classification in any domain, since success requires a large amount of labeled data from each domain to train the particular ANN. This thesis reports on our research on content-based email classification using semi-supervised learning, which addresses these issues with conventional ANNs. Semi-supervised learning was introduced around 15 years ago but has only recently come to play an important role in the field of artificial intelligence. It needs only a minimal amount of labeled data for training and can use unlabeled data to increase its accuracy.
The proposed solution is a multi-view co-training approach that takes labeled emails, unlabeled emails, and the names of the different categories as inputs. The output of the project is a trained model that can classify emails into the given categories. We tested our solution with 10,000 training samples, of which only 10% to 20% were given to the system as labeled data and the rest were used as unlabeled data. We achieved an accuracy of around 0.888, which is an improvement of more than 5% in the accuracy of the total system. | en_US |
dc.identifier.accno | TH5003 | en_US |
dc.identifier.citation | Kankanamge, N.D. (2020). Pre-trained language model - based semi - supervised learning approach for content - based email categorization [Master's theses, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/21466 | |
dc.identifier.degree | MSc. in Artificial Intelligence | en_US |
dc.identifier.department | Department of Computational Mathematics | en_US |
dc.identifier.faculty | IT | en_US |
dc.identifier.uri | http://dl.lib.uom.lk/handle/123/21466 | |
dc.language.iso | en | en_US |
dc.subject | EMAIL CATEGORIZATION SYSTEM | en_US |
dc.subject | EMAIL CATEGORIZATION | en_US |
dc.subject | SEMI-SUPERVISED LEARNING-BASED SOLUTION | en_US |
dc.subject | INFORMATION TECHNOLOGY -Dissertation | en_US |
dc.subject | ARTIFICIAL INTELLIGENCE -Dissertation | en_US |
dc.subject | COMPUTATIONAL MATHEMATICS -Dissertation | en_US |
dc.title | Pre-trained language model - based semi - supervised learning approach for content - based email categorization | en_US |
dc.type | Thesis-Abstract | en_US |
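Illustrative note: the abstract's multi-view training approach is read here as co-training, in which two classifiers, each seeing a different view of an email, take turns pseudo-labeling the unlabeled emails they are most confident about. The sketch below is a minimal toy version of that general idea and not the thesis implementation. The choice of views (subject and body), the tiny naive Bayes classifier, the confidence threshold, and all function names and sample emails are assumptions made for illustration. It does not use a pre-trained language model or the category names.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial naive Bayes over whitespace tokens (uniform class prior)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, text):
        words = [w for w in text.lower().split() if w in self.vocab]
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            # Add-one smoothing, accumulated in log space.
            scores[c] = sum(math.log((self.counts[c][w] + 1) / total) for w in words)
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}


def fit_views(labeled):
    """Train one classifier per view: index 0 = subject, index 1 = body (assumed views)."""
    labels = [y for _, y in labeled]
    return [NaiveBayes().fit([x[v] for x, _ in labeled], labels) for v in (0, 1)]


def co_train(labeled, unlabeled, rounds=5, threshold=0.6):
    """labeled: list of ((subject, body), label); unlabeled: list of (subject, body)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        picks = {}
        for v, model in enumerate(fit_views(labeled)):
            probas = [model.predict_proba(x[v]) for x in pool]
            for c in model.classes:
                # Each view nominates its single most confident pool item per class.
                best = max(range(len(pool)), key=lambda i: probas[i][c])
                if probas[best][c] >= threshold:
                    picks.setdefault(best, c)
        if not picks:
            break
        # Pseudo-labeled emails join the training set for the next round.
        labeled += [(pool[i], c) for i, c in picks.items()]
        pool = [x for i, x in enumerate(pool) if i not in picks]
    return fit_views(labeled), labeled


def classify(models, email):
    # Sum the two views' class probabilities and return the best category.
    combined = Counter()
    for v, model in enumerate(models):
        combined.update(model.predict_proba(email[v]))
    return max(combined, key=combined.get)


# Made-up sample emails: two labeled, four unlabeled.
labeled = [
    (("invoice overdue", "please pay the attached invoice this week"), "finance"),
    (("team lunch", "join us for lunch on friday at noon"), "social"),
]
unlabeled = [
    ("invoice reminder", "payment for the invoice is due"),
    ("friday lunch", "lunch menu for friday"),
    ("payment received", "thank you for the payment of your invoice"),
    ("lunch plans", "are you free for lunch at noon"),
]
models, grown = co_train(labeled, unlabeled)
print(classify(models, ("payment reminder", "please send the payment")))
```

In this toy run the subject view alone cannot label the "payment received" email, because none of its words have been seen yet. The body view labels it in the second round, which is the kind of complementary evidence co-training relies on.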
Files
Original bundle
- Name:
- TH5003-1.pdf
- Size:
- 264.75 KB
- Format:
- Adobe Portable Document Format
- Description:
- Pre-text
- Name:
- TH5003-2.pdf
- Size:
- 146.64 KB
- Format:
- Adobe Portable Document Format
- Description:
- Post-text
- Name:
- TH5003.pdf
- Size:
- 865.53 KB
- Format:
- Adobe Portable Document Format
- Description:
- Full-theses
License bundle
- Name:
- license.txt
- Size:
- 1.71 KB
- Format:
- Item-specific license agreed upon to submission
- Description: