dc.contributor.author Fayaza, MSF
dc.contributor.author Ranathunga, S
dc.contributor.editor Weeraddana, C
dc.contributor.editor Edussooriya, CUS
dc.contributor.editor Abeysooriya, RP
dc.date.accessioned 2022-08-09T09:24:13Z
dc.date.available 2022-08-09T09:24:13Z
dc.date.issued 2020-07
dc.identifier.citation M. S. Faathima Fayaza and S. Ranathunga, "Tamil News Clustering Using Word Embeddings," 2020 Moratuwa Engineering Research Conference (MERCon), 2020, pp. 277-282, doi: 10.1109/MERCon50084.2020.9185282. en_US
dc.identifier.uri http://dl.lib.uom.lk/handle/123/18580
dc.description.abstract News aggregators let readers view news from multiple news providers at a single point. At the moment, the only news aggregator that supports Tamil news is Google News, which has some noticeable shortcomings. In this study, two document representation techniques, Term Frequency–Inverse Document Frequency and word embedding (fastText), were evaluated with the one-pass and affinity propagation clustering algorithms, applied to the news title alone and to the title together with the body, in order to implement a news aggregator for the Tamil language. For this study we collected data from nine different news providers. When fastText was applied with the one-pass algorithm to the news title and body, it outperformed the other approaches, achieving an average pairwise F-score of 81% with respect to manual clustering. We also created a Tamil fastText word embedding model using more than 21 million words. en_US
dc.language.iso en en_US
dc.publisher IEEE en_US
dc.relation.uri https://ieeexplore.ieee.org/document/9185282 en_US
dc.subject document clustering en_US
dc.subject Tamil en_US
dc.subject word embedding en_US
dc.subject Term Frequency–Inverse Document Frequency en_US
dc.subject affinity propagation clustering en_US
dc.subject one pass algorithm en_US
dc.title Tamil news clustering using word embeddings en_US
dc.type Conference-Full-text en_US
dc.identifier.faculty Engineering en_US
dc.identifier.department Engineering Research Unit, University of Moratuwa en_US
dc.identifier.year 2020 en_US
dc.identifier.conference Moratuwa Engineering Research Conference 2020 en_US
dc.identifier.place Moratuwa, Sri Lanka en_US
dc.identifier.pgnos pp. 277-282 en_US
dc.identifier.proceeding Proceedings of Moratuwa Engineering Research Conference 2020 en_US
dc.identifier.email msf.fayaza89@gmail.com en_US
dc.identifier.email surangika@cse.mrt.ac.lk en_US
dc.identifier.doi 10.1109/MERCon50084.2020.9185282 en_US

