Institutional-Repository, University of Moratuwa.  

Investigating the applicability of partition-based clustering for Sinhala documents

dc.contributor.advisor Perera, AS
dc.contributor.author Meedeniya, DA
dc.date.accessioned 2011-03-29T11:15:27Z
dc.date.available 2011-03-29T11:15:27Z
dc.identifier.citation Meedeniya, D.A. (2008). Investigating the applicability of partition-based clustering for Sinhala documents [Master's thesis, University of Moratuwa]. Institutional Repository, University of Moratuwa. http://dl.lib.mrt.ac.lk/handle/123/638
dc.identifier.uri http://dl.lib.mrt.ac.lk/handle/123/638
dc.description A Dissertation submitted to the Department of Computer Science and Engineering for the MSc in Computer Science specializing in Software Architecture ; CD ROM included en_US
dc.description.abstract The significant growth of electronic media for storing and exchanging text documents has led to the use of tools that analyze and categorize documents based on their content. The availability of full-text documents in electronic form emphasizes the need for intelligent information retrieval techniques. In Sri Lanka, most public services rely on text documents written in the Sinhala language. As a result, there is a need for systems that can semi-automatically analyze and process documents in Sinhala. The wide availability of electronic data has led to great interest in text analysis, information retrieval and text categorization methods, and many concepts, approaches and techniques are associated with text mining. Most widely available text categorization tools, however, work only with English text. To provide a better service, there is therefore a need for document analysis and categorization systems for non-English text comparable to those currently available for English. A tool that can automatically categorize a collection of Sinhala documents can be an asset to any service provider that deals with a large number of text documents in Sinhala. Data clustering can be used to categorize documents based on their content, and the effectiveness of clustering depends on the feature extraction. The main techniques examined in this study are data pre-processing, feature extraction, and document clustering. The approach uses a transformation based on term frequency and inverse document frequency, which enhances the clustering performance, and is based on Latent Semantic Analysis. A text corpus categorized by human readers is used to test the validity of the suggested approach. The technique introduced in this work enables the processing of text documents written in Sinhala, and empowers citizens and organizations to do their daily work efficiently.
dc.format.extent x, 95p. : charts, graphs, tables en_US
dc.language.iso en en_US
dc.subject COMPUTER SCIENCE AND ENGINEERING - Dissertation
dc.subject COMPUTER SCIENCE - Dissertation
dc.subject SINHALA LANGUAGE - Texts
dc.subject SINHALA LANGUAGE - Information Retrieval
dc.title Investigating the applicability of partition-based clustering for Sinhala documents
dc.type Thesis-Abstract
dc.identifier.faculty Engineering en_US
dc.identifier.degree MSc en_US
dc.identifier.department Department of Computer Science and Engineering en_US
dc.date.accept 2008-12
dc.identifier.accno 93369 en_US
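
Illustrative sketch (not part of the repository record). The pipeline named in the abstract, a term frequency and inverse document frequency weighting followed by partition-based clustering, might look roughly like the following. This is a minimal sketch under stated assumptions, not the thesis implementation: the function names (tf_idf, cosine, k_means), the whitespace tokenizer, the cosine-based k-means variant, the naive seeding, and the English placeholder tokens are all hypothetical, and the Latent Semantic Analysis step and any Sinhala-specific pre-processing are omitted.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return L2-normalised TF-IDF vectors (as dicts) for whitespace-tokenised docs."""
    tokenised = [doc.split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vec = {t: (c / len(tokens)) * math.log(n_docs / df[t])
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two sparse vectors (cosine similarity if both are unit length)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def k_means(vectors, k, max_iter=20):
    """Partition vectors into k clusters by cosine similarity to unit-length centroids."""
    # Assumption: naive seeding with the first k documents.
    centroids = [dict(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if not members:  # keep the old centroid for an empty cluster
                continue
            total = Counter()
            for v in members:
                for t, w in v.items():
                    total[t] += w
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[j] = {t: w / norm for t, w in total.items()}
    return labels


# Placeholder tokens standing in for pre-processed Sinhala terms.
docs = [
    "tax revenue budget tax",
    "cricket match bowler cricket",
    "budget tax finance",
    "match bowler wicket",
]
labels = k_means(tf_idf(docs), k=2)
print(labels)
```

In this toy corpus, documents that share vocabulary receive the same cluster label. A real evaluation, as the abstract describes, would compare such labels against categories assigned by human readers.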