References 
 
[1] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 
Second Edition, Morgan Kaufmann publications, United States of America, 
2006. 
[2] Daniel T. Larose, Data Mining Methods and Models, John Wiley & Sons, Inc,
United States of America, 2006.
[3] Wesley Chu, Tsau Young Lin (Eds.), Foundations and Advances in Data 
Mining, vol. 180, Springer-Verlag Berlin Heidelberg, 2005.
[4] David Taniar, Research and Trends in Data Mining Technologies and 
Applications, Idea Group Publishing, 2007.
[5] David Kirk Evans, Judith L. Klavans, Kathleen R. McKeown, “Columbia 
Newsblaster: Multilingual News Summarization on the Web”, Department of 
Computer Science, Columbia University, NY.  
[6] Andreas Hotho, Alexander Maedche, Steffen Staab, “Ontology-based Text 
Document Clustering”, DCAI 2001.  
[7] Hannes Wettig, Jussi Lahtinen, Tuomas Lepola, Petri Myllymäki, Henry Tirri,
“Bayesian Analysis of Online Newspaper Log Data”, Proceedings of the 2003
Symposium on Applications and the Internet Workshops (SAINT 2003
Workshops), IEEE Computer Society, Los Alamitos, California, 2003, pp.
282–287.
[8] Sameh H. Ghwanmeh, “Applying Clustering of Hierarchical K-means-like
Algorithm on Arabic Language”, International Journal of Information
Technology, Volume 3, Number 1, 2006, ISSN 1305-2403.
[9] Noam Slonim, Naftali Tishby, “The Power of Word Clusters for Text 
Classification”, 23rd European Colloquium on Information Retrieval 
Research, 2001. 
[10] Y.C. Fang, S. Parthasarathy, F. Schwartz, “Using Clustering to Boost Text 
Classification”, 2001 IEEE International Conference on Data Mining, Ohio
State University, 2001. 
[11] Peng Dai, Uri Iurgel, Gerhard Rigoll, “A Novel Feature Combination 
Approach for Spoken Document Classification with Support Vector 
Machines”, University of Duisburg-Essen, Duisburg, Germany. 
 88
[12] Bei Yu, John Unsworth, “An Evaluation of Text Classification Methods for
Literary Study”, University of Illinois at Urbana-Champaign.  
[13] L. Douglas Baker, Andrew Kachites McCallum, “Distributional Clustering of 
Words for Text Classification”, ACM SIGIR 98, 1998. 
[14] Marijus Bernotas, Kazys Karklius, Remigijus Laurutis, Asta Slotkienė, “The
Peculiarities of the Text Document Representation, Using Ontology and
Tagging-Based Clustering Technique”, Information Technology and Control,
Vol. 36, No. 2, 2007.
[15] Gair, James W., Studies in South Asian Linguistics: Sinhala and Other South
Asian Languages, selected and edited by Barbara C. Lust, New York /
Oxford: Oxford University Press, 1998.
[16] Gair, James W., “Sinhala”, in The Indo-Aryan Languages, George Cardona &
Dhanesh Jain (eds.), London / New York: Routledge (Routledge Language
Family Series), 2003.
[17] Aleksander Kolcz, Joshua Alspector, “SVM-based filtering of e-mail spam with
content-specific misclassification costs”, 2001 IEEE International Conference
on Data Mining, 2001. 
[18] Bjornar Larsen, Chinatsu Aone, “Fast and effective text mining using linear-
time document clustering”, Proceedings of the fifth ACM SIGKDD 
international conference on Knowledge discovery and data mining, pp. 16-22,
ISBN 1-58113-143-7, 1999.
[19] Catherine Blake, Wanda Pratt, “Better rules, fewer features: A semantic 
approach to selecting features from text”, 2001 IEEE International Conference 
on Data Mining. 
[20] Cheng Xiang Zhai, Atulya Velivelli, Bei Yu, “A cross-collection mixture 
model for comparative text mining”, Proceedings of the tenth ACM SIGKDD 
international conference on Knowledge discovery and data mining, pp. 743-748,
2004, ISBN 1-58113-888-1.
[21] Chid Apte, Fred Damerau, Sholom Weiss, “Text Mining with Decision Rules 
and Decision Trees”, IBM Research Division, T.J. Watson Research Center, 
Yorktown Heights, NY 10598. 
[22] Julia Itskevitch, “Automatic hierarchical e-mail classification using 
association rules”, Belorussian State Polytechnic Academy, 1997.
 
[23] Marko Grobelnik, Dunja Mladenic, Natasa Milic-Frayling, “Text Mining as 
Integration of Several Related Research Areas: Report on KDD’2000 
Workshop on Text Mining”, ACM SIGKDD, 2000. 
[24] Waikato Environment for Knowledge Analysis (WEKA), version 3.4.11, 
University of Waikato, New Zealand.  
[25] Prof. J. B. Dissanayake, “Some Salient Features of the Sinhala Alphabet”.
[26]  Gihan V. Dias, “Challenges of enabling IT in the Sinhala Language”, 27th 
Internationalization and Unicode Conference, Berlin, Germany, April 2005. 
[27] Gihan Dias, Aruni Goonetilleke, “Development of Standards for Sinhala 
Computing”, 1st Regional Conference on ICT and E-Paradigms, Sri Lanka, 
2004. 
[28]  Samaranayake V. K., Nandasara S. T., Dissanayake J. B., Weerasinghe A.R., 
Wijayawardhana H., An Introduction to UNICODE for Sinhala Characters, 
University of Colombo School of Computing, 2003. 
[29] Gihan Dias, “Using IT in Local Languages”. 
[30] Gihan Dias, “Representation of Sinhala in Unicode”.  
[31] Muthu Nedumaran, “Sinhala Unicode developer workshop”. 
[32] Yuen-Hsien Tseng, “FJU Test Collection for Evaluation of Chinese Text
Categorization”, 2004.
[33] Text mining tool, Magenta Technology text understanding approach, available 
at http://www.magenta-technology.com/en/technology/ 
[34]  Heikki Hyötyniemi, “Text Document Classification with Self-Organizing 
Maps”, Proceedings of STeP'96. Jarmo Alander, Timo Honkela and Matti 
Jakobsson (eds.), Publications of the Finnish Artificial Intelligence Society, pp. 
64-72. 
[35]  Honkela T., Kaski S., Lagus K., Kohonen T., Newsgroup Exploration with 
WEBSOM Method and Browsing Interface, Helsinki University of Technology, 
Report A32, 1996. 
[36]  Krista Lagus, Timo Honkela, Samuel Kaski, Teuvo Kohonen, “WEBSOM - A 
Status Report”, Proceedings of STeP'96. Jarmo Alander, Timo Honkela and 
Matti Jakobsson (eds.), Publications of the Finnish Artificial Intelligence 
Society, pp. 73-78. 
 
 
[37] Kanoksri Sarinnapakorn, Miroslav Kubat, “Combining Subclassifiers in Text
Categorization: A DST-Based Solution and a Case Study”, IEEE Transactions
on Knowledge and Data Engineering, vol. 19, no. 12, December 2007.
[38]  Ji He, Ah-Hwee Tan, Chew-Lim Tan, “A Comparative Study on Chinese Text 
Categorization Methods”, A.-H. Tan, P. Yu (Eds.), PRICAI 2000 Workshop on
Text and Web Mining, Melbourne, pp. 24-35, August 2000.
[39] Fotis Lazarinis, Jesus Vilares Ferro, John Tait, “Improving Non-English Web 
Searching” (iNEWS07), SIGIR 2007 workshop report, ACM SIGIR Forum, Vol.
41, No. 2, December 2007.
[40]  Andrea L. Houston, Kenneth R. Walsh, “Using an AI-Based Tool to 
Categorize Digitized Textual Forms of Organizational Memory”, Proceedings 
of the 29th Annual Hawaii International Conference on System Sciences
(HICSS-29) – 1996, 1060-3425/96 IEEE. 
[41] “Machine Learning in Text Data Analysis”, ICML-99 Workshop, Bled,
Slovenia, June 30, 1999.
[42] “Natural Language Computing”, http://research.microsoft.com/research/pub/ 
China - Natural Language Computing - Home.mht. 
[43] Shan Chen, Damminda Alahakoon, Maria Indrawan, “Background Knowledge
Driven Ontology Discovery”, Monash University.
[44] Chung-Hong Lee, Hsin-Chang Yang, Sheng-Min Ma, “A Novel Multilingual
Text Categorization System using Latent Semantic Indexing”, First
International Conference on Innovative Computing, Information and Control,
Volume II (ICICIC'06), pp. 503-506.
[45]  Guo D., Berry M.W., Thompson B.B., Bailin S., “Knowledge Enhanced 
Latent Semantic Indexing”, Information Retrieval, Volume 6, Number 2, April 
2003, pp. 225-250.
[46]  Mark Girolami, A. Kaban, “On an Equivalence between PLSI and LDA”, 
Proceedings of SIGIR 2003, 2003.  
[47]  Dean Wright, Integrating Language Identification with Text Classification, 
UMBC, 2004. 
[48]  Daniel Boley, Maria Gini, Kyle Hastings, Bamshad Mobasher, Jerry Moore, 
“A client-side Web agent for document categorization”, Internet Research: 
Electronic Networking Applications and Policy, Volume 8 · Number 5 · 1998 · 
pp. 387–399, © MCB University Press · ISSN 1066-2243.  
[49] Toni Giorgino, “An Introduction to Text Classification”, 2004.
[50] “Sinhala hodiya”, URL http://www.LANandWAN.com/Sinhala/hoodiya.htm
[51] N. Jennings, K. Sycara, M. Wooldridge, “A Roadmap of Agent Research and 
Development”, Autonomous Agents and Multi-Agent Systems, 1, 275–306, 
Kluwer Academic Publishers, Boston, 1998. 
[52] Michael Chau, Daniel Zeng, Hsinchun Chen, Michael Huang, David 
Hendriawan, “Design and evaluation of a multi-agent collaborative Web 
mining system”.
[53] Guilherme Bittencourt, Frederico L. G. Freitas, “An Ontology-based
Architecture for Cooperative Information Agents”. 
[54] D. A. Meedeniya, A. S. Perera, “A Comparative Study on Data Representation to
Categorize Text Documents”, 20th International Conference on Software 
Engineering and Knowledge Engineering (SEKE’08), 2008.  
[55] The UCI KDD Archive, available at
http://kdd.ics.uci.edu/databases/nsfabs/nsfabs.html and
http://kdd.ics.uci.edu/databases/reuters_transcribed/reuters_transcribed.html.
[56] Helena Ahonen-Myka, “Processing of Large Document Collections”,
University of Helsinki.
[57] David L. Olson, Dursun Delen, Advanced Data Mining Techniques,
Springer-Verlag Berlin Heidelberg, 2008. ISBN: 978-3-540-76916-3
[58] Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools
and Techniques, Second edition, Morgan Kaufmann Publishers, CA, 2005.  
[59]  Pavel Praks, Jiri Dvorsky, Vaclav Snasel, “Latent Semantic Indexing for 
Image Retrieval Systems”. 
[60] Landauer, T. K., Foltz, P. W., & Laham, D., “Introduction to Latent
Semantic Analysis”, Discourse Processes, 25, 259-284, 1998.
[61] Yogesh Raja, Shaogang Gong, “Gaussian Mixture Model (GMM)”, Queen
Mary and Westfield College, England. 
[62] Hichem Sahbi, “A Particular Gaussian Mixture Model for Clustering”, 
Cambridge University, UK. 
[63] Erik Norvell, “Gaussian Mixture Model Based Audio Coding in a Perceptual 
Domain”, 2005.  
[64] Amos Storkey, “Learning from Data: Density Estimation - Gaussian
Distribution”, School of Informatics.  
[65] Samy Bengio, “Statistical Machine Learning from Data: Gaussian Mixture
Models”, 2006.   
[66] Kuei-Hsien, “K Means Clustering, Nearest Cluster and Gaussian Mixture”,
2005. 
[67] “A tutorial on Clustering Algorithms”, available at 
http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/indix.html 
[68] Matlab: the language of technical computing – Help 
[69] Richard J. Roiger, Michael W. Geatz, Data Mining: A Tutorial-based Primer,
Pearson Education, 2005.
[70] D. A. Meedeniya, A. S. Perera, “An Adaptive Technique to Categorize Indic
Language Documents”, International Conference on Advanced Computing 
Technologies (ICACT '08), India, 2008.  
[71] Thomas K Landauer, Susan Dumais, “Latent semantic analysis”.  
[72] Sinhala text document corpus, publicly available at 
http://vijayaba.cse.mrt.ac.lk/~dulani/