Modelling website user behavior from web access logs

Udantha, GPDM

dc.contributor.advisor	Ranathunga, S
dc.contributor.advisor	Dias, G
dc.contributor.author	Udantha, GPDM
dc.date.accessioned	2017-05-22T08:55:04Z
dc.date.available	2017-05-22T08:55:04Z
dc.identifier.uri	http://dl.lib.mrt.ac.lk/handle/123/12748
dc.description.abstract	Mining web access log data is a popular technique for identifying frequent access patterns of website users. When properly analyzed, web logs can provide a wealth of information on the user access patterns of the corresponding website. However, finding interesting patterns hidden in low-level log data is non-trivial because of large log volumes and the distribution of log files across cluster environments. Existing clustering techniques have not focused on identifying infrequent patterns, and most of them suffer from cluster parameter issues when applied to web usage mining. This thesis presents the application of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Expectation Maximization (EM) algorithms in an iterative manner for clustering, an approach that has not been used before. Each cluster corresponds to one or more web user activities. For clusters that did not have a unique access pattern, frequent pattern mining and sequence pattern mining techniques were used to identify the unique user access patterns. Secondly, this thesis solves another issue in web usage mining: detecting slight changes between web user access sessions. It introduces a method to identify these access patterns at a much lower level than what is provided by traditional clustering techniques, such as nearest-neighbor-based techniques and classification techniques. This technique uses the concept of episodes to represent web sessions. These episodes are expressed in the form of regular expressions. To the best of our knowledge, this is the first time that the concept of regular expressions has been applied to identify user access patterns in web server log data. We demonstrate that the implemented system is capable of identifying not only common user behaviors but also anomalous user behavior.	en_US
dc.language.iso	en	en_US
dc.subject	Computer Science and Engineering
dc.subject	Computer Science
dc.subject	Website User Behavior
dc.title	Modelling website user behavior from web access logs	en_US
dc.type	Thesis-Full-text	en_US
dc.identifier.faculty	Engineering	en_US
dc.identifier.degree	MSc (Major Component Research)	en_US
dc.identifier.department	Department of Computer Science & Engineering	en_US
dc.date.accept	2016
dc.identifier.accno	TH3196	en_US