Abstract:
When properly analyzed, web logs can provide a wealth of information on the user access patterns of the corresponding website. However, finding interesting patterns hidden in low-level log data is non-trivial because of large log volumes and the distribution of log files across cluster environments. This paper presents a novel technique for clustering web user sessions that applies the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Expectation Maximization (EM) algorithms iteratively. Each cluster corresponds to one or more web user activities. The unique user access pattern of each cluster is identified using frequent pattern mining and sequential pattern mining techniques. Compared with the clustering output of the EM, DBSCAN, and k-means algorithms, this technique achieves better accuracy in web session mining and is more effective at identifying changes in clusters over time. We demonstrate that the implemented system can identify not only common user behaviors but also cyber-attacks.
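The abstract does not say which frequent pattern mining algorithm is used or how a cluster's "unique" pattern is defined. The sketch below is therefore illustrative only. It assumes Apriori-style frequent itemset mining over the set of pages visited in each session, and it treats a cluster's unique patterns as itemsets that are frequent in that cluster but in no other. The session data, the cluster labels, the `min_support` threshold, and the helpers `frequent_itemsets` and `cluster_signatures` are all hypothetical. Following scikit-learn's DBSCAN convention, the label -1 marks noise. Page order is ignored here, so this does not cover the sequential pattern mining step.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(sessions, min_support):
    """Apriori (assumed choice): return {frozenset(pages): support} for every
    page set contained in at least a min_support fraction of sessions."""
    n = len(sessions)
    transactions = [frozenset(s) for s in sessions]
    # Level 1: support of individual pages.
    counts = Counter(page for t in transactions for page in t)
    current = {frozenset([p]): c / n for p, c in counts.items() if c / n >= min_support}
    result = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets whose union has exactly k pages.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        next_level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                next_level[c] = support
        result.update(next_level)
        current = next_level
        k += 1
    return result


def cluster_signatures(sessions, labels, min_support):
    """Hypothetical helper: for each cluster, the itemsets that are frequent in
    that cluster but not frequent in any other. Noise (label -1) is skipped."""
    by_cluster = {}
    for session, label in zip(sessions, labels):
        if label != -1:
            by_cluster.setdefault(label, []).append(session)
    frequent = {lab: set(frequent_itemsets(ss, min_support))
                for lab, ss in by_cluster.items()}
    signatures = {}
    for lab, items in frequent.items():
        others = set().union(*(v for other, v in frequent.items() if other != lab))
        signatures[lab] = items - others
    return signatures


# Illustrative sessions (pages visited) and cluster labels from a clustering step.
sessions = [
    ["/home", "/login", "/cart"],
    ["/home", "/login", "/checkout"],
    ["/home", "/login", "/cart", "/checkout"],
    ["/admin", "/login.php"],
    ["/admin", "/login.php", "/etc/passwd"],
    ["/random"],
]
labels = [0, 0, 0, 1, 1, -1]

signatures = cluster_signatures(sessions, labels, min_support=0.6)
for lab, patterns in sorted(signatures.items()):
    print(lab, sorted(sorted(p) for p in patterns))
```

With these toy inputs, cluster 1's signature includes the pair `{"/admin", "/login.php"}`, the kind of probing pattern that might flag an attack. `"/etc/passwd"` appears in only half of that cluster's sessions, so it falls below the 0.6 threshold and is not reported.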
Citation:
M. Udantha, S. Ranathunga and G. Dias, "Modelling website user behaviors by combining the EM and DBSCAN algorithms," 2016 Moratuwa Engineering Research Conference (MERCon), 2016, pp. 168-173, doi: 10.1109/MERCon.2016.7480134.