Abstract:
Lifestyle-related issues among Sri Lankans have grown exponentially over the past
few decades, and these issues may lower citizens' quality of life. In Sri Lanka,
little research has been done in the field of analyzing lifestyle data. Although a
few studies have analyzed the causes of socio-economic problems, their approaches
cannot handle big data effectively and are not efficient at predicting or describing
the issues associated with lifestyle.
Hence, this research was conducted to analyze citizen profiles effectively in order
to explore different lifestyle issues. It is hypothesized that citizen profiles can be
analyzed through data mining, using predictive or descriptive techniques depending
on the desired output. The solution takes the HIES data set as input and either
predicts the factors associated with a particular lifestyle issue or describes a
specific lifestyle issue together with its associated causes. Having received the
input, the approach preprocesses the data set to remove anomalies, then builds data
models that represent the lifestyle issue by extracting attributes from the HIES data
set, and then proceeds with pattern recognition for the issues. The important
patterns recognized through this
approach will be useful for the government and policy makers in setting up
appropriate policies to uplift citizens' quality of life. The overall design of the
research consists of two research questions: one uses a predictive mining based
solution and the other a descriptive mining based solution. Classification was used
to find the factors, and the relationships among them, that are associated with no
schooling and dropouts, as those were predictive mining tasks. Clustering was used
to explore the relationship between chronic diseases and family.
The overall research was designed using the WEKA data mining tool and the SPSS
statistical tool. Finally, the data models built for citizen profile analysis using
data mining techniques were evaluated for their performance using measures such as
accuracy, error rate, training time, TP rate, FP rate and ROC measurement.
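
The evaluation measures named above can be illustrated with a minimal sketch. It assumes a two-class problem and is not the WEKA implementation used in the research; the function names `evaluation_measures` and `roc_auc`, and all counts and scores in the usage lines, are hypothetical and are not results from the HIES data set.

```python
def evaluation_measures(tp, fp, tn, fn):
    """Compute accuracy, error rate, TP rate and FP rate from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (fp + tn) == 0:
        raise ValueError("need at least one positive and one negative instance")
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "tp_rate": tp / (tp + fn),  # share of actual positives predicted positive
        "fp_rate": fp / (fp + tn),  # share of actual negatives predicted positive
    }


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive instance scores higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
measures = evaluation_measures(tp=40, fp=10, tn=45, fn=5)
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Training time is left out of the sketch because it depends on the tool and hardware; it would typically be measured by timing the model-building step.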