dc.description.abstract |
Data mining is the area that helps uncover hidden patterns and identify correlations
in massive amounts of structured and unstructured data. With the advent of improved
and modified prediction techniques in data mining, an analyst needs to know which
tool performs best for a particular type of data set. Selecting the best technique
among the many available at the right time is therefore important, and it can save
decision makers an enormous amount of valuable time.
This research was conducted to construct a model for predicting the credit risk of
leasing customers in Sri Lanka and measuring its predictive accuracy. It also compares
different data mining techniques in the finance domain in order to select the most
adequate one. It is hypothesized that credit risk prediction in the leasing industry of
Sri Lanka can be addressed using Logistic Regression, the Naive Bayes algorithm,
Decision Tree-J48 and Neural Networks.
The dataset employed in this study was obtained from one of the leading finance/leasing
companies in Sri Lanka. All agreements that matured in December 2015 were considered,
each described by 24 variables, giving a total of 8235 customers (data instances) for
the analysis. The variables were refined using the Statistical Package for the Social
Sciences (SPSS). Because the dependent variable is categorical and dichotomous, the
refinement used backward elimination in logistic regression. This process retained nine
independent variables together with the dependent variable. The data set was then
divided into two subsets: a training set (60%) and a test set (40%). The
Waikato Environment for Knowledge Analysis (WEKA) machine learning software was the
main tool used for the entire model construction process. Four main data mining
techniques (Logistic Regression, Naive Bayes, Decision Tree-J48 and Neural Networks)
were used to construct models, and the results of each model were compared with those
of the other techniques. The model constructed using the neural network showed healthy
classification accuracy, kappa statistic, area under the curve (AUC) and F-measure
values. The study therefore concludes that it is the best model for predicting the
payment accuracy of leasing customers in Sri Lanka. |
en_US |