Abstract:
Data mining helps to uncover hidden patterns and identify correlations in massive amounts of structured and unstructured data. With the advent of improved and modified prediction techniques in data mining, an analyst needs to know which tool performs best for a particular type of data set. Selecting the best technique among the many available, at the right time, is therefore important and saves decision makers a considerable amount of valuable time. This research was conducted to construct a model that can be used to measure the predictive accuracy of the credit risk of leasing customers in Sri Lanka, and to compare different data mining techniques in the finance domain in order to select the most adequate one. It is hypothesized that credit risk prediction in the leasing industry of Sri Lanka can be addressed using Logistic Regression, the Naive Bayes algorithm, Decision Tree-J48 and Neural Networks. The dataset employed in this study was obtained from one of the leading finance/leasing companies in Sri Lanka. All agreements that matured in December 2015 were considered for the study under 24 variables. Altogether, 8235 customers (data instances) were considered for the analysis. The variable refining process was conducted using the Statistical Package for the Social Sciences (SPSS) software. Since the dependent variable is categorical and dichotomous, the backward elimination method in logistic regression was employed. Nine independent variables and the dependent variable were selected through the refining process. The data set was divided into two subsets: a training set (60%) and a test set (40%). The Waikato Environment for Knowledge Analysis (WEKA) machine learning software was the major software tool used for the entire model construction process.
Four (4) main data mining techniques (Logistic Regression, Naive Bayes, Decision Tree-J48 and Neural Networks) were used to construct models, and the results from each model were obtained and compared with those of the other techniques. According to the results of the study, we conclude that the model constructed using the neural network, with healthy classification accuracy, kappa statistic, area under the curve (AUC) value and F-Measure, is the best model to predict the payment accuracy of leasing customers in Sri Lanka.
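
As an illustration of how three of the comparison measures named above relate to a binary confusion matrix, the following minimal sketch computes classification accuracy, the kappa statistic and the F-Measure from the four cell counts. The function name `binary_metrics` and the counts in the usage line are hypothetical and are not taken from the study; AUC is omitted because it requires the predicted class probabilities rather than the confusion matrix alone.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, kappa, f_measure) for a 2x2 confusion matrix.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives and true negatives. Assumes every row and column
    of the matrix is non-empty, so no denominator is zero.
    """
    total = tp + fp + fn + tn

    # Observed agreement: share of instances classified correctly.
    accuracy = (tp + tn) / total

    # Agreement expected by chance, from the marginal totals.
    p_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_neg = ((fp + tn) / total) * ((fn + tn) / total)
    p_expected = p_pos + p_neg

    # Cohen's kappa: agreement beyond chance, rescaled to at most 1.
    kappa = (accuracy - p_expected) / (1 - p_expected)

    # F-Measure: harmonic mean of precision and recall (positive class).
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)

    return accuracy, kappa, f_measure


# Hypothetical counts, for illustration only (not results of the study).
accuracy, kappa, f_measure = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

Computing the same measures for each of the four models on the same test set gives a like-for-like basis for the kind of comparison described above.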