A Statistical comparison between genetic algorithm and logistic regression for a clinical study

dc.contributor.advisor: Daundasekera WB
dc.contributor.advisor: Edirisinghe PM
dc.contributor.author: Aththanayake AMSMCM
dc.date.accept: 2020
dc.date.accessioned: 2020
dc.date.available: 2020
dc.date.issued: 2020
dc.description.abstract: Identifying the combination of variables that causes infections or infectious diseases is one of the main tasks of clinical models in medicine. Forward and backward variable selection techniques in Logistic Regression (LR) are widely used in such situations; LR assumes linearity of the independent variables and the absence of multicollinearity. Often, the observed data do not satisfy these assumptions, and LR is then not applicable. The Genetic Algorithm (GA), which does not depend on predefined assumptions, has proven to be better under such circumstances. The objective of this study was to apply binary LR and GA to reduce the number of variables in a sample of clinical data, and to compare prediction rates and goodness-of-fit statistics in order to identify the better variable reduction method. Three models were built using 40 independent variables (3 non-categorical and 37 categorical) for a sample of 497 observations collected from children under 5 years of age with suspected respiratory syncytial virus (RSV) infection who were admitted to the Kegalle Base Hospital from May 2016 to July 2018. The binary dependent variable indicates whether the child is RSV positive or negative. Log-likelihood and Area Under the Curve (AUC) served as the fitness functions of the two GAs. The goodness of fit of the three models was compared using the following statistical measures: -2 log-likelihood, pseudo R-square values, correctly classified percentage, specificity, and sensitivity. The results showed that the log-likelihood GA produced better goodness-of-fit measurements than the other two methods. However, LR reduced the 40 variables to 8 with a lower number of iterations, while the two GAs reduced them to 17 variables to predict the status of RSV infection. This study suggests that LR has better predictive power with the most strongly associated combination of variables. However, the background of the study indicates that GA performs better when the predefined assumptions are not satisfied and when solving high-dimensional classification problems in a large or complex search space. (en_US)
dc.identifier.accno: TH4487 (en_US)
dc.identifier.degree: MSc in Business Statistics (en_US)
dc.identifier.department: Department of Mathematics (en_US)
dc.identifier.faculty: Engineering (en_US)
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/16903
dc.language.iso: en (en_US)
dc.subject: MATHEMATICS- Dissertations (en_US)
dc.subject: BUSINESS STATISTICS – Dissertations (en_US)
dc.subject: CLINICAL DATA (en_US)
dc.subject: FITNESS FUNCTION (en_US)
dc.subject: GENETIC ALGORITHM (en_US)
dc.subject: LOGISTIC REGRESSION (en_US)
dc.subject: RESPIRATORY SYNCYTIAL VIRUS (en_US)
dc.title: A Statistical comparison between genetic algorithm and logistic regression for a clinical study (en_US)
dc.type: Thesis-Full-text (en_US)
