An Optimized supervised learning model for predicting survival rate

dc.contributor.advisor: Perera, I
dc.contributor.author: Saumya, TMD
dc.date.accept: 2025
dc.date.accessioned: 2026-02-12T08:08:33Z
dc.date.issued: 2025
dc.description.abstract: The incorporation of machine-learning techniques into medical informatics has enabled major improvements in cancer survivability prognosis. This study focuses on improving the accuracy of breast cancer survivability prediction and on building a prediction model from the METABRIC dataset, which contains both clinical and genomic features. A further objective was to improve the prediction model through feature engineering and missing-data handling. The study began by evaluating several machine-learning algorithms (Logistic Regression, Support Vector Classification, Random Forest, Categorical Boosting and Extreme Gradient Boosting) in order to select the best algorithm for this dataset and then improve its prediction accuracy through algorithm customization. Of these, Extreme Gradient Boosting gave the best baseline accuracy (81.10%). The model pipeline was then improved with kernel-based multiple imputation of missing values and with advanced feature engineering techniques such as log transformation, interaction features and categorical feature encoding, which raised prediction accuracy up to 87.5%. To improve accuracy further, the Extreme Gradient Boosting algorithm was customized with a new composite objective function combining Asymmetric Cost Sensitivity and Smoothed Focal Loss, which yielded a further accuracy improvement of 1.5%. The final proposed pipeline, built on the customized Extreme Gradient Boosting algorithm, offers a highly accurate and clinically aligned survivability prediction model that can serve as a base for disease prognosis using transfer learning.
dc.identifier.accno: TH6017
dc.identifier.citation: Saumya, T.M.D. (2025). An Optimized supervised learning model for predicting survival rate [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. https://dl.lib.uom.lk/handle/123/24853
dc.identifier.degree: MSc in Computer Science
dc.identifier.department: Department of Computer Science & Engineering
dc.identifier.faculty: Engineering
dc.identifier.uri: https://dl.lib.uom.lk/handle/123/24853
dc.language.iso: en
dc.subject: NON-COMMUNICABLE DISEASES-Breast Cancer-Survivability Prediction
dc.subject: MACHINE LEARNING
dc.subject: SUPERVISED LEARNING
dc.subject: EXTREME GRADIENT BOOSTING
dc.subject: METABRIC DATASET
dc.subject: HEALTH INFORMATICS
dc.subject: COMPUTER SCIENCE-Dissertation
dc.subject: COMPUTER SCIENCE AND ENGINEERING-Dissertation
dc.subject: MSc in Computer Science
dc.title: An Optimized supervised learning model for predicting survival rate
dc.type: Thesis-Abstract
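
The abstract names a composite objective built from asymmetric cost sensitivity and a focal loss but does not give its formula. The following is only a minimal sketch of what a cost-weighted focal loss could look like, not the thesis's actual objective: the function names, the `gamma`, `fn_cost` and `fp_cost` parameters, and their default values are all assumptions made for illustration, and the "smoothing" mentioned in the abstract is not reproduced because it is not specified there. Gradients are taken by finite differences to keep the sketch short.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def weighted_focal_loss(z, y, gamma=2.0, fn_cost=2.0, fp_cost=1.0, eps=1e-12):
    """Focal loss on a raw score z with class-dependent (asymmetric) costs.

    Hypothetical parameters, not taken from the thesis:
      gamma   - focusing exponent; gamma = 0 reduces to plain log loss
      fn_cost - weight on positives (penalizes false negatives)
      fp_cost - weight on negatives (penalizes false positives)
    """
    p = sigmoid(z)
    if y == 1:
        return -fn_cost * (1.0 - p) ** gamma * math.log(max(p, eps))
    return -fp_cost * p ** gamma * math.log(max(1.0 - p, eps))


def grad_hess(z, y, h=1e-4, **kw):
    """First and second derivative of the loss w.r.t. z (central differences).

    Focal loss is not convex in z everywhere, so the second derivative is
    clamped to a small positive value, as a boosting update would need.
    """
    f0 = weighted_focal_loss(z, y, **kw)
    f_plus = weighted_focal_loss(z + h, y, **kw)
    f_minus = weighted_focal_loss(z - h, y, **kw)
    grad = (f_plus - f_minus) / (2.0 * h)
    hess = (f_plus - 2.0 * f0 + f_minus) / (h * h)
    return grad, max(hess, 1e-6)


if __name__ == "__main__":
    # A missed positive (y = 1, low score) costs more than a false alarm.
    print(weighted_focal_loss(-2.0, 1), weighted_focal_loss(2.0, 0))
    print(grad_hess(-2.0, 1))
```

A per-sample gradient and second-derivative pair like the one returned by `grad_hess` is the form a gradient-boosting library generally expects from a custom objective; an analytic, vectorized version would be needed for real training.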

Files

Original bundle

Name: TH6017-1.pdf
Size: 758.03 KB
Format: Adobe Portable Document Format
Description: Pre-text

Name: TH6017-2.pdf
Size: 182.57 KB
Format: Adobe Portable Document Format
Description: Post-text

Name: TH6017.pdf
Size: 2.44 MB
Format: Adobe Portable Document Format
Description: Full-thesis

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission