Building explanatory models for road crash analysis using data science and machine learning technologies

dc.contributor.advisor: Perera L
dc.contributor.author: De Silva HWIU
dc.date.accept: 2022
dc.date.accessioned: 2022
dc.date.available: 2022
dc.date.issued: 2022
dc.description.abstract: Over three thousand people die annually on the roads of Sri Lanka due to traffic crashes. This is a massive social and economic problem for the country. Road crashes cause more than 1.3 million fatalities globally every year and are the eighth leading cause of death worldwide. Traditionally, road traffic crash analysis and accident modeling have relied on regression models and discrete choice models fitted to past data. Many countermeasures have been identified and implemented to address the issues highlighted by such models. However, because road traffic crashes occur across space and time, these conventional numerical approaches have failed to provide alerts and insights tied to geospatial regions. Moreover, because these models must be handcrafted, they cannot fully benefit from the explainability offered by the advanced tools and techniques of modern data science and machine learning. Further, disjointed efforts to build analytical or geospatial models on available crash data (e.g., crash hot spot identification) limit road agencies' ability to prioritize funds for more impactful improvements. Because conventional or isolated methods make it difficult to identify patterns in the causal factors of accident risk, authorities also find it difficult to prioritize the deployment of their staff to high-risk areas. The combination of exploratory data analysis (EDA), machine learning models, and modern geospatial visualization tools offers a unique opportunity to fill these gaps cost-effectively. This study applies the latest data science and machine learning technologies to build explanatory models that help analyze road crashes, using popular packages written in the Python and JavaScript programming languages. The Pandas and Sweetviz libraries provided simple yet powerful EDA.
The GeoPandas library provided the ability to process GPS locations (latitude and longitude), while Matplotlib was used to generate static maps. The Folium library and its underlying Leaflet.js library were used to generate interactive maps that help visualize crash hot spots. Two leading gradient boosting techniques, LightGBM and CatBoost, were applied to build models that highlight causal factors through feature importance estimation. The study developed algorithms, methods, and charts to compute attribute correlations and to build gradient boosted decision tree models that relate accident severity to the recorded attributes and to interactions among certain aggregate features (e.g., weather and light condition). The visualization efforts produced road crash density maps based on administrative region size and population. Interactive maps that allow authorities to drill down (or zoom in) to hot spots were also developed. The programmatic approach developed in this study allows the explanatory analysis and visualizations to be repeated on new and old datasets with minimal effort. The findings lay the foundation for a digital system that can easily be converted into an online platform from which road and enforcement agencies can obtain reports and alerts on road crash risks and hot spots. The application was tested using crash data from Sri Lanka, and the outcomes are presented in this study. Future work on fusing multiple data sources, such as real-time weather data and traffic congestion levels, onto the same platform could extend these outcomes to near real-time crash prediction, further supporting proactive accident prevention measures. (en_US)
dc.identifier.accno: TH4919 (en_US)
dc.identifier.citation: De Silva, H.W.I.U. (2022). Building explanatory models for road crash analysis using data science and machine learning technologies [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa. http://dl.lib.uom.lk/handle/123/19697
dc.identifier.degree: M.Sc. in Transportation (en_US)
dc.identifier.department: Department of Civil Engineering (en_US)
dc.identifier.faculty: Engineering (en_US)
dc.identifier.uri: http://dl.lib.uom.lk/handle/123/19697
dc.language.iso: en (en_US)
dc.subject: ROAD SAFETY (en_US)
dc.subject: EXPLANATORY MODELS (en_US)
dc.subject: GEOSPATIAL CRASH VISUALIZATION (en_US)
dc.subject: MULTI-FACETED ANALYSIS (en_US)
dc.subject: ROAD CRASHES (en_US)
dc.subject: EXPLORATORY DATA ANALYSIS (en_US)
dc.subject: MACHINE LEARNING CRASH MODELS (en_US)
dc.subject: TRANSPORTATION - Dissertation (en_US)
dc.subject: CIVIL ENGINEERING - Dissertation (en_US)
dc.title: Building explanatory models for road crash analysis using data science and machine learning technologies (en_US)
dc.type: Thesis-Abstract (en_US)
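For the hot spot maps, the abstract names Folium and Leaflet.js but not the aggregation behind them. One simple possibility, assumed here purely for illustration, is to bin GPS points into a regular latitude/longitude grid, keep the cells above a count threshold, and export them as GeoJSON, a format that Folium's `folium.GeoJson` and Leaflet's `L.geoJSON` can both render. The functions `grid_hotspots` and `hotspots_to_geojson`, the 0.01° cell size, the threshold, and the coordinates are all hypothetical.

```python
import json
import math
from collections import Counter


def grid_hotspots(points, cell_deg=0.01, min_count=2):
    """Count crashes per lat/lon grid cell and keep cells with at least min_count."""
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}


def hotspots_to_geojson(hotspots, cell_deg=0.01):
    """Turn grid cells into a GeoJSON FeatureCollection of square polygons."""
    features = []
    for (i, j), n in sorted(hotspots.items(), key=lambda kv: -kv[1]):  # busiest first
        south, west = i * cell_deg, j * cell_deg
        north, east = south + cell_deg, west + cell_deg
        # GeoJSON positions are [longitude, latitude]; the ring is closed and counterclockwise.
        ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
        features.append({
            "type": "Feature",
            "geometry": {"type": "Polygon", "coordinates": [ring]},
            "properties": {"crashes": n},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical crash locations (latitude, longitude), not taken from the thesis data.
points = [(6.9271, 79.8612), (6.9275, 79.8619), (6.9278, 79.8615), (7.2906, 80.6337)]
hot = grid_hotspots(points)
print(json.dumps(hotspots_to_geojson(hot), indent=2))
```

Degree-based cells are not equal-area (their east-west width shrinks away from the equator), so a projected coordinate system or a kernel density estimate would be more rigorous when comparing cells across a large area.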
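The abstract ranks causal factors using feature importance from LightGBM and CatBoost gradient boosted trees. Those libraries report gain-based importance directly (for example LightGBM's `Booster.feature_importance(importance_type="gain")` and CatBoost's `get_feature_importance()`). To show what gain importance measures without either dependency, the sketch below implements a deliberately simplified stand-in: least-squares gradient boosting with one-split stumps on 0/1 features, crediting each chosen feature with its squared-error reduction. It is not the thesis's model; the function `boosted_stumps`, the feature names, and the toy data are hypothetical.

```python
from itertools import product


def boosted_stumps(X, y, n_rounds=20, learning_rate=0.3):
    """Least-squares gradient boosting with one-split stumps on 0/1 features.

    Returns the total split gain (squared-error reduction) credited to each feature.
    """
    n, p = len(X), len(X[0])
    pred = [sum(y) / n] * n  # start from the mean outcome
    importance = [0.0] * p
    for _ in range(n_rounds):
        resid = [yi - fi for yi, fi in zip(y, pred)]  # negative gradient of squared loss
        total = sum(resid)
        best = None  # (gain, feature, mean residual when 0, mean residual when 1)
        for j in range(p):
            left = [r for row, r in zip(X, resid) if row[j] == 0]
            right = [r for row, r in zip(X, resid) if row[j] == 1]
            if not left or not right:
                continue  # a constant feature cannot split the rows
            m_l, m_r = sum(left) / len(left), sum(right) / len(right)
            gain = len(left) * m_l ** 2 + len(right) * m_r ** 2 - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, m_l, m_r)
        if best is None:
            break
        gain, j, m_l, m_r = best
        importance[j] += gain
        # Shrink each stump's correction by the learning rate before adding it.
        pred = [fi + learning_rate * (m_r if row[j] == 1 else m_l)
                for row, fi in zip(X, pred)]
    return importance


# Hypothetical toy data: severity (1 = severe) depends only on the "dark" flag.
features = ["dark", "wet_road", "weekend"]
X = [list(bits) for bits in product((0, 1), repeat=3)]
y = [row[0] for row in X]
imp = boosted_stumps(X, y)
total_gain = sum(imp)
for name, g in sorted(zip(features, imp), key=lambda t: -t[1]):
    print(f"{name}: {g / total_gain:.2%} of total gain")
```

Real tree libraries grow deeper trees, use second-order gradients for classification losses, and handle continuous and categorical splits, but the bookkeeping is the same idea: a feature's importance is the loss reduction accumulated over the splits that use it.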
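The abstract mentions generating attribute correlations between recorded crash attributes and accident severity, but does not state which measure was used. As a hedged illustration only, the sketch below assumes Cramér's V, a chi-square-based association measure for two categorical attributes, and computes it in plain Python rather than with Pandas or Sweetviz. The functions `contingency` and `cramers_v` and the sample records are hypothetical.

```python
from collections import Counter
from math import sqrt


def contingency(pairs):
    """Count each (a, b) combination plus the row and column totals."""
    cells = Counter(pairs)
    rows = Counter(a for a, _ in pairs)
    cols = Counter(b for _, b in pairs)
    return cells, rows, cols


def cramers_v(pairs):
    """Cramér's V between two categorical attributes (0 = none, 1 = perfect).

    Assumption: the uncorrected form; no small-sample bias correction is applied.
    """
    pairs = list(pairs)  # allow any iterable; it is traversed several times
    n = len(pairs)
    cells, rows, cols = contingency(pairs)
    k = min(len(rows), len(cols))
    if k < 2:
        return 0.0  # a constant attribute carries no association information
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n  # count expected under independence
            chi2 += (cells[(a, b)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical (weather, severity) records, not taken from the thesis data.
records = [("Clear", "Minor"), ("Clear", "Minor"), ("Clear", "Grievous"),
           ("Rain", "Grievous"), ("Rain", "Fatal"), ("Rain", "Fatal")]
print(f"Cramér's V (weather vs severity): {cramers_v(records):.3f}")
```

Unlike Pearson correlation, Cramér's V does not assume any ordering of categories, which suits attributes such as weather or light condition.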
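The density maps are described as based on administrative region size and population. A minimal worked sketch of that normalization is shown below, assuming each crash record already carries the name of its region (for example after a spatial join in GeoPandas). The region names, areas, populations, and the `crash_rates` helper are hypothetical.

```python
from collections import Counter


def crash_rates(crash_regions, area_km2, population):
    """Return {region: (crashes, crashes per 100 km², crashes per 100,000 people)}."""
    counts = Counter(crash_regions)
    rates = {}
    for region in area_km2:
        n = counts.get(region, 0)  # regions with no recorded crashes get zero
        rates[region] = (n, n * 100 / area_km2[region], n * 100_000 / population[region])
    return rates


# Hypothetical inputs: 10 crashes in Region A, 20 in Region B.
crashes = ["Region A"] * 10 + ["Region B"] * 20
area = {"Region A": 50, "Region B": 200}
people = {"Region A": 100_000, "Region B": 400_000}
for region, (n, per_area, per_capita) in crash_rates(crashes, area, people).items():
    print(f"{region}: {n} crashes, {per_area:.1f} per 100 km², {per_capita:.1f} per 100k people")
```

Here Region B records twice as many crashes as Region A but only half its density on both measures, which is why raw counts alone can mislead when ranking regions.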
