Performance evaluation of machine learning pipelines for pore pressure prediction
Date
2025
Publisher
Department of Earth Resources Engineering, University of Moratuwa, Sri Lanka
Abstract
Accurate pore pressure prediction is critical for safe drilling operations. Conventional prediction methods, which rely on simplified empirical assumptions, often fail to capture the multivariate and non-linear relationships present in complex geological settings. Machine learning (ML) provides a data-driven approach that can model these complexities directly from well log data without relying on predefined physical equations. However, the practical application of ML is often inconsistent because there is no systematic understanding of how data preprocessing choices affect final model performance. This study aims to resolve this uncertainty by identifying the optimal combination of preprocessing strategy and ML algorithm for this task. Six ML algorithms were compared across four scenarios (raw data, outlier-capped data, feature-selected data, and combined preprocessing with both outlier capping and feature selection) to systematically evaluate the effects of outlier capping and the removal of multicollinear features. The findings identify a tuned XGBoost model as the top performer (R² = 0.9789), a result achieved on the raw, unprocessed dataset. This key finding, when analyzed in the context of the other experimental scenarios, demonstrates that removing linearly correlated features can be detrimental to advanced models and that the necessity of outlier treatment is algorithm-dependent. This study concludes that no data preparation strategy is universal; rather, the appropriate strategy is closely tied to algorithm choice. It offers a context-aware framework to enhance model reliability and support interpretability in future research.
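The four-scenario comparison described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the synthetic data and its column names (`depth`, `sonic`, `density`, `gamma_ray`), IQR-based capping with a factor of 1.5, a correlation threshold of 0.9 for feature removal, and the use of two scikit-learn regressors (with `GradientBoostingRegressor` standing in for a tuned XGBoost model) in place of the six algorithms the study evaluated.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def make_synthetic_logs(n=400, seed=0):
    """Hypothetical well-log-like data; the target formula is arbitrary."""
    rng = np.random.default_rng(seed)
    depth = rng.uniform(1000, 3000, n)
    sonic = 140 - 0.02 * depth + rng.normal(0, 3, n)         # correlated with depth
    density = 2.0 + 0.0002 * depth + rng.normal(0, 0.02, n)  # correlated with depth
    gamma = rng.uniform(20, 150, n)
    X = pd.DataFrame({"depth": depth, "sonic": sonic,
                      "density": density, "gamma_ray": gamma})
    y = 0.01 * depth + 0.05 * sonic + 0.002 * gamma ** 1.2 + rng.normal(0, 0.3, n)
    return X, pd.Series(y, name="pore_pressure")


def cap_outliers(train, test, k=1.5):
    """Clip each feature to IQR fences computed on the training split only."""
    q1, q3 = train.quantile(0.25), train.quantile(0.75)
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return (train.clip(lower=lower, upper=upper, axis=1),
            test.clip(lower=lower, upper=upper, axis=1))


def drop_correlated(train, threshold=0.9):
    """Return the columns kept after dropping one feature of each correlated pair."""
    corr = train.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return [c for c in train.columns if c not in to_drop]


def run_scenarios(X, y, models, seed=0):
    """Fit every model on every preprocessing scenario; return a table of test R²."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    cap_tr, cap_te = cap_outliers(X_tr, X_te)
    kept = drop_correlated(X_tr)
    kept_cap = drop_correlated(cap_tr)
    scenarios = {
        "raw": (X_tr, X_te),
        "outlier_capped": (cap_tr, cap_te),
        "feature_selected": (X_tr[kept], X_te[kept]),
        "combined": (cap_tr[kept_cap], cap_te[kept_cap]),
    }
    rows = []
    for s_name, (tr, te) in scenarios.items():
        for m_name, make_model in models.items():
            model = make_model().fit(tr, y_tr)
            rows.append({"scenario": s_name, "model": m_name,
                         "r2": r2_score(y_te, model.predict(te))})
    return pd.DataFrame(rows).pivot(index="scenario", columns="model", values="r2")


if __name__ == "__main__":
    models = {
        "linear": LinearRegression,
        # Stand-in for a tuned XGBoost model; hyperparameters are arbitrary.
        "boosting": lambda: GradientBoostingRegressor(n_estimators=50, random_state=0),
    }
    X, y = make_synthetic_logs()
    print(run_scenarios(X, y, models).round(4))
```

Reading the resulting table row by row shows how each preprocessing choice shifts R² for each algorithm, which is the kind of comparison the study reports. The capping fences and the list of retained features are derived from the training split only, so the test split does not influence preprocessing. Scores from this synthetic example say nothing about the study's actual results.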
