Out-of-Sample Hydrocarbon Production Forecasting: Time Series Machine Learning using Productivity Index-Driven Features and Inductive Conformal Prediction
- URL: http://arxiv.org/abs/2508.14078v1
- Date: Tue, 12 Aug 2025 19:14:46 GMT
- Title: Out-of-Sample Hydrocarbon Production Forecasting: Time Series Machine Learning using Productivity Index-Driven Features and Inductive Conformal Prediction
- Authors: Mohamed Hassan Abdalla Idris, Jakub Marek Cebula, Jebraeel Gholinezhad, Shamsul Masum, Hongjie Ma,
- Abstract summary: This research introduces a new ML framework designed to enhance the robustness of out-of-sample hydrocarbon production forecasting.<n>Utilizing historical data from the Volve (wells PF14, PF12) and Norne (well E1H) oil fields, this study investigates the efficacy of various predictive algorithms.
- Score: 1.1534313664323632
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research introduces a new ML framework designed to enhance the robustness of out-of-sample hydrocarbon production forecasting, specifically addressing multivariate time series analysis. The proposed methodology integrates Productivity Index (PI)-driven feature selection, a concept derived from reservoir engineering, with Inductive Conformal Prediction (ICP) for rigorous uncertainty quantification. Utilizing historical data from the Volve (wells PF14, PF12) and Norne (well E1H) oil fields, this study investigates the efficacy of various predictive algorithms-namely Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and eXtreme Gradient Boosting (XGBoost) - in forecasting historical oil production rates (OPR_H). All the models achieved "out-of-sample" production forecasts for an upcoming future timeframe. Model performance was comprehensively evaluated using traditional error metrics (e.g., MAE) supplemented by Forecast Bias and Prediction Direction Accuracy (PDA) to assess bias and trend-capturing capabilities. The PI-based feature selection effectively reduced input dimensionality compared to conventional numerical simulation workflows. The uncertainty quantification was addressed using the ICP framework, a distribution-free approach that guarantees valid prediction intervals (e.g., 95% coverage) without reliance on distributional assumptions, offering a distinct advantage over traditional confidence intervals, particularly for complex, non-normal data. Results demonstrated the superior performance of the LSTM model, achieving the lowest MAE on test (19.468) and genuine out-of-sample forecast data (29.638) for well PF14, with subsequent validation on Norne well E1H. These findings highlight the significant potential of combining domain-specific knowledge with advanced ML techniques to improve the reliability of hydrocarbon production forecasts.
Related papers
- STAR : Bridging Statistical and Agentic Reasoning for Large Model Performance Prediction [78.0692157478247]
We propose STAR, a framework that bridges data-driven STatistical expectations with knowledge-driven Agentic Reasoning.<n>We show that STAR consistently outperforms all baselines on both score-based and rank-based metrics.
arXiv Detail & Related papers (2026-02-12T16:30:07Z) - Robust Probabilistic Load Forecasting for a Single Household: A Comparative Study from SARIMA to Transformers on the REFIT Dataset [0.0]
This paper tackles the challenge using the volatile REFIT household dataset.<n>We first address this by conducting a rigorous comparative experiment to select a Seasonal Imputation method.<n>We then systematically evaluate a hierarchy of models, progressing from classical baselines to machine learning.<n>Our findings reveal that classical models fail to capture the data's non-linear, regime-switching behavior.
arXiv Detail & Related papers (2025-11-30T12:05:18Z) - Bridging the Gap Between Bayesian Deep Learning and Ensemble Weather Forecasts [100.26854618129039]
Weather forecasting is fundamentally challenged by the chaotic nature of the atmosphere.<n>Recent advances in Bayesian Deep Learning (BDL) offer a promising but often disconnected alternative.<n>We bridge these paradigms through a unified hybrid BDL framework for ensemble weather forecasting.
arXiv Detail & Related papers (2025-11-18T07:49:52Z) - A Novel Method to Manage Production on Industry 4.0: Forecasting Overall Equipment Efficiency by Time Series with Topological Features [0.0]
Overall equipment efficiency (OEE) is a key manufacturing production, but its volatile nature complicates short-term forecasting.<n>This study presents a novel framework combining time series decomposition and topological data analysis to improve OEE prediction across various equipment.
arXiv Detail & Related papers (2025-06-20T10:04:49Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving the model performance.<n>This paper addresses the question of how to optimally combine the model's predictions and the provided labels.<n>Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - A Generative Framework for Causal Estimation via Importance-Weighted Diffusion Distillation [55.53426007439564]
Estimating individualized treatment effects from observational data is a central challenge in causal inference.<n>In inverse probability weighting (IPW) is a well-established solution to this problem, but its integration into modern deep learning frameworks remains limited.<n>We propose Importance-Weighted Diffusion Distillation (IWDD), a novel generative framework that combines the pretraining of diffusion models with importance-weighted score distillation.
arXiv Detail & Related papers (2025-05-16T17:00:52Z) - Effective Probabilistic Time Series Forecasting with Fourier Adaptive Noise-Separated Diffusion [13.640501360968782]
FALDA is a novel probabilistic framework for time series forecasting.<n>It incorporates a component-specific architecture, enabling tailored modeling of individual temporal components.<n>Experiments on six real-world benchmarks demonstrate that FALDA consistently outperforms existing probabilistic forecasting approaches.<n>FALDA also achieves superior overall performance compared to state-of-the-art (SOTA) point forecasting approaches, with improvements of up to 9%.
arXiv Detail & Related papers (2025-05-16T14:32:34Z) - Prediction-Powered Adaptive Shrinkage Estimation [0.9208007322096532]
Prediction-Powered Adaptive Shrinkage (PAS) is a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means.<n>PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
arXiv Detail & Related papers (2025-02-20T00:24:05Z) - Scaling Laws for Predicting Downstream Performance in LLMs [75.28559015477137]
This work focuses on the pre-training loss as a more computation-efficient metric for performance estimation.<n>We present FLP-M, a fundamental approach for performance prediction that addresses the practical need to integrate datasets from multiple sources during pre-training.
arXiv Detail & Related papers (2024-10-11T04:57:48Z) - Conformal Approach To Gaussian Process Surrogate Evaluation With
Coverage Guarantees [47.22930583160043]
We propose a method for building adaptive cross-conformal prediction intervals.
The resulting conformal prediction intervals exhibit a level of adaptivity akin to Bayesian credibility sets.
The potential applicability of the method is demonstrated in the context of surrogate modeling of an expensive-to-evaluate simulator of the clogging phenomenon in steam generators of nuclear reactors.
arXiv Detail & Related papers (2024-01-15T14:45:18Z) - Prediction-Oriented Bayesian Active Learning [51.426960808684655]
Expected predictive information gain (EPIG) is an acquisition function that measures information gain in the space of predictions rather than parameters.
EPIG leads to stronger predictive performance compared with BALD across a range of datasets and models.
arXiv Detail & Related papers (2023-04-17T10:59:57Z) - When in Doubt: Neural Non-Parametric Uncertainty Quantification for
Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions.
Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations.
We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z) - Energy Forecasting in Smart Grid Systems: A Review of the
State-of-the-art Techniques [2.3436632098950456]
This paper presents a review of state-of-the-art forecasting methods for smart grid (SG) systems.
Traditional point forecasting methods including statistical, machine learning (ML), and deep learning (DL) are extensively investigated.
A comparative case study using the Victorian electricity consumption and American electric power (AEP) is conducted.
arXiv Detail & Related papers (2020-11-25T09:17:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.