Estimating oil recovery factor using machine learning: Applications of
XGBoost classification
- URL: http://arxiv.org/abs/2210.16345v1
- Date: Fri, 28 Oct 2022 18:21:25 GMT
- Title: Estimating oil recovery factor using machine learning: Applications of
XGBoost classification
- Authors: Alireza Roustazadeh, Behzad Ghanbarian, Frank Male, Mohammad B.
Shadmand, Vahid Taslimitehrani, and Larry W. Lake
- Abstract summary: In petroleum engineering, it is essential to determine the ultimate recovery factor, RF, particularly before exploitation and exploration.
We, therefore, applied machine learning (ML), using readily available features, to estimate oil RF for ten classes defined in this study.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In petroleum engineering, it is essential to determine the ultimate recovery
factor, RF, particularly before exploitation and exploration. However,
accurately estimating RF requires data that is not necessarily available or
measured at early stages of reservoir development. We, therefore, applied
machine learning (ML), using readily available features, to estimate oil RF for
ten classes defined in this study. To construct the ML models, we applied the
XGBoost classification algorithm. Classification was chosen because recovery
factor is bounded from 0 to 1, much like probability. Three databases were
merged, leaving us with four different combinations to first train and test the
ML models and then further evaluate them using an independent database
including unseen data. The cross-validation method with ten folds was applied
on the training datasets to assess the effectiveness of the models. To evaluate
the accuracy and reliability of the models, the accuracy, neighborhood
accuracy, and macro-averaged F1 score were determined. Overall, the results showed
that the XGBoost classification algorithm could estimate the RF class with
reasonable accuracies as high as 0.49 in the training datasets, 0.34 in the
testing datasets and 0.2 in the independent databases used. We found that the
reliability of the XGBoost model depended on the data in the training dataset,
meaning that the ML models were database-dependent. The feature importance
analysis and the SHAP approach showed that the most important features were
reserves and reservoir area and thickness.
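The three evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the class boundaries or define neighborhood accuracy, so two assumptions are made here: the ten classes are equal-width bins of 0.1 over [0, 1], and a prediction counts as a neighborhood hit when it lies within one class of the true class. The function names (`rf_to_class`, `neighborhood_accuracy`, and so on) are hypothetical and are not taken from the paper.

```python
# Sketch only: the equal-width bins and the +/-1-class tolerance are
# assumptions, not definitions taken from the paper.

def rf_to_class(rf, n_classes=10):
    """Map a recovery factor in [0, 1] to an integer class 0..n_classes-1."""
    if not 0.0 <= rf <= 1.0:
        raise ValueError("recovery factor must lie in [0, 1]")
    # Equal-width bins (assumed); rf == 1.0 is placed in the last class.
    return min(int(rf * n_classes), n_classes - 1)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def neighborhood_accuracy(y_true, y_pred, tolerance=1):
    """Fraction of predictions within `tolerance` classes of the truth (assumed definition)."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true positives contributes an F1 of 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up recovery factors, used purely to exercise the functions.
    true_rf = [0.05, 0.15, 0.25, 0.35]
    pred_rf = [0.02, 0.22, 0.28, 0.08]
    y_true = [rf_to_class(r) for r in true_rf]
    y_pred = [rf_to_class(r) for r in pred_rf]
    print("accuracy:", accuracy(y_true, y_pred))
    print("neighborhood accuracy:", neighborhood_accuracy(y_true, y_pred))
    print("macro F1:", macro_f1(y_true, y_pred))
```

Under these assumptions, neighborhood accuracy can never be lower than plain accuracy, because an exact match is also a match within one class.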
Related papers
- What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? [83.83230167222852]
We find that a model's generalization behavior can be effectively characterized by a training metric we call pre-memorization train accuracy.
By connecting a model's learning behavior to its generalization, pre-memorization train accuracy can guide targeted improvements to training strategies.
arXiv Detail & Related papers (2024-11-12T09:52:40Z)
- An Evaluation of Machine Learning Approaches for Early Diagnosis of Autism Spectrum Disorder [0.0]
Autistic Spectrum Disorder (ASD) is a neurological disease characterized by difficulties with social interaction, communication, and repetitive activities.
This study employs diverse machine learning methods to identify crucial ASD traits, aiming to enhance and automate the diagnostic process.
arXiv Detail & Related papers (2023-09-20T21:23:37Z)
- Convolutional Neural Networks for the classification of glitches in gravitational-wave data streams [52.77024349608834]
We classify transient noise signals (i.e., glitches) and gravitational waves in data from the Advanced LIGO detectors.
We use models with a supervised learning approach, both trained from scratch using the Gravity Spy dataset.
We also explore a self-supervised approach, pre-training models with automatically generated pseudo-labels.
arXiv Detail & Related papers (2023-03-24T11:12:37Z)
- Exploring the Value of Pre-trained Language Models for Clinical Named Entity Recognition [6.917786124918387]
We compare Transformer models that are trained from scratch to fine-tuned BERT-based LLMs.
We examine the impact of an additional CRF layer on such models to encourage contextual learning.
arXiv Detail & Related papers (2022-10-23T16:27:31Z)
- Estimating oil and gas recovery factors via machine learning: Database-dependent accuracy and reliability [0.0]
A key reservoir property is hydrocarbon recovery factor (RF) whose accurate estimation would provide decisive insights to drilling and production strategies.
This study aims to estimate the hydrocarbon RF for exploration from various reservoir characteristics, such as porosity, permeability, pressure, and water saturation via the machine learning (ML) approach.
arXiv Detail & Related papers (2022-10-22T16:25:49Z)
- A Case Study on the Classification of Lost Circulation Events During Drilling using Machine Learning Techniques on an Imbalanced Large Dataset [0.0]
We utilize a dataset of more than 65,000 records, with a class imbalance problem, from the Azadegan oilfield formations in Iran.
Eleven of the dataset's seventeen parameters are chosen to be used in the classification of five lost circulation events.
To generate classification models, we used six basic machine learning algorithms and four ensemble learning methods.
arXiv Detail & Related papers (2022-09-04T12:28:40Z)
- Learning brain MRI quality control: a multi-factorial generalization problem [0.0]
This work aimed at evaluating the performances of the MRIQC pipeline on various large-scale datasets.
We focused our analysis on the MRIQC preprocessing steps and tested the pipeline with and without them.
We concluded that a model trained with data from a heterogeneous population, such as the CATI dataset, provides the best scores on unseen data.
arXiv Detail & Related papers (2022-05-31T15:46:44Z)
- Learning to be a Statistician: Learned Estimator for Number of Distinct Values [54.629042119819744]
Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems.
In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples.
We propose to formulate the NDV estimation task in a supervised learning framework, and aim to learn a model as the estimator.
arXiv Detail & Related papers (2022-02-06T15:42:04Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)