A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior
- URL: http://arxiv.org/abs/2505.08585v1
- Date: Tue, 13 May 2025 13:56:43 GMT
- Title: A Large-scale Benchmark on Geological Fault Delineation Models: Domain Shift, Training Dynamics, Generalizability, Evaluation and Inferential Behavior
- Authors: Jorge Quesada, Chen Zhou, Prithwijit Chowdhury, Mohammad Alotaibi, Ahmad Mustafa, Yusufjon Kumamnov, Mohit Prabhushankar, Ghassan AlRegib
- Abstract summary: We present the first large-scale benchmarking study designed to provide answers and guidelines for domain shift strategies in seismic interpretation. Our benchmark encompasses over $200$ models trained and evaluated on three heterogeneous datasets. Our analysis highlights the fragility of current fine-tuning practices, the emergence of catastrophic forgetting, and the challenges of interpreting performance in a systematic manner.
- Score: 12.23379456993682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning has taken a critical role in seismic interpretation workflows, especially in fault delineation tasks. However, despite the recent proliferation of pretrained models and synthetic datasets, the field still lacks a systematic understanding of the generalizability limits of these models across seismic data representing a variety of geologic, acquisition and processing settings. Distributional shifts between different data sources, limitations in fine-tuning strategies and labeled data accessibility, and inconsistent evaluation protocols all represent major roadblocks in the deployment of reliable and robust models in real-world exploration settings. In this paper, we present the first large-scale benchmarking study explicitly designed to provide answers and guidelines for domain shift strategies in seismic interpretation. Our benchmark encompasses over $200$ models trained and evaluated on three heterogeneous datasets (synthetic and real data) including FaultSeg3D, CRACKS, and Thebe. We systematically assess pretraining, fine-tuning, and joint training strategies under varying degrees of domain shift. Our analysis highlights the fragility of current fine-tuning practices, the emergence of catastrophic forgetting, and the challenges of interpreting performance in a systematic manner. We establish a robust experimental baseline to provide insights into the tradeoffs inherent to current fault delineation workflows, and shed light on directions for developing more generalizable, interpretable and effective machine learning models for seismic interpretation. The insights and analyses reported provide a set of guidelines on the deployment of fault delineation models within seismic interpretation workflows.
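To make the protocol concrete, the sketch below walks through the pretrain-on-source, fine-tune-on-target, evaluate-on-both loop that underlies the domain-shift and catastrophic-forgetting analysis. It is a minimal illustration assuming PyTorch; the toy convolutional network, randomly generated stand-in patches, and the Dice metric are hypothetical placeholders, not the authors' actual architectures, datasets, or evaluation protocol.

```python
# Minimal sketch (PyTorch) of the pretrain -> fine-tune -> cross-domain evaluation
# protocol described in the abstract. Dataset names, model architecture, and the
# Dice-based metric are illustrative stand-ins, not the paper's exact setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_loader(n=64, size=128, seed=0):
    """Stand-in for a seismic patch dataset (e.g., FaultSeg3D, CRACKS, Thebe)."""
    g = torch.Generator().manual_seed(seed)
    x = torch.randn(n, 1, size, size, generator=g)                   # amplitude patches
    y = (torch.rand(n, 1, size, size, generator=g) > 0.95).float()   # sparse fault masks
    return DataLoader(TensorDataset(x, y), batch_size=8, shuffle=True)

# A deliberately tiny conv net as a placeholder for a fault segmentation model.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),
)

def train(model, loader, epochs=1, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()

@torch.no_grad()
def dice(model, loader, thr=0.5, eps=1e-6):
    """Per-dataset Dice score over binarized fault predictions."""
    model.eval()
    inter = union = 0.0
    for x, y in loader:
        p = (torch.sigmoid(model(x)) > thr).float()
        inter += (p * y).sum().item()
        union += (p.sum() + y.sum()).item()
    return (2 * inter + eps) / (union + eps)

source, target = make_loader(seed=0), make_loader(seed=1)  # two "domains"
train(model, source)                                       # pretrain on source
print("source Dice before fine-tuning:", dice(model, source))
train(model, target)                                       # fine-tune on target
print("target Dice after fine-tuning:", dice(model, target))
print("source Dice after fine-tuning:", dice(model, source))  # forgetting check
```

The last two prints contrast the target-domain performance gained against the source-domain performance retained; a sharp drop on the source after fine-tuning is the kind of before/after signal used to flag catastrophic forgetting.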
Related papers
- Anomaly Detection and Generation with Diffusion Models: A Survey [51.61574868316922]
Anomaly detection (AD) plays a pivotal role across diverse domains, including cybersecurity, finance, healthcare, and industrial manufacturing. Recent advancements in deep learning, specifically diffusion models (DMs), have sparked significant interest. This survey aims to guide researchers and practitioners in leveraging DMs for innovative AD solutions across diverse applications.
arXiv Detail & Related papers (2025-06-11T03:29:18Z)
- Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in the suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z)
- Constructing balanced datasets for predicting failure modes in structural systems under seismic hazards [1.192436948211501]
This study proposes a framework for constructing balanced datasets that include distinct failure modes. Deep neural network models are trained on balanced and imbalanced datasets to highlight the importance of dataset balancing.
arXiv Detail & Related papers (2025-02-26T22:11:51Z)
- Mitigating Attrition: Data-Driven Approach Using Machine Learning and Data Engineering [0.0]
This paper presents a novel data-driven approach to mitigating employee attrition using machine learning and data engineering techniques. The proposed framework integrates data from various human resources systems and leverages advanced feature engineering to capture a comprehensive set of factors influencing attrition.
arXiv Detail & Related papers (2025-02-25T05:29:45Z)
- Out-of-Distribution Detection on Graphs: A Survey [58.47395497985277]
Graph out-of-distribution (GOOD) detection focuses on identifying graph data that deviates from the distribution seen during training. We categorize existing methods into four types: enhancement-based, reconstruction-based, information propagation-based, and classification-based approaches. We discuss practical applications and theoretical foundations, highlighting the unique challenges posed by graph data.
arXiv Detail & Related papers (2025-02-12T04:07:12Z)
- An Experimental Evaluation of Imputation Models for Spatial-Temporal Traffic Data [20.294382826251994]
This work aims to provide a holistic understanding of traffic data imputation research and serve as a practical guideline. We first propose practice-oriented taxonomies for missing patterns and imputation models, systematically identifying all possible forms of real-world traffic data loss. Furthermore, we introduce a unified benchmarking pipeline to comprehensively evaluate 10 representative models across various missing patterns and rates.
arXiv Detail & Related papers (2024-12-06T02:47:54Z)
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z)
- Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- FaultSeg Swin-UNETR: Transformer-Based Self-Supervised Pretraining Model for Fault Recognition [13.339333273943842]
This paper introduces an approach to enhance seismic fault recognition through self-supervised pretraining.
We employ the Swin Transformer model as the core network and use the SimMIM pretraining task to capture unique features related to discontinuities in seismic data.
Experimental results demonstrate that our proposed method attains state-of-the-art performance on the Thebe dataset, as measured by the OIS and ODS metrics.
arXiv Detail & Related papers (2023-10-27T08:38:59Z)
- SimSCOOD: Systematic Analysis of Out-of-Distribution Generalization in Fine-tuned Source Code Models [58.78043959556283]
We study the behaviors of models under different fine-tuning methodologies, including full fine-tuning and Low-Rank Adaptation (LoRA) fine-tuning methods.
Our analysis uncovers that LoRA fine-tuning consistently exhibits significantly better OOD generalization performance than full fine-tuning across various scenarios (a generic LoRA sketch is included after this list).
arXiv Detail & Related papers (2022-10-10T16:07:24Z)
- An Information-theoretic Approach to Distribution Shifts [9.475039534437332]
Safely deploying machine learning models to the real world is often a challenging process.
Models trained with data obtained from a specific geographic location tend to fail when queried with data obtained elsewhere.
Similarly, neural networks that are fit to a subset of the population might carry some selection bias into their decision process.
arXiv Detail & Related papers (2021-06-07T16:44:21Z)
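For readers unfamiliar with the parameter-efficient alternative to full fine-tuning discussed in the SimSCOOD entry above, the following is a minimal, generic LoRA-style adapter around a single linear layer. The rank, scaling factor, and layer choice are illustrative assumptions, not that paper's configuration.

```python
# Minimal sketch of a LoRA-style adapter, contrasted with full fine-tuning in the
# SimSCOOD entry above. Rank, scaling, and the wrapped layer are assumed values
# for illustration only.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen Linear layer with a trainable low-rank update: W x + (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(256, 256))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")  # only the low-rank factors update
```

Only the two low-rank factors are updated during fine-tuning, which is why the trainable-parameter count printed at the end is a small fraction of the total.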