CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples
- URL: http://arxiv.org/abs/2508.21083v1
- Date: Tue, 26 Aug 2025 07:49:33 GMT
- Title: CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples
- Authors: Kyohoon Jin, Juhwan Choi, Jungmin Yun, Junho Lee, Soojin Jang, Youngbin Kim
- Abstract summary: We introduce a more general form of counterfactual data augmentation, termed counterbias data augmentation. We present CoBA: CounterBias Augmentation, a unified framework that operates at the semantic triple level. We show that CoBA not only improves downstream task performance, but also effectively reduces biases and strengthens out-of-distribution resilience.
- Score: 37.584469264091744
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models often learn and exploit spurious correlations in training data, using these non-target features to inform their predictions. Such reliance leads to performance degradation and poor generalization on unseen data. To address these limitations, we introduce a more general form of counterfactual data augmentation, termed counterbias data augmentation, which simultaneously tackles multiple biases (e.g., gender bias, simplicity bias) and enhances out-of-distribution robustness. We present CoBA: CounterBias Augmentation, a unified framework that operates at the semantic triple level: first decomposing text into subject-predicate-object triples, then selectively modifying these triples to disrupt spurious correlations. By reconstructing the text from these adjusted triples, CoBA generates counterbias data that mitigates spurious patterns. Through extensive experiments, we demonstrate that CoBA not only improves downstream task performance, but also effectively reduces biases and strengthens out-of-distribution resilience, offering a versatile and robust solution to the challenges posed by spurious correlations.
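The three stages named in the abstract (decompose text into subject-predicate-object triples, modify the triples, reconstruct the text) can be illustrated with a toy sketch. This is not the authors' implementation: the `Triple` class, the `GENDER_SWAP` table, and the whitespace-based `decompose` function are all hypothetical stand-ins chosen for illustration, and the sketch only handles short "subject verb object" sentences with a single gender-term swap.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str

# Hypothetical swap table for one bias type (gendered terms); an assumption.
GENDER_SWAP = {
    "he": "she", "she": "he",
    "actor": "actress", "actress": "actor",
    "king": "queen", "queen": "king",
}

def decompose(sentence: str) -> Triple:
    """Toy decomposition: first word is the subject, second the predicate,
    the rest the object. A real system would use a parser or a language model."""
    words = sentence.rstrip(".").split()
    return Triple(words[0], words[1], " ".join(words[2:]))

def swap_word(word: str, table: dict) -> str:
    """Swap a single word via the table, keeping an initial capital letter."""
    swapped = table.get(word.lower(), word)
    if swapped != word and word[0].isupper():
        return swapped.capitalize()
    return swapped

def swap_terms(text: str, table: dict) -> str:
    return " ".join(swap_word(w, table) for w in text.split())

def modify(triple: Triple, table: dict) -> Triple:
    """Disrupt the targeted correlation by swapping terms in subject and object."""
    return Triple(
        swap_terms(triple.subject, table),
        triple.predicate,
        swap_terms(triple.obj, table),
    )

def reconstruct(triple: Triple) -> str:
    return f"{triple.subject} {triple.predicate} {triple.obj}."

def augment(sentence: str, table: dict = GENDER_SWAP) -> str:
    """Decompose, modify, and reconstruct a single sentence."""
    return reconstruct(modify(decompose(sentence), table))

if __name__ == "__main__":
    print(augment("He praised the actress."))
```

In this sketch the predicate is left untouched, so the augmented sentence differs from the original only in the swapped terms. Whether and how the paper's method edits predicates or handles other bias types is not specified in the abstract above.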
Related papers
- Enhanced Data Transfer Cooperating with Artificial Triplets for Scene Graph Generation [15.109087477826106]
This work focuses on training dataset enhancement of informative relational triplets for Scene Graph Generation (SGG).
We propose two novel training dataset enhancement modules: Feature Space Triplet Augmentation (FSTA) and Soft Transfer.
Experimental results show that integrating FSTA and Soft Transfer achieves high levels of both Recall and mean Recall on the Visual Genome dataset.
arXiv Detail & Related papers (2024-06-27T16:52:01Z) - IPED: An Implicit Perspective for Relational Triple Extraction based on Diffusion Model [7.894136732348917]
We propose an Implicit Perspective for triple Extraction based on Diffusion model (IPED)
Our solution adopts an implicit strategy using block coverage to complete the tables, avoiding the limitations of explicit tagging methods.
arXiv Detail & Related papers (2024-02-24T14:18:11Z) - REST: Enhancing Group Robustness in DNNs through Reweighted Sparse
Training [49.581884130880944]
Deep neural networks (DNNs) have proven effective in various domains.
However, they often struggle to perform well on certain minority groups during inference.
arXiv Detail & Related papers (2023-12-05T16:27:54Z) - Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions. Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z) - CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation [91.16551253297588]
COunterfactual Generation via Retrieval and Editing (CORE) is a retrieval-augmented generation framework for creating diverse counterfactual perturbations for training.
CORE first performs a dense retrieval over a task-related unlabeled text corpus using a learned bi-encoder.
CORE then incorporates these into prompts to a large language model with few-shot learning capabilities, for counterfactual editing.
arXiv Detail & Related papers (2022-10-10T17:45:38Z) - Counterfactual Data Augmentation improves Factuality of Abstractive Summarization [6.745946263790011]
We show that augmenting the training data with our approach improves the factual correctness of summaries without significantly affecting the ROUGE score.
We show that in two commonly used summarization datasets (CNN/Dailymail and XSum), we improve the factual correctness by about 2.5 points on average.
arXiv Detail & Related papers (2022-05-25T00:00:35Z) - SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z) - Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z) - PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
arXiv Detail & Related papers (2020-09-02T08:30:09Z) - Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
A technique called prediction-time batch normalization significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.