EXPLAIN, EDIT, GENERATE: Rationale-Sensitive Counterfactual Data
Augmentation for Multi-hop Fact Verification
- URL: http://arxiv.org/abs/2310.14508v1
- Date: Mon, 23 Oct 2023 02:39:14 GMT
- Title: EXPLAIN, EDIT, GENERATE: Rationale-Sensitive Counterfactual Data
Augmentation for Multi-hop Fact Verification
- Authors: Yingjie Zhu, Jiasheng Si, Yibo Zhao, Haiyang Zhu, Deyu Zhou, Yulan He
- Abstract summary: We develop a rationale-sensitive method to generate linguistically diverse and label-flipping counterfactuals.
Specifically, the diverse and fluent counterfactuals are generated via an Explain-Edit-Generate architecture.
Experimental results show that the proposed approach outperforms the SOTA baselines.
- Score: 28.453817513380276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The automatic multi-hop fact verification task has gained significant attention
in recent years. Despite impressive results, these well-designed models perform
poorly on out-of-domain data. One possible solution is to augment the training
data with counterfactuals, which are generated by minimally altering the causal
features of the original data. However, current counterfactual data
augmentation techniques fail to handle multi-hop fact verification due to their
incapability to preserve the complex logical relationships within multiple
correlated texts. In this paper, we overcome this limitation by developing a
rationale-sensitive method to generate linguistically diverse and
label-flipping counterfactuals while preserving logical relationships.
Specifically, the diverse and fluent counterfactuals are generated via an
Explain-Edit-Generate architecture. Moreover, the checking and filtering
modules are proposed to regularize the counterfactual data with logical
relations and flipped labels. Experimental results show that the proposed
approach outperforms the SOTA baselines and can generate linguistically diverse
counterfactual data without disrupting their logical relationships.
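The abstract describes the pipeline only at a high level, so below is a minimal sketch of how an Explain-Edit-Generate loop with checking and filtering stages could be wired together. Every function body is a placeholder assumption (the paper's rationale extractor, editor, generator, and checkers are learned models that are not reproduced here); only the control flow follows the stages named in the abstract.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Instance:
    claim: str
    evidence: List[str]   # multiple correlated evidence sentences (multi-hop)
    label: str            # e.g. "SUPPORTS" or "REFUTES"

def explain(inst: Instance) -> List[int]:
    """EXPLAIN: select rationale sentences. Placeholder: naive token-overlap
    scoring stands in for the paper's learned rationale extractor."""
    claim_tokens = set(inst.claim.lower().split())
    scores = [len(claim_tokens & set(s.lower().split())) for s in inst.evidence]
    # keep the two highest-overlap evidence sentences as the rationale
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]

def edit(sentence: str) -> str:
    """EDIT: minimally alter a causal feature. Placeholder: a toy negation edit;
    the paper edits rationale tokens with a learned model."""
    return sentence.replace(" is ", " is not ", 1) if " is " in sentence else "Not " + sentence

def generate(inst: Instance, rationale_ids: List[int]) -> Instance:
    """GENERATE: produce a counterfactual instance from the edited rationale.
    Placeholder: splice the edited sentences back into the evidence set."""
    new_evidence = list(inst.evidence)
    for i in rationale_ids:
        new_evidence[i] = edit(new_evidence[i])
    flipped = "REFUTES" if inst.label == "SUPPORTS" else "SUPPORTS"
    return Instance(inst.claim, new_evidence, flipped)

def logically_consistent(cf: Instance) -> bool:
    """CHECK: verify the multi-hop logical relations still hold. Placeholder test."""
    return all(len(s.split()) > 2 for s in cf.evidence)

def label_flipped(orig: Instance, cf: Instance) -> bool:
    """FILTER: keep only counterfactuals whose label actually flips. Placeholder:
    in practice this would query a trained verification model."""
    return cf.label != orig.label

def augment(data: List[Instance]) -> List[Instance]:
    """Run the full pipeline and return the accepted counterfactuals."""
    accepted = []
    for inst in data:
        cf = generate(inst, explain(inst))
        if logically_consistent(cf) and label_flipped(inst, cf):
            accepted.append(cf)
    return accepted

if __name__ == "__main__":
    demo = Instance(
        claim="The film was directed by a French director.",
        evidence=["The film is directed by Jean Dupont.",
                  "Jean Dupont is a director born in Paris."],
        label="SUPPORTS",
    )
    for cf in augment([demo]):
        print(cf.label, cf.evidence)
```

Running the demo prints one accepted counterfactual whose evidence has been minimally edited and whose label has flipped from SUPPORTS to REFUTES; in the actual method each placeholder would be replaced by the corresponding neural module.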
Related papers
- DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z)
- Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic [51.967603572656266]
We introduce a consistent and theoretically grounded approach to annotating decompositional entailment.
We find that our new dataset, RDTE, has a substantially higher internal consistency (+9%) than prior decompositional entailment datasets.
We also find that training an RDTE-oriented entailment classifier via knowledge distillation and employing it in an entailment tree reasoning engine significantly improves both accuracy and proof quality.
arXiv Detail & Related papers (2024-02-22T18:55:17Z)
- Improving Classifier Robustness through Active Generation of Pairwise Counterfactuals [22.916599410472102]
We present a novel framework that utilizes counterfactual generative models to generate a large number of diverse counterfactuals.
We show that with a small amount of human-annotated counterfactual data (10%), we can generate a counterfactual augmentation dataset with learned labels.
arXiv Detail & Related papers (2023-05-22T23:19:01Z)
- Abstract Meaning Representation-Based Logic-Driven Data Augmentation for Logical Reasoning [27.224364543134094]
We introduce a novel logic-driven data augmentation approach, AMR-LDA.
AMR-LDA converts the original text into an Abstract Meaning Representation (AMR) graph.
The modified AMR graphs are subsequently converted back into text to create augmented data.
arXiv Detail & Related papers (2023-05-21T23:16:26Z)
- MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation [102.20036684996248]
We propose MURMUR, a neuro-symbolic modular approach to text generation from semi-structured data with multi-step reasoning.
We conduct experiments on two data-to-text generation tasks, WebNLG and LogicNLG.
arXiv Detail & Related papers (2022-12-16T17:36:23Z)
- NeuroCounterfactuals: Beyond Minimal-Edit Counterfactuals for Richer Data Augmentation [55.17069935305069]
We introduce NeuroCounterfactuals, designed as loose counterfactuals, allowing for larger edits which result in naturalistic generations containing linguistic diversity.
Our novel generative approach bridges the benefits of constrained decoding, with those of language model adaptation for sentiment steering.
arXiv Detail & Related papers (2022-10-22T06:29:21Z)
- CORE: A Retrieve-then-Edit Framework for Counterfactual Data Generation [91.16551253297588]
COunterfactual Generation via Retrieval and Editing (CORE) is a retrieval-augmented generation framework for creating diverse counterfactual perturbations for training.
CORE first performs a dense retrieval over a task-related unlabeled text corpus using a learned bi-encoder.
CORE then incorporates the retrieved texts into few-shot prompts for a large language model, which performs the counterfactual editing (a minimal sketch of this retrieve-then-edit pattern appears after this list).
arXiv Detail & Related papers (2022-10-10T17:45:38Z)
- SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in neural-network-based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z)
- Improving Commonsense Causal Reasoning by Adversarial Training and Data Augmentation [14.92157586545743]
This paper presents a number of techniques for making models more robust in the domain of causal reasoning.
We show a statistically significant improvement in performance on both datasets, even with only a small number of additionally generated data points.
arXiv Detail & Related papers (2021-01-13T09:55:29Z)
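Among the related entries, CORE spells out its two steps concretely (dense bi-encoder retrieval over an unlabeled corpus, then few-shot prompting of a large language model to perform the edit), so a minimal sketch of that general retrieve-then-edit pattern is given below. The tiny in-memory corpus, the embedding model name, and the prompt-building stub are illustrative assumptions, not CORE's actual components.

```python
# A sketch of a retrieve-then-edit counterfactual generator in the spirit of CORE.
# The corpus, the embedding model name, and the LLM stub are illustrative assumptions.
from typing import List

from sentence_transformers import SentenceTransformer, util

# Tiny in-memory stand-in for a task-related unlabeled corpus.
CORPUS = [
    "The service was slow but the staff apologized.",
    "The battery lasts two full days on a single charge.",
    "The plot felt predictable and the ending was rushed.",
]

# Off-the-shelf bi-encoder used as a stand-in for a learned task-specific retriever.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = encoder.encode(CORPUS, convert_to_tensor=True)

def retrieve(text: str, k: int = 2) -> List[str]:
    """Dense retrieval: return the k corpus sentences closest to the input text."""
    query_emb = encoder.encode(text, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=k)[0]
    return [CORPUS[h["corpus_id"]] for h in hits]

def build_edit_prompt(original: str, retrieved: List[str]) -> str:
    """Few-shot counterfactual editing step. Placeholder: a real pipeline would send
    this prompt to a large language model; here we only assemble and return it."""
    demos = "\n".join(f"- {r}" for r in retrieved)
    return (
        "Rewrite the text so that its label flips, reusing phrasing from:\n"
        f"{demos}\nText: {original}\nCounterfactual:"
    )

if __name__ == "__main__":
    original = "The movie was gripping from start to finish."
    print(build_edit_prompt(original, retrieve(original)))
```

The stub only assembles the few-shot prompt; a real pipeline would send it to the language model and keep only generations whose labels actually flip, much like the checking and filtering stages described in the abstract above.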