Doing Good or Doing Right? Exploring the Weakness of Commonsense Causal
Reasoning Models
- URL: http://arxiv.org/abs/2107.01791v1
- Date: Mon, 5 Jul 2021 05:08:30 GMT
- Title: Doing Good or Doing Right? Exploring the Weakness of Commonsense Causal
Reasoning Models
- Authors: Mingyue Han and Yinglin Wang
- Abstract summary: We investigate the problem of semantic similarity bias and reveal the vulnerability of current COPA models by certain attacks.
We mitigate this problem by simply adding a regularization loss, and experimental results show that this solution improves the model's generalization ability.
- Score: 0.38073142980733
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained language models (PLM) achieve surprising performance on the Choice
of Plausible Alternatives (COPA) task. However, whether PLMs have truly
acquired the ability of causal reasoning remains a question. In this paper, we
investigate the problem of semantic similarity bias and reveal the
vulnerability of current COPA models by certain attacks. Previous solutions
that tackle the superficial cues of unbalanced token distribution still
encounter the same problem of semantic bias, even more seriously due to the
utilization of more training data. We mitigate this problem by simply adding a
regularization loss, and experimental results show that this solution not only
improves the model's generalization ability but also helps the models
perform more robustly on a challenging dataset, BCOPA-CE, which has unbiased
token distribution and is more difficult for models to distinguish cause and
effect.
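As an illustration of the kind of fix the abstract describes, the sketch below adds a regularization term to a standard COPA cross-entropy objective. The exact regularizer used in the paper is not given in this summary, so the `copa_loss` function, the similarity-margin penalty, and the weight `lam` are all hypothetical: this is a minimal sketch of one way to discourage a model from simply picking the alternative that is most semantically similar to the premise.

```python
# Hypothetical sketch: cross-entropy for COPA choice scoring plus a
# similarity-debiasing regularizer. The regularization loss actually used in
# the paper is not specified in this summary; the penalty below is one
# plausible instantiation.
import torch
import torch.nn.functional as F

def copa_loss(choice_scores, labels, premise_emb, alt_embs, lam=0.1):
    """
    choice_scores: (batch, 2) plausibility scores for the two alternatives
    labels:        (batch,)   index of the correct alternative (0 or 1)
    premise_emb:   (batch, d) premise sentence embeddings
    alt_embs:      (batch, 2, d) embeddings of the two alternatives
    lam:           weight of the regularization term (assumed hyperparameter)
    """
    ce = F.cross_entropy(choice_scores, labels)

    # Cosine similarity between the premise and each alternative.
    sim = F.cosine_similarity(premise_emb.unsqueeze(1), alt_embs, dim=-1)  # (batch, 2)

    # Penalize cases where the score margin tracks the similarity margin,
    # so the model cannot win merely by preferring the semantically closer choice.
    score_margin = choice_scores[:, 1] - choice_scores[:, 0]
    sim_margin = sim[:, 1] - sim[:, 0]
    reg = (score_margin * sim_margin).clamp(min=0.0).mean()

    return ce + lam * reg
```

In practice the embeddings could come from the same PLM that produces the choice scores; the penalty only activates for examples where the score preference and the similarity preference point in the same direction.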
Related papers
- Improving Group Robustness on Spurious Correlation via Evidential Alignment [26.544938760265136]
Deep neural networks often learn and rely on spurious correlations, i.e., superficial associations between non-causal features and the targets.
Existing methods typically mitigate this issue by using external group annotations or auxiliary deterministic models.
We propose Evidential Alignment, a novel framework that leverages uncertainty quantification to understand the behavior of the biased models.
arXiv Detail & Related papers (2025-06-12T22:47:21Z) - Preference Learning for AI Alignment: a Causal Perspective [55.2480439325792]
We frame this problem in a causal paradigm, providing the rich toolbox of causality to identify persistent challenges.
Inheriting from the literature of causal inference, we identify key assumptions necessary for reliable generalisation.
We illustrate failure modes of naive reward models and demonstrate how causally-inspired approaches can improve model robustness.
arXiv Detail & Related papers (2025-06-06T10:45:42Z) - EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Networks [55.214444066134114]
In this study, we identify this oversight as a source of incompressible error, termed the equivariance gap, which introduces instability in predictions.
To mitigate these issues, we propose a novel model designed to preserve equivariance across output dimensions.
arXiv Detail & Related papers (2025-02-10T17:11:20Z) - Self-supervised Analogical Learning using Language Models [59.64260218737556]
We propose SAL, a self-supervised analogical learning framework.
SAL mimics the human analogy process and trains models to explicitly transfer high-quality symbolic solutions.
We show that the resulting models outperform base language models on a wide range of reasoning benchmarks.
arXiv Detail & Related papers (2025-02-03T02:31:26Z) - Adversarial Transferability in Deep Denoising Models: Theoretical Insights and Robustness Enhancement via Out-of-Distribution Typical Set Sampling [6.189440665620872]
Deep learning-based image denoising models demonstrate remarkable performance, but their lack of robustness analysis remains a significant concern.
A major issue is that these models are susceptible to adversarial attacks, where small, carefully crafted perturbations to input data can cause them to fail.
We propose a novel adversarial defense method: the Out-of-Distribution Typical Set Sampling Training strategy.
arXiv Detail & Related papers (2024-12-08T13:47:57Z) - Towards Robust Text Classification: Mitigating Spurious Correlations with Causal Learning [2.7813683000222653]
We propose the Causally Calibrated Robust (CCR) framework to reduce models' reliance on spurious correlations.
CCR integrates a causal feature selection method based on counterfactual reasoning, along with an inverse propensity weighting (IPW) loss function.
We show that CCR achieves state-of-the-art performance among methods without group labels, and in some cases it can compete with models that utilize group labels.
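A minimal sketch of the inverse propensity weighting (IPW) loss named in the CCR entry above, assuming the propensity scores come from a separate bias model; the function name and the normalization step are illustrative, not the paper's exact formulation.

```python
# Hypothetical IPW loss: each example's loss is reweighted by the inverse of an
# estimated propensity, so examples carrying the spurious feature pattern do
# not dominate training. The propensity estimates are assumed to come from a
# separate bias model.
import torch
import torch.nn.functional as F

def ipw_loss(logits, labels, propensity, eps=1e-6):
    """
    logits:     (batch, num_classes) model outputs
    labels:     (batch,) gold labels
    propensity: (batch,) estimated probability that an example exhibits the
                spurious (non-causal) feature pattern
    """
    per_example = F.cross_entropy(logits, labels, reduction="none")
    weights = 1.0 / (propensity + eps)
    weights = weights / weights.sum() * len(weights)  # normalize to mean 1
    return (weights * per_example).mean()
```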
arXiv Detail & Related papers (2024-11-01T21:29:07Z) - Is Difficulty Calibration All We Need? Towards More Practical Membership Inference Attacks [16.064233621959538]
We propose a query-efficient and computation-efficient MIA that directly re-leverages the original membership scores to mitigate the errors in difficulty calibration.
arXiv Detail & Related papers (2024-08-31T11:59:42Z) - Debiasing Algorithm through Model Adaptation [5.482673673984126]
We perform causal analysis to identify problematic model components and discover that mid-upper feed-forward layers are most prone to convey bias.
Based on the analysis results, we intervene in the model by applying a linear projection to the weight matrices of these layers.
Our titular method, DAMA, significantly decreases bias as measured by diverse metrics while maintaining the model's performance on downstream tasks.
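The DAMA entry above describes intervening on mid-upper feed-forward layers with a linear projection applied to their weight matrices. The sketch below shows one generic form such an intervention could take, projecting a single assumed "bias direction" out of a layer's output weights; how DAMA actually derives its projection from the causal analysis is not covered in this summary.

```python
# Hypothetical sketch of a linear-projection intervention on a feed-forward
# weight matrix: remove one identified bias direction from the layer's output
# space via the nullspace projection I - v v^T.
import torch

def project_out_direction(weight, bias_direction):
    """
    weight:         (d_out, d_in) feed-forward output weight matrix
    bias_direction: (d_out,) vector spanning the subspace to remove
    """
    v = bias_direction / bias_direction.norm()
    proj = torch.eye(weight.shape[0]) - torch.outer(v, v)  # I - v v^T
    return proj @ weight

# Example: debias the down-projection of one (hypothetical) transformer FFN layer.
d_model, d_ff = 768, 3072
ffn_out_weight = torch.randn(d_model, d_ff)
bias_dir = torch.randn(d_model)
ffn_out_weight = project_out_direction(ffn_out_weight, bias_dir)
```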
arXiv Detail & Related papers (2023-10-29T05:50:03Z) - Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
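A hedged sketch of what balancing bias-aligned and bias-conflicting samples could look like in practice: here the two groups simply receive equal total weight in the loss so that neither dominates the update. The mask `is_conflicting` is assumed to come from the bias-conflicting scoring step (ECS); the paper's exact gradient-alignment rule may differ.

```python
# Hypothetical group-balanced loss: give the mined bias-aligned and
# bias-conflicting samples equal total weight, regardless of group size.
import torch
import torch.nn.functional as F

def balanced_loss(logits, labels, is_conflicting):
    """
    logits:         (batch, num_classes) model outputs
    labels:         (batch,) gold labels
    is_conflicting: (batch,) boolean mask from a bias-conflicting scoring step
    """
    per_example = F.cross_entropy(logits, labels, reduction="none")
    conflict = per_example[is_conflicting]
    aligned = per_example[~is_conflicting]
    if conflict.numel() == 0 or aligned.numel() == 0:
        return per_example.mean()
    # Each group contributes half of the total loss.
    return 0.5 * conflict.mean() + 0.5 * aligned.mean()
```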
arXiv Detail & Related papers (2023-02-22T14:50:24Z) - General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z) - Identifying and Mitigating Spurious Correlations for Improving
Robustness in NLP Models [19.21465581259624]
Many problems can be attributed to models exploiting spurious correlations, or shortcuts between the training data and the task labels.
In this paper, we aim to automatically identify such spurious correlations in NLP models at scale.
We show that our proposed method can effectively and efficiently identify a scalable set of "shortcuts", and mitigating these leads to more robust models in multiple applications.
arXiv Detail & Related papers (2021-10-14T21:40:03Z) - Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z) - On the Efficacy of Adversarial Data Collection for Question Answering:
Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z) - Learning from others' mistakes: Avoiding dataset biases without modeling
them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z) - Mind the Trade-off: Debiasing NLU Models without Degrading the
In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization.
It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples.
We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets.
arXiv Detail & Related papers (2020-05-01T11:22:55Z)
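The last entry describes confidence regularization only at a high level; the sketch below gives one plausible reading, distilling from a teacher whose probabilities are flattened on examples a bias model already solves confidently, so the model keeps learning from every example but gains little from exploiting the bias. The temperature rule and the names below are assumptions rather than the paper's exact method.

```python
# Hypothetical confidence-regularized distillation loss: scale down the
# teacher's confidence on examples the bias model finds easy, then train the
# student toward the scaled distribution.
import torch
import torch.nn.functional as F

def confidence_regularized_loss(student_logits, teacher_probs, bias_confidence):
    """
    student_logits:  (batch, num_classes) main model outputs
    teacher_probs:   (batch, num_classes) probabilities from a teacher model
    bias_confidence: (batch,) bias model's probability on the gold label, in [0, 1]
    """
    # Temperature grows with bias confidence, flattening the target distribution
    # on examples the bias model already solves.
    temperature = (1.0 + bias_confidence).unsqueeze(1)      # (batch, 1)
    scaled = teacher_probs ** (1.0 / temperature)
    scaled = scaled / scaled.sum(dim=1, keepdim=True)       # re-normalize

    log_student = F.log_softmax(student_logits, dim=1)
    return F.kl_div(log_student, scaled, reduction="batchmean")
```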
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.