Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data
- URL: http://arxiv.org/abs/2010.03656v1
- Date: Wed, 7 Oct 2020 21:17:25 GMT
- Title: Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data
- Authors: Shachar Rosenman, Alon Jacovi, Yoav Goldberg
- Abstract summary: We identify failure modes of SOTA relation extraction (RE) models trained on TACRED.
By adding some of the challenge data as training examples, the performance of the model improves.
- Score: 49.378860065474875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The process of collecting and annotating training data may introduce
distribution artifacts which may limit the ability of models to learn correct
generalization behavior. We identify failure modes of SOTA relation extraction
(RE) models trained on TACRED, which we attribute to limitations in the data
annotation process. We collect and annotate a challenge-set we call Challenging
RE (CRE), based on naturally occurring corpus examples, to benchmark this
behavior. Our experiments with four state-of-the-art RE models show that they
have indeed adopted shallow heuristics that do not generalize to the
challenge-set data. Further, we find that alternative question answering
modeling performs significantly better than the SOTA models on the
challenge-set, despite worse overall TACRED performance. By adding some of the
challenge data as training examples, the performance of the model improves.
Finally, we provide concrete suggestions on how to improve RE data collection to
alleviate this behavior.
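The failure mode the abstract describes can be illustrated with a toy sketch (this is not the paper's code; all examples, labels, and type names below are invented): a "shallow heuristic" classifier that predicts a relation from the entity-type pair alone, ignoring sentence context, fits the training artifact perfectly but fails on challenge-style examples where the same entity types co-occur with no relation.

```python
# Illustrative sketch (not the paper's code): a classifier that memorizes
# the majority relation for each (subject type, object type) pair and
# never looks at the sentence -- the kind of heuristic the paper exposes.
from collections import Counter

def train_type_pair_baseline(examples):
    """Memorize the majority relation label for each (subj_type, obj_type) pair."""
    counts = {}
    for ex in examples:
        key = (ex["subj_type"], ex["obj_type"])
        counts.setdefault(key, Counter())[ex["relation"]] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def predict(model, ex):
    return model.get((ex["subj_type"], ex["obj_type"]), "no_relation")

def accuracy(model, examples):
    return sum(predict(model, ex) == ex["relation"] for ex in examples) / len(examples)

# Invented training data: entity types correlate perfectly with the label.
train = [
    {"subj_type": "PERSON", "obj_type": "CITY", "relation": "per:city_of_birth"},
    {"subj_type": "ORG", "obj_type": "PERSON", "relation": "org:founded_by"},
] * 10

# Challenge-style data: same entity-type pairs, but the (unseen) sentence
# context supports no relation at all.
challenge = [
    {"subj_type": "PERSON", "obj_type": "CITY", "relation": "no_relation"},
    {"subj_type": "ORG", "obj_type": "PERSON", "relation": "no_relation"},
]

model = train_type_pair_baseline(train)
print(accuracy(model, train))      # 1.0 -- the heuristic fits the artifact
print(accuracy(model, challenge))  # 0.0 -- and fails when context matters
```

A real RE model is far richer than this lookup table, but the evaluation logic is the same: compare scores on the original test split against scores on challenge data that breaks the spurious correlation.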
Related papers
- Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Experts (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce an adaptive task-aware pruning technique UNCURL to reduce the number of experts per MoE layer in an offline manner post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z)
- Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems [17.10762463903638]
We train evaluation models to approximate human evaluation, achieving high agreement.
We propose a weak-to-strong supervision method that uses a fraction of the annotated data to train an evaluation model.
arXiv Detail & Related papers (2024-06-26T10:48:14Z)
- RewardBench: Evaluating Reward Models for Language Modeling [100.28366840977966]
We present RewardBench, a benchmark dataset and code-base for evaluation of reward models.
The dataset is a collection of prompt-chosen-rejected trios spanning chat, reasoning, and safety.
On the RewardBench leaderboard, we evaluate reward models trained with a variety of methods.
arXiv Detail & Related papers (2024-03-20T17:49:54Z)
- Improving QA Model Performance with Cartographic Inoculation [0.0]
"Dataset artifacts" reduce the model's ability to generalize to real-world QA problems.
We analyze the impacts and incidence of dataset artifacts using an adversarial challenge set.
We show that by selectively fine-tuning a model on ambiguous adversarial examples from a challenge set, significant performance improvements can be made.
arXiv Detail & Related papers (2024-01-30T23:08:26Z)
- Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system.
We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z)
- Regularizing Generative Adversarial Networks under Limited Data [88.57330330305535]
This work proposes a regularization approach for training robust GAN models on limited data.
We show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data.
arXiv Detail & Related papers (2021-04-07T17:59:06Z)
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
- Factual Error Correction for Abstractive Summarization Models [41.77317902748772]
We propose a post-editing corrector module to correct factual errors in generated summaries.
We show that our model is able to correct factual errors in summaries generated by other neural summarization models.
We also find that transferring from artificial error correction to downstream settings is still very challenging.
arXiv Detail & Related papers (2020-10-17T04:24:16Z)
- Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation [86.40610684026262]
In this work, we explore identifying the inactive training examples which contribute less to model performance.
We introduce data rejuvenation to improve the training of NMT models on large-scale datasets by exploiting inactive examples.
Experimental results on WMT14 English-German and English-French datasets show that the proposed data rejuvenation consistently and significantly improves performance for several strong NMT models.
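The data-rejuvenation recipe summarized above can be sketched in a few lines (a hedged illustration, not the authors' implementation; the scoring and generation models here are toy stand-ins): an identification model scores each training pair, low-scoring pairs are flagged as inactive, and their target side is regenerated before retraining.

```python
# Hedged sketch of data rejuvenation (invented interfaces): partition
# training pairs by an identification-model score, then regenerate the
# target side of the inactive pairs with a rejuvenation model.

def split_by_score(pairs, score_fn, threshold):
    """Partition (src, tgt) pairs into active and inactive by model score."""
    active, inactive = [], []
    for pair in pairs:
        (active if score_fn(*pair) >= threshold else inactive).append(pair)
    return active, inactive

def rejuvenate(inactive_pairs, generate_fn):
    """Replace the target side of inactive pairs with generated targets."""
    return [(src, generate_fn(src)) for src, _ in inactive_pairs]

# Toy stand-ins: the "probability" is a crude length-ratio score, and the
# rejuvenation model just uppercases the source string.
def toy_score(src, tgt):
    return min(len(src), len(tgt)) / max(len(src), len(tgt))

toy_generate = str.upper

pairs = [("guten tag", "good day"), ("ja", "yes, certainly, of course")]
active, inactive = split_by_score(pairs, toy_score, threshold=0.5)
retrain_set = active + rejuvenate(inactive, toy_generate)
# The badly mismatched pair now carries a regenerated target for retraining.
```

In the actual NMT setting, both the identification and rejuvenation models would be trained translation models and the score would be a sentence-level probability; only the split-and-regenerate structure carries over.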
arXiv Detail & Related papers (2020-10-06T08:57:31Z)
- Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension [27.538957000237176]
Humans create questions adversarially, such that the model fails to answer them correctly.
We collect 36,000 samples with progressively stronger models in the annotation loop.
We find that training on adversarially collected samples leads to strong generalisation to non-adversarially collected datasets.
We find that stronger models can still learn from datasets collected with substantially weaker models-in-the-loop.
arXiv Detail & Related papers (2020-02-02T00:22:55Z)
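The model-in-the-loop annotation scheme in the last entry reduces to a simple acceptance rule, sketched below (invented interfaces and toy data, not the paper's pipeline): a candidate question is kept as adversarial only if the in-the-loop model fails to answer it.

```python
# Minimal sketch of adversarial human annotation with a model in the loop
# (invented interfaces): keep only candidates the current model gets wrong.

def collect_adversarial(candidates, model_fn):
    """Keep (question, answer) pairs the in-the-loop model answers incorrectly."""
    return [(q, a) for q, a in candidates if model_fn(q) != a]

# Toy in-the-loop model: always answers "Paris".
toy_model = lambda question: "Paris"

candidates = [
    ("Capital of France?", "Paris"),        # model answers correctly -> discarded
    ("Capital of Australia?", "Canberra"),  # model fails -> kept as adversarial
]
adversarial = collect_adversarial(candidates, toy_model)
```

Running progressively stronger models in place of `toy_model` yields the progressively harder annotation rounds the paper describes.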
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.