Fixing Model Bugs with Natural Language Patches
- URL: http://arxiv.org/abs/2211.03318v1
- Date: Mon, 7 Nov 2022 05:49:19 GMT
- Title: Fixing Model Bugs with Natural Language Patches
- Authors: Shikhar Murty, Christopher D. Manning, Scott Lundberg, Marco Tulio Ribeiro
- Abstract summary: We explore natural language patches that allow developers to provide corrective feedback at the right level of abstraction.
We show that with a small amount of synthetic data, we can teach models to effectively use real patches on real data.
We also show that finetuning on as many as 100 labeled examples may be needed to match the performance of a small set of language patches.
- Score: 38.67529353406759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current approaches for fixing systematic problems in NLP models (e.g. regex
patches, finetuning on more data) are either brittle, or labor-intensive and
liable to shortcuts. In contrast, humans often provide corrections to each
other through natural language. Taking inspiration from this, we explore
natural language patches -- declarative statements that allow developers to
provide corrective feedback at the right level of abstraction, either
overriding the model ("if a review gives 2 stars, the sentiment is negative")
or providing additional information the model may lack ("if something is
described as the bomb, then it is good"). We model the task of determining if
a patch applies separately from the task of integrating patch information, and
show that with a small amount of synthetic data, we can teach models to
effectively use real patches on real data -- 1 to 7 patches improve accuracy by
~1-4 accuracy points on different slices of a sentiment analysis dataset, and
F1 by 7 points on a relation extraction dataset. Finally, we show that
finetuning on as many as 100 labeled examples may be needed to match the
performance of a small set of language patches.
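To make the decomposition concrete, below is a minimal Python sketch of the gate-then-interpret idea from the abstract: one component scores whether a patch's condition applies to an input, another integrates the patch's consequence, and the two are combined by soft gating. The names (Patch, patched_predict), the toy keyword-based models, and the exact combination rule are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Patch:
    condition: str    # when the patch applies, e.g. "something is described as the bomb"
    consequence: str  # what it asserts, e.g. "it is good"

def patched_predict(
    x: str,
    patches: List[Patch],
    base_model: Callable[[str], float],          # P(positive) from the unpatched model
    gate: Callable[[str, str], float],           # how likely the condition holds on x
    interpreter: Callable[[str, str], float],    # prediction given x plus the consequence
) -> float:
    """Blend the base prediction with patch-conditioned predictions,
    weighting each patch by how likely its condition is to apply."""
    score = base_model(x)
    for p in patches:
        g = gate(x, p.condition)  # applicability in [0, 1]
        score = g * interpreter(x, p.consequence) + (1.0 - g) * score
    return score

# Toy stand-ins (illustrative assumptions, not the paper's models):
def toy_base_model(x: str) -> float:
    return 0.9 if "great" in x else 0.2

def toy_gate(x: str, condition: str) -> float:
    return 1.0 if "the bomb" in x else 0.0

def toy_interpreter(x: str, consequence: str) -> float:
    return 1.0 if "good" in consequence else 0.0

patch = Patch(condition="something is described as the bomb",
              consequence="it is good")
print(patched_predict("this sandwich was the bomb", [patch],
                      toy_base_model, toy_gate, toy_interpreter))  # -> 1.0
```

On the example review, the gate fires on "the bomb", so the patched prediction overrides the low base score, mirroring the slang patch quoted in the abstract; when no condition applies, the base model's output passes through unchanged.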
Related papers
- Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model [6.097530398802087]
This paper explores the correlation between the degree of noise and its impact on language models through instruction tuning.
Specifically, we report several intriguing findings on the correlation between dataset factuality and instruction tuning.
arXiv Detail & Related papers (2024-04-15T12:20:09Z)
- Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence [80.6840060272386]
This paper identifies the importance of being geometry-aware for semantic correspondence.
We show that incorporating this information can markedly enhance semantic correspondence performance.
Our method achieves a PCK@0.10 score of 65.4 (zero-shot) and 85.6 (supervised) on the challenging SPair-71k dataset.
arXiv Detail & Related papers (2023-11-28T18:45:13Z)
- Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection [7.725755567907359]
We analyze the models using three distinct methods: interpretability tools, attention analysis, and interaction matrix analysis.
We develop two annotation methods which highlight the bug semantics inside the model's inputs.
Our findings indicate that it is helpful to provide the model with information of the bug semantics, that the model can attend to it, and motivate future work in learning more complex path-based bug semantics.
arXiv Detail & Related papers (2023-11-07T16:31:56Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under this elaborated robustness metric, a model is judged robust only if its performance is consistently accurate across all examples within each clique.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve" (DEIM) for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Is this Change the Answer to that Problem? Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness [8.606215760860362]
We recast patch correctness assessment as a question answering problem.
We take as inputs the bug report as well as the natural language description of the generated patch.
Experiments show that Quatrain can achieve an AUC of 0.886 on predicting patch correctness.
arXiv Detail & Related papers (2022-08-08T13:32:58Z)
- Label-Descriptive Patterns and their Application to Characterizing Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms not only reveals whether a model is prone to systematic errors, but also suggests how to act on and improve the model.
In this paper we propose a method that does so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data, partitioned according to prediction correctness.
arXiv Detail & Related papers (2021-10-18T19:42:21Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Rethinking Generative Zero-Shot Learning: An Ensemble Learning Perspective for Recognising Visual Patches [52.67723703088284]
We propose a novel framework called multi-patch generative adversarial nets (MPGAN).
MPGAN synthesises local patch features and labels unseen classes with a novel weighted voting strategy.
MPGAN has significantly greater accuracy than state-of-the-art methods.
arXiv Detail & Related papers (2020-07-27T05:49:44Z)