Fixing Model Bugs with Natural Language Patches
- URL: http://arxiv.org/abs/2211.03318v1
- Date: Mon, 7 Nov 2022 05:49:19 GMT
- Title: Fixing Model Bugs with Natural Language Patches
- Authors: Shikhar Murty, Christopher D. Manning, Scott Lundberg, Marco Tulio Ribeiro
- Abstract summary: We explore natural language patches that allow developers to provide corrective feedback at the right level of abstraction.
We show that with a small amount of synthetic data, we can teach models to effectively use real patches on real data.
We also show that finetuning on as many as 100 labeled examples may be needed to match the performance of a small set of language patches.
- Score: 38.67529353406759
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current approaches for fixing systematic problems in NLP models (e.g. regex
patches, finetuning on more data) are either brittle, or labor-intensive and
liable to shortcuts. In contrast, humans often provide corrections to each
other through natural language. Taking inspiration from this, we explore
natural language patches -- declarative statements that allow developers to
provide corrective feedback at the right level of abstraction, either
overriding the model ("if a review gives 2 stars, the sentiment is negative")
or providing additional information the model may lack ("if something is
described as the bomb, then it is good"). We model the task of determining if
a patch applies separately from the task of integrating patch information, and
show that with a small amount of synthetic data, we can teach models to
effectively use real patches on real data -- 1 to 7 patches improve accuracy by
~1-4 accuracy points on different slices of a sentiment analysis dataset, and
F1 by 7 points on a relation extraction dataset. Finally, we show that
finetuning on as many as 100 labeled examples may be needed to match the
performance of a small set of language patches.
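To make the decomposition concrete, below is a minimal Python sketch of the gate-then-interpret idea from the abstract: one component scores whether a patch's condition applies to an input, another integrates the patch's consequence, and the two are combined by soft gating. The names (Patch, patched_predict), the toy keyword-based models, and the exact combination rule are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Patch:
    condition: str    # when the patch applies, e.g. "something is described as the bomb"
    consequence: str  # what it asserts, e.g. "it is good"

def patched_predict(
    x: str,
    patches: List[Patch],
    base_model: Callable[[str], float],          # P(positive) from the unpatched model
    gate: Callable[[str, str], float],           # how likely the condition holds on x
    interpreter: Callable[[str, str], float],    # prediction given x plus the consequence
) -> float:
    """Blend the base prediction with patch-conditioned predictions,
    weighting each patch by how likely its condition is to apply."""
    score = base_model(x)
    for p in patches:
        g = gate(x, p.condition)  # applicability in [0, 1]
        score = g * interpreter(x, p.consequence) + (1.0 - g) * score
    return score

# Toy stand-ins (illustrative assumptions, not the paper's models):
def toy_base_model(x: str) -> float:
    return 0.9 if "great" in x else 0.2

def toy_gate(x: str, condition: str) -> float:
    return 1.0 if "the bomb" in x else 0.0

def toy_interpreter(x: str, consequence: str) -> float:
    return 1.0 if "good" in consequence else 0.0

patch = Patch(condition="something is described as the bomb",
              consequence="it is good")
print(patched_predict("this sandwich was the bomb", [patch],
                      toy_base_model, toy_gate, toy_interpreter))  # -> 1.0
```

On the example review, the gate fires on "the bomb", so the patched prediction overrides the low base score, mirroring the slang patch quoted in the abstract; when no condition applies, the base model's output passes through unchanged.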
Related papers
- Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model [6.097530398802087]
This paper explores the correlation between the degree of noise and its impact on language models through instruction tuning.
Specifically, we report several intriguing findings on the correlation between dataset factuality and instruction tuning.
arXiv Detail & Related papers (2024-04-15T12:20:09Z)
- Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence [80.6840060272386]
This paper identifies the importance of being geometry-aware for semantic correspondence.
We show that incorporating this information can markedly enhance semantic correspondence performance.
Our method achieves a PCK@0.10 score of 65.4 (zero-shot) and 85.6 (supervised) on the challenging SPair-71k dataset.
arXiv Detail & Related papers (2023-11-28T18:45:13Z)
- Do Language Models Learn Semantics of Code? A Case Study in Vulnerability Detection [7.725755567907359]
We analyze the models using three distinct methods: interpretability tools, attention analysis, and interaction matrix analysis.
We develop two annotation methods which highlight the bug semantics inside the model's inputs.
Our findings indicate that it is helpful to provide the model with information of the bug semantics, that the model can attend to it, and motivate future work in learning more complex path-based bug semantics.
arXiv Detail & Related papers (2023-11-07T16:31:56Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Under this elaborated robustness metric, a model is judged robust only if its performance is consistently accurate across all examples within each clique.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve" (DEIM) for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Is this Change the Answer to that Problem? Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness [8.606215760860362]
We recast patch correctness assessment as a question answering problem.
We take as inputs the bug report as well as the natural language description of the generated patch.
Experiments show that Quatrain can achieve an AUC of 0.886 on predicting patch correctness.
arXiv Detail & Related papers (2022-08-08T13:32:58Z)
- Label-Descriptive Patterns and their Application to Characterizing Classification Errors [31.272875287136426]
State-of-the-art deep learning methods achieve human-like performance on many tasks, but make errors nevertheless.
Characterizing these errors in easily interpretable terms not only reveals whether a model is prone to systematic errors, but also suggests how to act on and improve the model.
In this paper we propose a method that does so for arbitrary classifiers by mining a small set of patterns that together succinctly describe the input data, partitioned according to prediction correctness.
arXiv Detail & Related papers (2021-10-18T19:42:21Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Rethinking Generative Zero-Shot Learning: An Ensemble Learning Perspective for Recognising Visual Patches [52.67723703088284]
We propose a novel framework called multi-patch generative adversarial nets (MPGAN).
MPGAN synthesises local patch features and labels unseen classes with a novel weighted voting strategy.
MPGAN has significantly greater accuracy than state-of-the-art methods.
arXiv Detail & Related papers (2020-07-27T05:49:44Z)