Explanation-Based Human Debugging of NLP Models: A Survey
- URL: http://arxiv.org/abs/2104.15135v1
- Date: Fri, 30 Apr 2021 17:53:07 GMT
- Title: Explanation-Based Human Debugging of NLP Models: A Survey
- Authors: Piyawat Lertvittayakumjorn, Francesca Toni
- Abstract summary: We call this problem explanation-based human debugging (EBHD).
In particular, we categorize and discuss existing works along three main dimensions of EBHD.
- Score: 15.115037763351003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To fix a bug in a program, we need to locate where the bug is, understand why
it causes the problem, and patch the code accordingly. This process becomes
harder when the program is a trained machine learning model and even harder for
opaque deep learning models. In this survey, we review papers that exploit
explanations to enable humans to debug NLP models. We call this problem
explanation-based human debugging (EBHD). In particular, we categorize and
discuss existing works along three main dimensions of EBHD (the bug context,
the workflow, and the experimental setting), compile findings on how EBHD
components affect human debuggers, and highlight open problems that could be
future research directions.
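As a rough illustration of the workflow dimension, the sketch below shows a generic EBHD loop that alternates between explaining model predictions to a human debugger and folding the resulting feedback back into the model; all names in it (explain, collect_feedback, apply_feedback) are illustrative placeholders rather than an interface defined in the paper.

```python
# Illustrative sketch of a generic explanation-based human debugging (EBHD) loop.
# Every name here (explain, collect_feedback, apply_feedback) is a hypothetical
# placeholder; the survey covers many concrete instantiations of these steps,
# not this particular interface.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    example_id: int          # which example the human inspected
    bad_features: List[str]  # features the human marked as spurious or irrelevant


def ebhd_loop(
    model,
    examples: List[Tuple[int, str]],            # (id, input text) pairs to inspect
    explain: Callable,                          # explanation for one prediction
    collect_feedback: Callable[..., Feedback],  # human judgment on that explanation
    apply_feedback: Callable,                   # update/retrain the model
    rounds: int = 3,
):
    """Repeat: explain predictions -> collect human feedback -> update the model."""
    for _ in range(rounds):
        feedback: List[Feedback] = []
        for example_id, text in examples:
            explanation = explain(model, text)
            feedback.append(collect_feedback(example_id, text, explanation))
        model = apply_feedback(model, feedback)
    return model
```

In the survey's terms, the bug context roughly determines what the model and its bug look like, the workflow determines how the explain/feedback/update steps are realized, and the experimental setting determines how such a loop is evaluated with human participants.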
Related papers
- A Proposal for a Debugging Learning Support Environment for Undergraduate Students Majoring in Computer Science [0.0]
Many students do not know how to use a debugger or have never used one.
We implemented a function in Scratch that allows for self-learning of correct breakpoint placement.
arXiv Detail & Related papers (2024-07-25T03:34:19Z)
- VDebugger: Harnessing Execution Feedback for Debugging Visual Programs [103.61860743476933]
We introduce VDebugger, a critic-refiner framework trained to localize and debug visual programs by tracking execution step by step.
VDebugger identifies and corrects program errors by leveraging detailed execution feedback, improving interpretability and accuracy.
Evaluations on six datasets demonstrate VDebugger's effectiveness, showing performance improvements of up to 3.2% in downstream task accuracy.
arXiv Detail & Related papers (2024-06-19T11:09:16Z)
- Leveraging Print Debugging to Improve Code Generation in Large Language Models [63.63160583432348]
Large language models (LLMs) have made significant progress in code generation tasks.
But their performance in tackling programming problems with complex data structures and algorithms remains suboptimal.
We propose an in-context learning approach that guides LLMs to debug by using a "print debugging" method.
arXiv Detail & Related papers (2024-01-10T18:37:59Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- Language Models are Better Bug Detector Through Code-Pair Classification [0.26107298043931204]
In this paper, we propose the code-pair classification task, in which both the buggy and non-buggy versions of a program are given to the model, and the model identifies the buggy one.
Experiments indicate that an LLM can often pick out the buggy version from the non-buggy one, and that the code-pair classification task is much easier than being given a single snippet and deciding whether and where a bug exists.
arXiv Detail & Related papers (2023-11-14T07:20:57Z)
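As a rough sketch of the code-pair classification setup above, the snippet below builds a prompt that shows a model both versions of a function and asks which one is buggy; the prompt wording and the ask_model callback are illustrative assumptions, not the paper's exact protocol.

```python
# Hypothetical sketch of code-pair classification: the model is shown a buggy and a
# non-buggy version of the same snippet and must say which one is buggy. The prompt
# wording and the ask_model interface are assumptions, not the paper's protocol.

from typing import Callable


def build_code_pair_prompt(version_a: str, version_b: str) -> str:
    return (
        "One of the following two versions of the same function contains a bug.\n\n"
        f"Version A:\n{version_a}\n\n"
        f"Version B:\n{version_b}\n\n"
        "Answer with exactly 'A' or 'B': which version is buggy?"
    )


def classify_pair(version_a: str, version_b: str,
                  ask_model: Callable[[str], str]) -> str:
    """Return 'A' or 'B' from the model's answer (ask_model is a placeholder LLM call)."""
    answer = ask_model(build_code_pair_prompt(version_a, version_b)).strip().upper()
    return "A" if answer.startswith("A") else "B"
```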
- NuzzleBug: Debugging Block-Based Programs in Scratch [11.182625995483862]
NuzzleBug is an extension of the popular block-based programming environment Scratch.
It is an interrogative debugger that enables users to ask questions about program executions and provides answers.
We find that teachers consider NuzzleBug to be useful, and children can use it to debug faulty programs effectively.
arXiv Detail & Related papers (2023-09-25T18:56:26Z)
- Does Correction Remain A Problem For Large Language Models? [63.24433996856764]
This paper investigates the role of correction in the context of large language models by conducting two experiments.
The first experiment focuses on correction as a standalone task, employing few-shot learning techniques with GPT-like models for error correction.
The second experiment explores the notion of correction as a preparatory task for other NLP tasks, examining whether large language models can tolerate and perform adequately on texts containing certain levels of noise or errors.
arXiv Detail & Related papers (2023-08-03T14:09:31Z)
- An Effective Data-Driven Approach for Localizing Deep Learning Faults [20.33411443073181]
We propose a novel data-driven approach that leverages model features to learn problem patterns.
Our methodology automatically links bug symptoms to their root causes, without the need for manually crafted mappings.
Our results demonstrate that our technique can effectively detect and diagnose different bug types.
arXiv Detail & Related papers (2023-07-18T03:28:39Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
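To make the Self-Debugging idea above concrete, here is a heavily simplified sketch in which a model regenerates its program after seeing execution feedback; the prompts, the generate callback, and the tiny test harness are assumptions for illustration, not the method from the paper.

```python
# Heavily simplified, hypothetical sketch of a self-debugging loop: generate code,
# run the tests, and feed any failure back to the model so it can revise its own
# program. The prompts and the generate interface are assumptions, not the paper's.

from typing import Callable, List, Optional, Tuple


def run_tests(code: str, tests: List[Tuple[str, object]]) -> Optional[str]:
    """Execute the candidate code; return a failure message, or None if all tests pass."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # assumes the code defines a function named `solution`
        for call, expected in tests:
            result = eval(call, namespace)
            if result != expected:
                return f"{call} returned {result!r}, expected {expected!r}"
    except Exception as exc:
        return f"execution error: {exc}"
    return None


def self_debug(task: str, tests: List[Tuple[str, object]],
               generate: Callable[[str], str], max_rounds: int = 3) -> str:
    """Ask the model for code, then repeatedly show it failing feedback until tests pass."""
    code = generate(f"Write a Python function `solution` for this task: {task}")
    for _ in range(max_rounds):
        failure = run_tests(code, tests)
        if failure is None:
            break
        code = generate(
            "The following program is incorrect.\n"
            f"{code}\n"
            f"Feedback: {failure}\n"
            "Please return a corrected program."
        )
    return code
```

Here `tests` would hold entries such as `("solution(2, 3)", 5)`, i.e. a call expression paired with its expected value.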
- ADPTriage: Approximate Dynamic Programming for Bug Triage [0.0]
We develop a Markov decision process (MDP) model for an online bug triage task.
We provide an ADP-based bug triage solution, called ADPTriage, which reflects downstream uncertainty in the bug arrivals and developers' timetables.
Our result shows a significant improvement over the myopic approach in terms of assignment accuracy and fixing time.
arXiv Detail & Related papers (2022-11-02T04:42:21Z)
- DapStep: Deep Assignee Prediction for Stack Trace Error rePresentation [61.99379022383108]
We propose new deep learning models to solve the bug triage problem.
The models are based on a bidirectional recurrent neural network with attention and on a convolutional neural network.
To improve the quality of ranking, we propose using additional information from version control system annotations.
arXiv Detail & Related papers (2022-01-14T00:16:57Z)