Towards Teachable Reasoning Systems
- URL: http://arxiv.org/abs/2204.13074v1
- Date: Wed, 27 Apr 2022 17:15:07 GMT
- Title: Towards Teachable Reasoning Systems
- Authors: Bhavana Dalvi, Oyvind Tafjord, Peter Clark
- Abstract summary: We develop a teachable reasoning system for question-answering (QA).
Our approach is three-fold: First, generated chains of reasoning show how answers are implied by the system's own internal beliefs.
Second, users can interact with the explanations to identify erroneous model beliefs and provide corrections.
Third, we augment the model with a dynamic memory of such corrections.
- Score: 29.59387051046722
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our goal is a teachable reasoning system for question-answering (QA), where a
user can interact with faithful answer explanations, and correct errors so that
the system improves over time. Our approach is three-fold: First, generated
chains of reasoning show how answers are implied by the system's own internal
beliefs. Second, users can interact with the explanations to identify erroneous
model beliefs and provide corrections. Third, we augment the model with a
dynamic memory of such corrections. Retrievals from memory are used as
additional context for QA, to help avoid previous mistakes in similar new
situations - a novel type of memory-based continuous learning. To our
knowledge, this is the first system to generate chains that are both faithful
(the answer follows from the reasoning) and truthful (the chain reflects the
system's own beliefs, as ascertained by self-querying). In evaluation, users
judge that a majority (65%+) of generated chains clearly show how an answer
follows from a set of facts - substantially better than a high-performance
baseline. We also find that using simulated feedback, our system (called
EntailmentWriter) continually improves with time, requiring feedback on only
25% of training examples to reach within 1% of the upper bound (feedback on all
examples). We observe a similar trend with real users. This suggests new
opportunities for using language models in an interactive setting where users
can inspect, debug, correct, and improve a system's performance over time.
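The memory mechanism can be pictured with a minimal sketch: user corrections are stored alongside the questions that triggered them, and the most similar stored corrections are retrieved and prepended as extra context for new questions. The bag-of-words retriever, the `qa_model` stub, and all names below are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
import math

def bow_vector(text):
    """Crude bag-of-words vector; a real system would use a learned encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CorrectionMemory:
    """Grows over time: each user correction is stored keyed by its question."""
    def __init__(self):
        self.entries = []  # (question, corrected_belief) pairs

    def add(self, question, corrected_belief):
        self.entries.append((question, corrected_belief))

    def retrieve(self, question, k=3):
        q = bow_vector(question)
        scored = sorted(self.entries, key=lambda e: -cosine(q, bow_vector(e[0])))
        return [belief for _, belief in scored[:k]]

def answer(question, memory, qa_model):
    """Prepend retrieved corrections as context so the model can avoid
    repeating mistakes it was corrected on in similar situations."""
    context = memory.retrieve(question)
    prompt = "\n".join(context + [question])
    return qa_model(prompt)

# Usage with a trivial stand-in QA model that just echoes its prompt.
memory = CorrectionMemory()
memory.add("Do penguins fly?", "Penguins are birds that cannot fly.")
print(answer("Can a penguin fly south for winter?", memory, qa_model=lambda p: p))
```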
Related papers
- What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception [53.4840989321394]
We analyze the effect of rationales generated by QA models to support their answers.
We present users with incorrect answers and corresponding rationales in various formats.
We measure the effectiveness of this feedback in patching these rationales through in-context learning.
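A minimal sketch of what patching through in-context learning could look like, assuming a prompt-driven model: the user's critique of the rationale is folded into the prompt and the model is re-queried. The prompt format and the `model` callable are assumptions, not the paper's setup.

```python
def patch_with_feedback(question, wrong_answer, rationale, feedback, model):
    """Re-query the model with the user's critique of its rationale in context."""
    prompt = (
        f"Question: {question}\n"
        f"Previous answer: {wrong_answer}\n"
        f"Previous rationale: {rationale}\n"
        f"User feedback: {feedback}\n"
        "Revised answer:"
    )
    return model(prompt)
```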
arXiv Detail & Related papers (2023-11-16T04:26:32Z)
- Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering [58.64831511644917]
We introduce an interpretable by design model that factors model decisions into intermediate human-legible explanations.
We show that our inherently interpretable system can improve by 4.64% over a comparable black-box system on reasoning-focused questions.
arXiv Detail & Related papers (2023-05-24T08:33:15Z)
- ReCEval: Evaluating Reasoning Chains via Correctness and Informativeness [67.49087159888298]
ReCEval is a framework that evaluates reasoning chains via two key properties: correctness and informativeness.
We show that ReCEval effectively identifies various error types and yields notable improvements compared to prior methods.
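A rough sketch of step-level chain evaluation in this spirit, assuming placeholder scorers `entails` and `info_gain` (not ReCEval's actual metrics): each step is checked for support by its predecessors and for whether it adds new content toward the conclusion.

```python
def evaluate_chain(steps, entails, info_gain, tau_c=0.5, tau_i=0.0):
    """Flag steps unsupported by prior context (correctness) or that
    add nothing new toward the conclusion (informativeness)."""
    errors = []
    for i, step in enumerate(steps):
        context = steps[:i]
        if entails(context, step) < tau_c:
            errors.append((i, "incorrect: not entailed by earlier steps"))
        if info_gain(context, step) <= tau_i:
            errors.append((i, "uninformative: adds no new content"))
    return errors
```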
arXiv Detail & Related papers (2023-04-21T02:19:06Z)
- Entailer: Answering Questions with Faithful and Truthful Chains of Reasoning [26.715242799194908]
We show how a question-answering system can show how its answers are implied by its own internal beliefs via a systematic chain of reasoning.
Our approach is to combine a trained backward-chaining model, capable of generating a set of premises entailing an answer hypothesis, with a verifier that checks that the model itself believes those premises.
To our knowledge, this is the first system to generate multistep chains that are both faithful (the answer follows from the reasoning) and truthful (the chain reflects the system's own internal beliefs).
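The described combination might be sketched as below: a backward-chaining generator proposes premise sets that entail a hypothesis, and a premise is kept only if self-querying says the model believes it. `generate_premises` and `model_believes` are hypothetical stand-ins, not Entailer's components.

```python
def prove(hypothesis, generate_premises, model_believes, depth=2):
    """Backward-chain from a hypothesis down to facts the model itself believes."""
    if model_believes(hypothesis):
        return {"claim": hypothesis, "support": []}  # believed leaf fact
    if depth == 0:
        return None
    for premises in generate_premises(hypothesis):
        # Each candidate premise set should jointly entail the hypothesis;
        # keep it only if every premise can itself be proven or believed.
        subproofs = [prove(p, generate_premises, model_believes, depth - 1)
                     for p in premises]
        if all(sp is not None for sp in subproofs):
            return {"claim": hypothesis, "support": subproofs}
    return None  # no chain found that the model itself believes
```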
arXiv Detail & Related papers (2022-10-21T19:51:56Z)
- Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment [20.601284299825895]
We focus on two kinds of improvements: 1) improving the QA system's performance itself, and 2) providing the model with the ability to explain the correctness or incorrectness of an answer.
We collect a retrieval-based QA dataset, FeedbackQA, which contains interactive feedback from users.
We show that feedback data improves the accuracy not only of the deployed QA system but also of stronger non-deployed systems.
arXiv Detail & Related papers (2022-04-06T18:17:09Z)
- Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
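One way to read "simulating feedback using supervised data" is the following sketch: the reward for a sampled answer is computed from the gold label rather than a real user, and drives a REINFORCE-style update. The `sample_answer` method and the reward values are assumptions, not the paper's exact setup.

```python
def simulated_bandit_step(model, optimizer, question, context, gold_answer):
    """One bandit update: sample an answer, score it against the gold label
    (standing in for a real user's feedback), and reinforce accordingly."""
    log_prob, predicted = model.sample_answer(question, context)
    reward = 1.0 if predicted == gold_answer else -0.1  # simulated feedback
    loss = -reward * log_prob  # REINFORCE-style objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return predicted, reward
```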
arXiv Detail & Related papers (2022-03-18T17:47:58Z)
- Improving scripts with a memory of natural feedback [38.81097942561449]
We create a dynamic memory architecture with a growing memory of user feedback about errors in the output.
On a script generation task, we show empirically that the model learns to apply feedback effectively.
This is a first step towards strengthening deployed models, potentially broadening their utility.
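This parallels the correction-memory sketch above, specialized to generation: feedback is keyed by the task that produced the error and retrieved as hints for future generations. A toy sketch with an assumed keyword retriever, not the paper's architecture:

```python
class FeedbackMemory:
    """Append-only store of (task, user feedback) pairs that grows on errors."""
    def __init__(self):
        self.notes = []

    def record(self, task, feedback):
        self.notes.append((task, feedback))

    def relevant(self, task):
        # Naive keyword overlap; a real retriever would be learned.
        words = set(task.lower().split())
        return [fb for t, fb in self.notes if words & set(t.lower().split())]

def generate_script(task, memory, generator):
    """Condition generation on feedback retrieved from similar past tasks."""
    hints = memory.relevant(task)
    return generator("\n".join(hints + [task]))
```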
arXiv Detail & Related papers (2021-12-16T07:01:28Z)
- Deep Feedback Inverse Problem Solver [141.26041463617963]
We present an efficient, effective, and generic approach towards solving inverse problems.
We leverage the feedback signal provided by the forward process and learn an iterative update model.
Our approach places no restrictions on the forward process and requires no prior knowledge.
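The loop described can be sketched as follows, with `forward_process` and `update_model` as placeholders: the forward process is run on the current estimate, and the discrepancy with the observation is fed to a learned update model.

```python
def solve_inverse(observation, x0, forward_process, update_model, steps=10):
    """Iteratively refine an estimate of the latent input: compare the
    forward process's output on the current estimate with the observation,
    and let a learned model map that feedback to an additive update."""
    x = x0
    for _ in range(steps):
        feedback = observation - forward_process(x)  # residual signal
        x = x + update_model(x, feedback)            # learned refinement
    return x
```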
arXiv Detail & Related papers (2021-01-19T16:49:06Z)
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
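A hedged sketch of importance-sampling-based feedback weighting, assuming logged tuples that record the deployed model's probability of the answer it gave; `model.log_prob`, the tuple layout, and the clipping constant are assumptions, not the paper's exact objective.

```python
import torch

def feedback_weighted_loss(model, batch):
    """Importance-sampling-corrected objective from binary feedback: each
    logged (input, answer, reward, p_old) tuple was sampled from the old
    deployed policy, so its contribution is reweighted by 1 / p_old."""
    losses = []
    for x, y, reward, p_old in batch:
        log_p = model.log_prob(x, y)
        weight = reward / max(p_old, 1e-6)  # clipped importance weight
        losses.append(-weight * log_p)
    return torch.stack(losses).mean()
```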
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
- F1 is Not Enough! Models and Evaluation Towards User-Centered Explainable Question Answering [30.95495958937006]
We show that current models and evaluation settings have shortcomings regarding the coupling of answer and explanation.
We propose a hierarchical model and a new regularization term to strengthen the answer-explanation coupling.
Our scores are better aligned with user experience, making them promising candidates for model selection.
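One illustrative guess at such a coupling regularizer (not the paper's architecture): penalize the probability mass the answer module places on evidence outside the selected explanation, assuming tensor-valued attention and mask inputs.

```python
def coupled_loss(answer_loss, explanation_loss, answer_attn, explanation_mask, lam=0.1):
    """Joint objective with a coupling term: the answer module is penalized
    for attention mass on evidence outside the selected explanation."""
    outside_mass = (answer_attn * (1 - explanation_mask)).sum(dim=-1).mean()
    return answer_loss + explanation_loss + lam * outside_mass
```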
arXiv Detail & Related papers (2020-10-13T10:53:20Z)