Simulating Bandit Learning from User Feedback for Extractive Question
Answering
- URL: http://arxiv.org/abs/2203.10079v1
- Date: Fri, 18 Mar 2022 17:47:58 GMT
- Title: Simulating Bandit Learning from User Feedback for Extractive Question
Answering
- Authors: Ge Gao, Eunsol Choi, Yoav Artzi
- Abstract summary: We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
- Score: 51.97943858898579
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study learning from user feedback for extractive question answering by
simulating feedback using supervised data. We cast the problem as contextual
bandit learning, and analyze the characteristics of several learning scenarios
with focus on reducing data annotation. We show that systems initially trained
on a small number of examples can dramatically improve given feedback from
users on model-predicted answers, and that one can use existing datasets to
deploy systems in new domains without any annotation, instead improving the
system on-the-fly via user feedback.
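The core recipe in the abstract, treating answer-span prediction as a contextual bandit and simulating user feedback from supervised data, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the model interface, the binary reward, and all names are assumptions for exposition, and the paper analyzes several feedback scenarios beyond this simplest one.

```python
import torch

def simulated_feedback(pred_span, gold_span):
    # Simulate binary user feedback from supervised data: positive if the
    # sampled span matches the annotated answer, negative otherwise.
    # (Hypothetical simplest reward variant, for illustration only.)
    return 1.0 if pred_span == gold_span else -1.0

def bandit_step(model, optimizer, example):
    # One contextual-bandit step on a single QA example: sample an answer
    # span from the current policy, observe simulated feedback, and apply
    # a REINFORCE-style policy-gradient update. `model` is assumed to
    # return per-token start/end logits, as in standard extractive QA.
    start_logits, end_logits = model(example["input_ids"])
    dist_start = torch.distributions.Categorical(logits=start_logits)
    dist_end = torch.distributions.Categorical(logits=end_logits)
    start, end = dist_start.sample(), dist_end.sample()
    log_prob = dist_start.log_prob(start) + dist_end.log_prob(end)
    reward = simulated_feedback((start.item(), end.item()),
                                example["gold_span"])
    loss = -reward * log_prob  # minimize negative expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```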
Related papers
- Exploiting Correlated Auxiliary Feedback in Parameterized Bandits [56.84649080789685]
We study a novel variant of the parameterized bandits problem in which the learner can observe additional auxiliary feedback that is correlated with the observed reward.
The auxiliary feedback is readily available in many real-life applications; e.g., an online platform that wants to recommend the best-rated services to its users can observe a user's rating of a service (the reward) and collect additional information such as service delivery time (the auxiliary feedback).
arXiv Detail & Related papers (2023-11-05T17:27:06Z)
- System-Level Natural Language Feedback [83.24259100437965]
We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process.
We conduct two case studies of this approach for improving search query and dialog response generation.
We show the combination of system-level and instance-level feedback brings further gains.
arXiv Detail & Related papers (2023-06-23T16:21:40Z)
- Continually Improving Extractive QA via Human Feedback [59.49549491725224]
We study continually improving an extractive question answering (QA) system via human user feedback.
We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time.
arXiv Detail & Related papers (2023-05-21T14:35:32Z)
- Learning from a Learning User for Optimal Recommendations [43.2268992294178]
We formalize a model to capture "learning users" and design an efficient system-side learning solution.
We prove that the regret of the proposed RAES algorithm degrades gracefully as the user's own learning converges more slowly.
Our study provides a novel perspective on modeling the feedback loop in recommendation problems.
arXiv Detail & Related papers (2022-02-03T22:45:12Z)
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback (a generic sketch of this importance-sampling update appears after this list).
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
- Reinforcement Learning with Feedback Graphs [69.1524391595912]
We study episodic reinforcement learning in decision processes when the agent receives additional feedback per step.
We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can leverage the additional feedback for more sample-efficient learning.
arXiv Detail & Related papers (2020-05-07T22:35:37Z)
- Pattern Learning for Detecting Defect Reports and Improvement Requests in App Reviews [4.460358746823561]
In this study, we address the absence of actionable insights in app reviews by classifying reviews as defect reports or requests for improvement.
We employ a supervised system that is capable of learning lexico-semantic patterns through genetic programming.
We show that the automatically learned patterns outperform the manually created ones.
arXiv Detail & Related papers (2020-04-19T08:13:13Z)
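Several entries above, in particular the feedback-weighted learning paper, rely on importance sampling to learn from feedback collected by an already-deployed system. The sketch below shows the generic inverse-propensity form of that update; it is a textbook off-policy bandit estimator, not necessarily any listed paper's exact formulation, and all names and values are illustrative.

```python
import torch
import torch.nn.functional as F

def feedback_weighted_loss(new_logits, logged_action, feedback, logging_prob):
    # Off-policy bandit objective from logged interactions: the deployed
    # system chose `logged_action` with probability `logging_prob` and
    # received binary user feedback `feedback` (e.g. 1.0 or 0.0). The
    # inverse-propensity weight feedback / logging_prob corrects for the
    # fact that the data was collected by the old policy, so the new model
    # can be improved without fresh annotation.
    log_p_new = F.log_softmax(new_logits, dim=-1)[logged_action]
    weight = feedback / max(logging_prob, 1e-6)  # clip tiny propensities
    return -weight * log_p_new

# Toy usage: three candidate answers; the deployed system picked action 2
# with probability 0.5 and the user approved it.
logits = torch.randn(3, requires_grad=True)
loss = feedback_weighted_loss(logits, 2, feedback=1.0, logging_prob=0.5)
loss.backward()
```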
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.