Simulating Bandit Learning from User Feedback for Extractive Question
Answering
- URL: http://arxiv.org/abs/2203.10079v1
- Date: Fri, 18 Mar 2022 17:47:58 GMT
- Title: Simulating Bandit Learning from User Feedback for Extractive Question
Answering
- Authors: Ge Gao, Eunsol Choi, Yoav Artzi
- Abstract summary: We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
- Score: 51.97943858898579
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study learning from user feedback for extractive question answering by
simulating feedback using supervised data. We cast the problem as contextual
bandit learning, and analyze the characteristics of several learning scenarios
with focus on reducing data annotation. We show that systems initially trained
on a small number of examples can dramatically improve given feedback from
users on model-predicted answers, and that one can use existing datasets to
deploy systems in new domains without any annotation, instead improving the
system on-the-fly via user feedback.
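The core recipe in the abstract, treating answer-span prediction as a contextual bandit and simulating user feedback from supervised data, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the model interface, the binary reward, and all names are assumptions for exposition, and the paper analyzes several feedback scenarios beyond this simplest one.

```python
import torch

def simulated_feedback(pred_span, gold_span):
    # Simulate binary user feedback from supervised data: positive if the
    # sampled span matches the annotated answer, negative otherwise.
    # (Hypothetical simplest reward variant, for illustration only.)
    return 1.0 if pred_span == gold_span else -1.0

def bandit_step(model, optimizer, example):
    # One contextual-bandit step on a single QA example: sample an answer
    # span from the current policy, observe simulated feedback, and apply
    # a REINFORCE-style policy-gradient update. `model` is assumed to
    # return per-token start/end logits, as in standard extractive QA.
    start_logits, end_logits = model(example["input_ids"])
    dist_start = torch.distributions.Categorical(logits=start_logits)
    dist_end = torch.distributions.Categorical(logits=end_logits)
    start, end = dist_start.sample(), dist_end.sample()
    log_prob = dist_start.log_prob(start) + dist_end.log_prob(end)
    reward = simulated_feedback((start.item(), end.item()),
                                example["gold_span"])
    loss = -reward * log_prob  # minimize negative expected reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```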
Related papers
- Exploiting Correlated Auxiliary Feedback in Parameterized Bandits [56.84649080789685]
We study a novel variant of the parameterized bandits problem in which the learner can observe additional auxiliary feedback that is correlated with the observed reward.
The auxiliary feedback is readily available in many real-life applications; e.g., an online platform that wants to recommend the best-rated services to its users can observe a user's rating of a service (the reward) and collect additional information such as service delivery time (the auxiliary feedback).
arXiv Detail & Related papers (2023-11-05T17:27:06Z)
- System-Level Natural Language Feedback [83.24259100437965]
We show how to use feedback to formalize system-level design decisions in a human-in-the-loop process.
We conduct two case studies of this approach for improving search query and dialog response generation.
We show the combination of system-level and instance-level feedback brings further gains.
arXiv Detail & Related papers (2023-06-23T16:21:40Z)
- Continually Improving Extractive QA via Human Feedback [59.49549491725224]
We study continually improving an extractive question answering (QA) system via human user feedback.
We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time.
arXiv Detail & Related papers (2023-05-21T14:35:32Z)
- Learning from a Learning User for Optimal Recommendations [43.2268992294178]
We formalize a model to capture "learning users" and design an efficient system-side learning solution.
We prove that the regret of the proposed RAES algorithm degrades gracefully as the user's own learning converges more slowly.
Our study provides a novel perspective on modeling the feedback loop in recommendation problems.
arXiv Detail & Related papers (2022-02-03T22:45:12Z)
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback (a generic sketch of this importance-sampling update appears after this list).
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
- Reinforcement Learning with Feedback Graphs [69.1524391595912]
We study episodic reinforcement learning in decision processes when the agent receives additional feedback per step.
We formalize this setting using a feedback graph over state-action pairs and show that model-based algorithms can leverage the additional feedback for more sample-efficient learning.
arXiv Detail & Related papers (2020-05-07T22:35:37Z)
- Pattern Learning for Detecting Defect Reports and Improvement Requests in App Reviews [4.460358746823561]
In this study, we address the absence of actionable insights in app reviews by classifying reviews as defect reports or requests for improvement.
We employ a supervised system that is capable of learning lexico-semantic patterns through genetic programming.
We show that the automatically learned patterns outperform the manually created ones.
arXiv Detail & Related papers (2020-04-19T08:13:13Z)
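Several entries above, in particular the feedback-weighted learning paper, rely on importance sampling to learn from feedback collected by an already-deployed system. The sketch below shows the generic inverse-propensity form of that update; it is a textbook off-policy bandit estimator, not necessarily any listed paper's exact formulation, and all names and values are illustrative.

```python
import torch
import torch.nn.functional as F

def feedback_weighted_loss(new_logits, logged_action, feedback, logging_prob):
    # Off-policy bandit objective from logged interactions: the deployed
    # system chose `logged_action` with probability `logging_prob` and
    # received binary user feedback `feedback` (e.g. 1.0 or 0.0). The
    # inverse-propensity weight feedback / logging_prob corrects for the
    # fact that the data was collected by the old policy, so the new model
    # can be improved without fresh annotation.
    log_p_new = F.log_softmax(new_logits, dim=-1)[logged_action]
    weight = feedback / max(logging_prob, 1e-6)  # clip tiny propensities
    return -weight * log_p_new

# Toy usage: three candidate answers; the deployed system picked action 2
# with probability 0.5 and the user approved it.
logits = torch.randn(3, requires_grad=True)
loss = feedback_weighted_loss(logits, 2, feedback=1.0, logging_prob=0.5)
loss.backward()
```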
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.