Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive
Question Answering
- URL: http://arxiv.org/abs/2306.06779v1
- Date: Sun, 11 Jun 2023 21:18:50 GMT
- Title: Multi-Source Test-Time Adaptation as Dueling Bandits for Extractive
Question Answering
- Authors: Hai Ye, Qizhe Xie, Hwee Tou Ng
- Abstract summary: We study multi-source test-time model adaptation from user feedback, where K distinct models are established for adaptation.
We discuss two frameworks: multi-armed bandit learning and multi-armed dueling bandits.
Compared to multi-armed bandit learning, the dueling framework allows pairwise collaboration among K models, which is solved by a novel method named Co-UCB proposed in this work.
- Score: 25.44581667865143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study multi-source test-time model adaptation from user
feedback, where K distinct models are established for adaptation. To allow
efficient adaptation, we cast the problem as a stochastic decision-making
process, aiming to determine the best adapted model after adaptation. We
discuss two frameworks: multi-armed bandit learning and multi-armed dueling
bandits. Compared to multi-armed bandit learning, the dueling framework allows
pairwise collaboration among K models, which is solved by a novel method named
Co-UCB proposed in this work. Experiments on six datasets of extractive
question answering (QA) show that the dueling framework using Co-UCB is more
effective than other strong baselines for our studied problem.
Related papers
- Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond [58.39457881271146]
We introduce a novel framework of combinatorial multi-armed bandits (CMAB) with multivariant and probabilistically triggering arms (CMAB-MT).
Compared with existing CMAB works, CMAB-MT not only enhances the modeling power but also allows improved results by leveraging distinct statistical properties for multivariant random variables.
Our framework can include many important problems as applications, such as episodic reinforcement learning (RL) and probabilistic maximum coverage for goods distribution.
arXiv Detail & Related papers (2024-06-03T14:48:53Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - A Bandit Approach with Evolutionary Operators for Model Selection [0.4604003661048266]
This work formulates model selection as an infinite-armed bandit problem, namely, a problem in which a decision maker iteratively selects one of an infinite number of fixed choices (i.e., arms)
The arms are machine learning models to be trained, and selecting an arm corresponds to partially training the model (resource allocation).
We propose the algorithm Mutant-UCB, which incorporates operators from evolutionary algorithms into the UCB-E bandit algorithm introduced by Audibert et al.
Tests carried out on three open-source image classification data sets attest to the relevance of this novel combined approach, which outperforms the state of the art.
arXiv Detail & Related papers (2024-02-07T08:01:45Z) - Master-slave Deep Architecture for Top-K Multi-armed Bandits with
Non-linear Bandit Feedback and Diversity Constraints [21.109631268204215]
We propose a novel master-slave architecture to solve the top-$K$ multi-armed bandits problem.
To the best of our knowledge, it is the first bandit setting to consider diversity constraints under bandit feedback.
arXiv Detail & Related papers (2023-08-24T09:39:04Z) - On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z) - MetaQA: Combining Expert Agents for Multi-Skill Question Answering [49.35261724460689]
We argue that despite the promising results of multi-dataset models, some domains or QA formats might require specific architectures.
We propose to combine expert agents with a novel, flexible, and training-efficient architecture that considers questions, answer predictions, and answer-prediction confidence scores.
arXiv Detail & Related papers (2021-12-03T14:05:52Z) - Statistical Consequences of Dueling Bandits [0.0]
Multi-Armed-Bandit frameworks have often been used to assess educational interventions.
Recent work has shown that it is more beneficial for a student to provide qualitative feedback through preference elicitation.
We compare traditional uniform sampling to a dueling bandit algorithm and find that dueling bandit algorithms perform well at cumulative regret minimisation, but lead to inflated Type-I error rates and reduced power under certain circumstances (a minimal simulation of this comparison is sketched after this list).
arXiv Detail & Related papers (2021-10-16T23:48:43Z) - Exploiting Transitivity for Top-k Selection with Score-Based Dueling
Bandits [0.0]
We consider the problem of top-k subset selection in Dueling Bandit problems with score information.
We propose a Thurstonian style model and adapt the Pairwise Optimal Computing Budget Allocation for subset selection (POCBAm) sampling method.
arXiv Detail & Related papers (2020-12-31T14:54:25Z) - Learning to Recover Reasoning Chains for Multi-Hop Question Answering
via Cooperative Games [66.98855910291292]
We propose a new problem of learning to recover reasoning chains from weakly supervised signals.
How the evidence passages are selected and how the selected passages are connected are handled by two models.
For evaluation, we created benchmarks based on two multi-hop QA datasets.
arXiv Detail & Related papers (2020-04-06T03:54:38Z) - DUMA: Reading Comprehension with Transposition Thinking [107.89721765056281]
Multi-choice Machine Reading Comprehension (MRC) requires a model to decide the correct answer from a set of answer options when given a passage and a question.
The new DUal Multi-head Co-Attention (DUMA) model is inspired by humans' transposition-thinking process for solving the multi-choice MRC problem.
arXiv Detail & Related papers (2020-01-26T07:35:02Z)
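Following up on the Statistical Consequences of Dueling Bandits entry above, here is a minimal simulation sketch of the uniform-versus-adaptive comparison it describes. The adaptive rule below is a generic greedy stand-in, not the paper's dueling-bandit algorithm, and all function names and parameters are illustrative assumptions; the sketch only shows how one would measure whether adaptive allocation inflates the Type-I error of a post-hoc two-proportion test when both arms are in fact equally good.

```python
import math
import random

def significant_two_proportions(s1, n1, s2, n2, z_crit=1.96):
    """Two-sided two-proportion z-test at roughly the 5% level."""
    p_pool = (s1 + s2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    if se == 0:
        return False
    return abs(s1 / n1 - s2 / n2) / se > z_crit

def run_trial(adaptive, p=(0.5, 0.5), horizon=200, explore=0.1):
    """One simulated two-arm experiment; the null (equal arms) is true."""
    succ, pulls = [0, 0], [0, 0]
    for t in range(horizon):
        if adaptive and min(pulls) > 0:
            # greedy stand-in for adaptive allocation: favour the
            # empirically better arm, with a little forced exploration
            arm = 0 if succ[0] / pulls[0] >= succ[1] / pulls[1] else 1
            if random.random() < explore:
                arm = 1 - arm
        else:
            arm = t % 2  # uniform (alternating) allocation
        pulls[arm] += 1
        succ[arm] += random.random() < p[arm]
    return significant_two_proportions(succ[0], pulls[0], succ[1], pulls[1])

random.seed(0)
for name, adaptive in (("uniform", False), ("adaptive", True)):
    rejections = sum(run_trial(adaptive) for _ in range(2000))
    print(f"{name:8s} empirical Type-I error ~ {rejections / 2000:.3f}")
```

Under the null hypothesis both allocation schemes should reject at about the nominal 5% rate; a higher empirical rate for the adaptive scheme would be the kind of Type-I inflation the paper reports for dueling bandits.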