Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward
- URL: http://arxiv.org/abs/2009.08457v2
- Date: Sun, 25 Oct 2020 03:29:56 GMT
- Title: Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward
- Authors: Baihan Lin
- Abstract summary: We introduce Background Episodic Reward LinUCB (BerlinUCB), a solution that easily incorporates clustering as a self-supervision module.
Our experiments on a variety of datasets, both in stationary and nonstationary environments of six different scenarios, demonstrated clear advantages of the proposed approach.
- Score: 13.173307471333619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We considered a novel practical problem of online learning with episodically
revealed rewards, motivated by several real-world applications, where the
contexts are nonstationary over different episodes and the reward feedback is
not always available to the decision-making agents. For this online
semi-supervised learning setting, we introduced Background Episodic Reward
LinUCB (BerlinUCB), a solution that easily incorporates clustering as a
self-supervision module to provide useful side information when rewards are not
observed. Our experiments on a variety of datasets, both in stationary and
nonstationary environments of six different scenarios, demonstrated clear
advantages of the proposed approach over the standard contextual bandit.
Lastly, we introduced a relevant real-life example where this problem setting
is especially useful.
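To make the mechanism concrete, here is a minimal sketch of a LinUCB-style agent that falls back on a clustering-derived pseudo-reward on episodes where no reward is revealed. This is an illustration under assumed design choices (the class name, pseudo-reward rule, and hyperparameters are assumptions), not the authors' reference implementation:

```python
import numpy as np

class BerlinUCBSketch:
    """LinUCB with a clustering fallback for episodes without reward feedback.

    A sketch only: the pseudo-reward rule and hyperparameters are assumptions,
    not the paper's reference implementation.
    """

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward statistics
        self.centroids = [np.zeros(dim) for _ in range(n_arms)]  # per-arm context centroids
        self.counts = [0] * n_arms

    def select(self, x):
        """Pick the arm with the highest upper confidence bound."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x, reward=None):
        if reward is None:
            # Reward hidden this episode: fall back on self-supervision.
            # Assumed rule: credit the arm iff the context lies closest to
            # that arm's running centroid of previously credited contexts.
            dists = [np.linalg.norm(x - c) for c in self.centroids]
            reward = 1.0 if int(np.argmin(dists)) == arm else 0.0
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
        if reward > 0:  # fold positively credited contexts into the centroid
            self.counts[arm] += 1
            self.centroids[arm] += (x - self.centroids[arm]) / self.counts[arm]
```

A driver loop would call `select(x)` each round, play the returned arm, and call `update(arm, x, reward)` with `reward=None` whenever the environment withholds feedback.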
Related papers
- Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms [23.61332577985059]
Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior.
This paper introduces a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting.
arXiv Detail & Related papers (2024-02-23T15:49:46Z)
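To make the feasible-set notion in the entry above concrete (notation assumed here, not taken from the paper): the feasible reward set collects every reward function under which the expert's demonstrated behavior is optimal,

$$\mathcal{R}_{\text{feas}} = \{\, r \;:\; \pi^{E} \in \operatorname*{arg\,max}_{\pi} J(\pi; r) \,\},$$

where $\pi^E$ is the expert policy and $J(\pi; r)$ is the expected return under reward $r$. Offline, only the visited state-action pairs constrain this set, which is the limitation the offline setting introduces.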
- Random Representations Outperform Online Continually Learned Representations [68.42776779425978]
We show that existing online continually trained deep networks produce inferior representations compared to a simple pre-defined random transform.
Our method, called RanDumb, significantly outperforms state-of-the-art continually learned representations across all online continual learning benchmarks.
Our study reveals the significant limitations of representation learning, particularly in low-exemplar and online continual learning scenarios.
arXiv Detail & Related papers (2024-02-13T22:07:29Z)
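A hedged sketch of the random-representation idea in the entry above (the projection and classifier choices here are illustrative assumptions; see the paper for the exact construction): a fixed, never-trained random embedding feeds a streaming nearest-class-mean classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, never-trained random projection of raw inputs (784 -> 512 here).
W = rng.normal(size=(512, 784)) / np.sqrt(784)

def embed(x):
    return np.maximum(W @ x, 0.0)  # random features with a ReLU nonlinearity

# Streaming nearest-class-mean classifier on top of the frozen embedding.
sums, counts = {}, {}

def update(x, y):
    z = embed(x)
    sums[y] = sums.get(y, np.zeros_like(z)) + z
    counts[y] = counts.get(y, 0) + 1

def predict(x):
    z = embed(x)
    return min(sums, key=lambda y: np.linalg.norm(z - sums[y] / counts[y]))
```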
- Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos [71.20376514273367]
We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data.
Our method outperforms supervised counterparts on a wide range of downstream tasks.
arXiv Detail & Related papers (2023-08-18T02:17:47Z)
- On Covariate Shift of Latent Confounders in Imitation and Reinforcement Learning [69.48387059607387]
We consider the problem of using expert data with unobserved confounders for imitation and reinforcement learning.
We analyze the limitations of learning from confounded expert data with and without external reward.
We validate our claims empirically on challenging assistive healthcare and recommender system simulation tasks.
arXiv Detail & Related papers (2021-10-13T07:31:31Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
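For context on the successor-feature machinery referenced in the entry above (standard definitions, with symbols assumed here): rewards are modeled as linear in features, $r(s,a) = \phi(s,a)^{\top} w$, and the successor features of a policy $\pi$ satisfy a Bellman equation,

$$\psi^{\pi}(s,a) = \phi(s,a) + \gamma\, \mathbb{E}\big[\psi^{\pi}(s', \pi(s'))\big],$$

so that $Q^{\pi}(s,a) = \psi^{\pi}(s,a)^{\top} w$. ITD recovers the reward weights $w$ from demonstrations by temporal-difference learning, which is what lets $\Psi\Phi$-learning combine demonstrations with online interaction.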
- Online Semi-Supervised Learning with Bandit Feedback [45.899239661737795]
We formulate a new problem at the intersection of semi-supervised learning and contextual bandits.
We demonstrate how Graph Convolutional Network (GCN), a semi-supervised learning approach, can be adjusted to the new problem formulation.
arXiv Detail & Related papers (2020-10-23T17:56:38Z)
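The GCN adjustment mentioned in the entry above builds on the standard graph-convolution layer (Kipf & Welling), reproduced here for reference:

$$H^{(l+1)} = \sigma\!\big(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\, H^{(l)} W^{(l)}\big),$$

where $\tilde{A} = A + I$ is the adjacency matrix with self-loops, $\tilde{D}$ its degree matrix, $H^{(l)}$ the node features, and $W^{(l)}$ a learned weight matrix; the semi-supervised effect comes from labels propagating along graph edges from the few rewarded contexts to unrewarded ones.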
- Few-Shot Unsupervised Continual Learning through Meta-Examples [21.954394608030388]
We introduce a novel and complex setting involving unsupervised meta-continual learning with unbalanced tasks.
We exploit a meta-learning scheme that simultaneously alleviates catastrophic forgetting and favors the generalization to new tasks.
Experimental results on few-shot learning benchmarks show competitive performance even compared to the supervised case.
arXiv Detail & Related papers (2020-09-17T07:02:07Z)
- Contextual Bandit with Missing Rewards [27.066965426355257]
We consider a novel variant of the contextual bandit problem where the reward associated with each context-based decision may not always be observed.
This new problem is motivated by certain online settings including clinical trial and ad recommendation applications.
We propose to combine the standard contextual bandit approach with an unsupervised learning mechanism such as clustering.
arXiv Detail & Related papers (2020-07-13T13:29:51Z)
- Wandering Within a World: Online Contextualized Few-Shot Learning [62.28521610606054]
We aim to bridge the gap between typical human and machine-learning environments by extending the standard framework of few-shot learning to an online setting.
We propose a new prototypical few-shot learning approach based on large-scale indoor imagery that mimics the visual experience of an agent wandering within a world.
arXiv Detail & Related papers (2020-07-09T04:05:04Z)
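The prototypical approach mentioned in the entry above classifies by distance to class prototypes (standard formulation, symbols assumed here):

$$p_c = \frac{1}{|S_c|} \sum_{x \in S_c} f_\theta(x), \qquad \hat{y} = \operatorname*{arg\,min}_{c}\, d\big(f_\theta(x), p_c\big),$$

where $S_c$ is the support set for class $c$, $f_\theta$ is an embedding network, and $d$ is a distance such as squared Euclidean; in the online setting the prototypes are updated as new examples stream in.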
- Dark Experience for General Continual Learning: a Strong, Simple Baseline [18.389103500859804]
We work towards General Continual Learning (GCL), where task boundaries blur and the domain and class distributions shift either gradually or suddenly.
We address it through mixing rehearsal with knowledge distillation and regularization; our simple baseline, Dark Experience Replay, matches the network's logits sampled throughout the optimization trajectory.
By conducting an extensive analysis on both standard benchmarks and a novel GCL evaluation setting (MNIST-360), we show that such a seemingly simple baseline outperforms consolidated approaches.
arXiv Detail & Related papers (2020-04-15T17:13:05Z)
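A hedged sketch of the Dark Experience Replay objective described in the entry above (the loss weight and the buffer interface are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def der_loss(model, x, y, buffer, alpha=0.5):
    """Task loss on the current batch plus a distillation term that matches
    the network's current logits to logits stored earlier along the
    optimization trajectory.

    `buffer` is a hypothetical replay buffer assumed to expose `__len__` and
    `sample() -> (past_inputs, saved_logits)`, filled by reservoir sampling.
    """
    loss = F.cross_entropy(model(x), y)
    if len(buffer) > 0:
        x_old, logits_old = buffer.sample()
        loss = loss + alpha * F.mse_loss(model(x_old), logits_old)
    return loss
```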
- Weakly-Supervised Multi-Level Attentional Reconstruction Network for Grounding Textual Queries in Videos [73.4504252917816]
The task of temporally grounding textual queries in videos is to localize one video segment that semantically corresponds to the given query.
Most of the existing approaches rely on segment-sentence pairs (temporal annotations) for training, which are usually unavailable in real-world scenarios.
We present an effective weakly-supervised model, named Multi-Level Attentional Reconstruction Network (MARN), which relies only on video-sentence pairs during the training stage.
arXiv Detail & Related papers (2020-03-16T07:01:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.