Interaction-Grounded Learning
- URL: http://arxiv.org/abs/2106.04887v1
- Date: Wed, 9 Jun 2021 08:13:29 GMT
- Title: Interaction-Grounded Learning
- Authors: Tengyang Xie, John Langford, Paul Mineiro, Ida Momennejad
- Abstract summary: We propose Interaction-Grounded Learning, a setting in which a learner's goal is to optimize its policy by interacting with the environment without any grounding or explicit reward.
We show that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction.
- Score: 24.472306647094253
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider a prosthetic arm, learning to adapt to its user's control signals.
We propose Interaction-Grounded Learning for this novel setting, in which a
learner's goal is to interact with the environment with no grounding or
explicit reward to optimize its policies. Such a problem evades common RL
solutions which require an explicit reward. The learning agent observes a
multidimensional context vector, takes an action, and then observes a
multidimensional feedback vector. This multidimensional feedback vector has no
explicit reward information. In order to succeed, the algorithm must learn how
to evaluate the feedback vector to discover a latent reward signal, with which
it can ground its policies without supervision. We show that in an
Interaction-Grounded Learning setting, with certain natural assumptions, a
learner can discover the latent reward and ground its policy for successful
interaction. We provide theoretical guarantees and a proof-of-concept empirical
evaluation to demonstrate the effectiveness of our proposed approach.
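To make the interaction protocol concrete, below is a minimal, hypothetical sketch of an IGL-style loop in Python. The toy environment, the fixed reward decoder `psi`, and the policy update are illustrative stand-ins under assumed notation, not the algorithm or experiments from the paper; in particular, IGL requires the decoder itself to be learned from interaction, whereas here it is fixed for brevity.
```python
import numpy as np

# A minimal, hypothetical sketch of an Interaction-Grounded Learning loop.
# The toy environment, the fixed reward decoder `psi`, and the policy update
# are illustrative stand-ins, not the algorithm or experiments from the paper.

rng = np.random.default_rng(0)
n_actions, d_context, d_feedback = 3, 4, 6

# Toy environment: each context has a latent "correct" action; success is never
# revealed directly, it only influences the multidimensional feedback vector.
W_true = rng.normal(size=(n_actions, d_context))

def observe():
    return rng.normal(size=d_context)            # multidimensional context vector

def step(x, a):
    success = float(a == np.argmax(W_true @ x))  # latent reward, never shown to the learner
    y = rng.normal(scale=0.1, size=d_feedback)   # multidimensional feedback vector
    y[0] += success                              # feedback merely correlates with success
    return y

theta = np.zeros((n_actions, d_context))         # linear softmax policy parameters
psi = np.zeros(d_feedback)
psi[0] = 1.0                                     # reward decoder; in IGL this must itself be learned

def act(x):
    logits = theta @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(n_actions, p=p)), p

for t in range(20_000):
    x = observe()
    a, p = act(x)
    y = step(x, a)
    r_hat = float(psi @ y > 0.5)                 # latent reward inferred from the feedback vector
    for j in range(n_actions):                   # REINFORCE-style update with the inferred reward
        theta[j] += 0.05 * r_hat * ((j == a) - p[j]) * x
```
The point the sketch conveys is that the learner never sees `success`; anything it optimizes must pass through the feedback vector and a decoder it has to ground without supervision.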
Related papers
- Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching [23.600285251963395]
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment.
Traditionally, IRL is treated as an adversarial game, where an adversary searches over reward models and a learner optimizes the reward through repeated RL procedures.
We propose a novel approach to IRL by direct policy optimization, exploiting a linear factorization of the return as the inner product of successor features and a reward vector.
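For reference, the linear factorization referred to above matches the standard successor-feature identity (conventional notation, assumed here rather than taken from the paper): if rewards are linear in features, $r(s,a) = \phi(s,a)^\top w$, then $Q^\pi(s,a) = \mathbb{E}_\pi\big[\sum_{t \ge 0} \gamma^t \phi(s_t,a_t)\big]^\top w = \psi^\pi(s,a)^\top w$, i.e. the return decomposes as the inner product of the successor features $\psi^\pi$ and a reward vector $w$.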
arXiv Detail & Related papers (2024-11-11T14:05:50Z)
- Online Learning with Off-Policy Feedback [18.861989132159945]
We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback.
We propose a set of algorithms that guarantee regret bounds that scale with a natural notion of mismatch between any comparator policy and the behavior policy.
arXiv Detail & Related papers (2022-07-18T21:57:16Z)
- Interaction-Grounded Learning with Action-inclusive Feedback [46.29513917377202]
We create an algorithm and analysis that allow IGL to work even when the feedback vector contains the action, encoded in any fashion.
We provide theoretical guarantees and large-scale experiments based on supervised datasets to demonstrate the effectiveness of the new approach.
arXiv Detail & Related papers (2022-06-16T17:59:10Z)
- Backprop-Free Reinforcement Learning with Active Neural Generative Coding [84.11376568625353]
We propose a computational framework for learning action-driven generative models without backpropagation of errors (backprop) in dynamic environments.
We develop an intelligent agent that operates even with sparse rewards, drawing inspiration from the cognitive theory of planning as inference.
The robust performance of our agent offers promising evidence that a backprop-free approach for neural inference and learning can drive goal-directed behavior.
arXiv Detail & Related papers (2021-07-10T19:02:27Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
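The "relabeling experience" idea in the title can be illustrated with a short, hypothetical sketch: when the feedback-trained reward model is updated, stored transitions in the replay buffer are re-scored with it, so off-policy updates always use the latest reward estimate. The names and signature below are assumptions for illustration, not the paper's implementation.
```python
from typing import Callable, List, Tuple
import numpy as np

# Hypothetical sketch of relabeling a replay buffer with a learned reward model;
# names and details are illustrative assumptions, not the paper's implementation.

Transition = Tuple[np.ndarray, np.ndarray, float, np.ndarray]  # (state, action, reward, next_state)

def relabel(buffer: List[Transition],
            reward_model: Callable[[np.ndarray, np.ndarray], float]) -> List[Transition]:
    """Recompute stored rewards with the current feedback-trained reward model."""
    return [(s, a, reward_model(s, a), s_next) for (s, a, _, s_next) in buffer]
```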
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Generative Inverse Deep Reinforcement Learning for Online Recommendation [62.09946317831129]
We propose a novel inverse reinforcement learning approach, namely InvRec, for online recommendation.
InvRec automatically extracts the reward function from users' behaviors.
arXiv Detail & Related papers (2020-11-04T12:12:25Z)
- Knowledge-guided Deep Reinforcement Learning for Interactive Recommendation [49.32287384774351]
Interactive recommendation aims to learn from dynamic interactions between items and users to achieve responsiveness and accuracy.
We propose Knowledge-Guided Deep Reinforcement Learning to harness the advantages of both reinforcement learning and knowledge graphs for interactive recommendation.
arXiv Detail & Related papers (2020-04-17T05:26:47Z)
- Value Driven Representation for Human-in-the-Loop Reinforcement Learning [33.79501890330252]
We focus on the algorithmic foundations of helping the system designer choose the set of sensors or features that define the observation space used by a reinforcement learning agent.
We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent.
We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
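As a rough illustration of what such iterative observation-space augmentation could look like, here is a hypothetical greedy sketch; the selection criterion (largest estimated value improvement) and the helper names are assumptions for illustration, not the VDR algorithm itself.
```python
from typing import Callable, Iterable, List

# Hypothetical sketch of an iterative observation-space augmentation loop in the
# spirit of the summary above; the greedy criterion and helper names are
# illustrative assumptions, not the paper's actual rule.

def augment_observation_space(initial_features: List[str],
                              candidate_features: Iterable[str],
                              train_and_evaluate: Callable[[List[str]], float]) -> List[str]:
    """Greedily grow the feature set: add the candidate whose inclusion most
    improves the estimated value of the resulting policy; stop when none helps."""
    features = list(initial_features)
    remaining = list(candidate_features)
    best_value = train_and_evaluate(features)
    improved = True
    while improved and remaining:
        improved = False
        scores = {f: train_and_evaluate(features + [f]) for f in remaining}
        f_best, v_best = max(scores.items(), key=lambda kv: kv[1])
        if v_best > best_value:
            features.append(f_best)
            remaining.remove(f_best)
            best_value, improved = v_best, True
    return features
```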
arXiv Detail & Related papers (2020-04-02T18:45:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.