Interaction-Grounded Learning with Action-inclusive Feedback
- URL: http://arxiv.org/abs/2206.08364v1
- Date: Thu, 16 Jun 2022 17:59:10 GMT
- Title: Interaction-Grounded Learning with Action-inclusive Feedback
- Authors: Tengyang Xie, Akanksha Saran, Dylan J. Foster, Lekan Molu, Ida
Momennejad, Nan Jiang, Paul Mineiro, John Langford
- Abstract summary: We create an algorithm and analysis which allows IGL to work even when the feedback vector contains the action, encoded in any fashion.
We provide theoretical guarantees and large-scale experiments based on supervised datasets to demonstrate the effectiveness of the new approach.
- Score: 46.29513917377202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider the problem setting of Interaction-Grounded Learning (IGL), in which
a learner's goal is to optimally interact with the environment with no explicit
reward to ground its policies. The agent observes a context vector, takes an
action, and receives a feedback vector, using this information to effectively
optimize a policy with respect to a latent reward function. Previously analyzed
approaches fail when the feedback vector contains the action, which
significantly limits IGL's applicability in many potential scenarios, such as
brain-computer interface (BCI) and human-computer interface (HCI) applications.
We address this by creating an algorithm and analysis which allows IGL to work
even when the feedback vector contains the action, encoded in any fashion. We
provide theoretical guarantees and large-scale experiments based on supervised
datasets to demonstrate the effectiveness of the new approach.
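To make the setting concrete, here is a minimal sketch of the IGL interaction protocol under a toy linear environment; `environment_step`, `reward_decoder`, and all dimensions are illustrative stand-ins, not the paper's construction. The key point is that the learner only ever observes the feedback vector `y`, which here deliberately encodes the chosen action, exactly the case this paper addresses.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, d_context, d_feedback = 5, 8, 16

# Illustrative hidden environment parameters (unknown to the learner).
W_latent = rng.normal(size=(d_context, n_actions))   # drives the latent reward
E_action = rng.normal(size=(n_actions, d_feedback))  # action's imprint on feedback

def environment_step(x, a):
    """Return only a feedback vector; the latent reward is never revealed.

    The feedback explicitly encodes the chosen action a, the regime in
    which earlier IGL analyses break down."""
    latent_reward = float(x @ W_latent[:, a] > 0.0)
    return latent_reward * E_action[a] + rng.normal(scale=0.1, size=d_feedback)

def igl_step(policy, reward_decoder):
    x = rng.normal(size=d_context)   # observe a context vector
    a = policy(x)                    # take an action
    y = environment_step(x, a)       # receive action-inclusive feedback
    r_hat = reward_decoder(y, a)     # decode a surrogate reward from (y, a)
    return x, a, r_hat               # data for a policy-optimization update
```

A decoder of this kind, once learned, can feed any standard contextual-bandit optimizer in place of an observed reward.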
Related papers
- Enhancing Graph Contrastive Learning with Reliable and Informative Augmentation for Recommendation [84.45144851024257]
CoGCL aims to enhance graph contrastive learning by constructing contrastive views with stronger collaborative information via discrete codes.
We introduce a multi-level vector quantizer, trained end-to-end, to quantize user and item representations into discrete codes (sketched below).
For neighborhood structure, we propose virtual neighbor augmentation by treating discrete codes as virtual neighbors.
Regarding semantic relevance, we identify similar users/items based on shared discrete codes and interaction targets to generate the semantically relevant view.
arXiv Detail & Related papers (2024-09-09T14:04:17Z)
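As a rough illustration of the discrete codes in the entry above, here is a sketch of a multi-level (residual) quantizer. The fixed random codebooks and the `quantize` helper are assumptions for exposition only; CoGCL trains its quantizer end-to-end together with the recommender.

```python
import numpy as np

rng = np.random.default_rng(1)
d, codebook_size, n_levels = 32, 64, 2

# Illustrative fixed codebooks; CoGCL's are learned end-to-end.
codebooks = [rng.normal(size=(codebook_size, d)) for _ in range(n_levels)]

def quantize(embedding):
    """Map a user/item embedding to one discrete code per level."""
    residual, codes = embedding, []
    for book in codebooks:
        idx = int(np.argmin(np.linalg.norm(book - residual, axis=1)))
        codes.append(idx)
        residual = residual - book[idx]   # next level encodes the remainder
    return tuple(codes)

# Users/items that share codes can serve as virtual neighbors or as
# semantically relevant pairs when building contrastive views.
print(quantize(rng.normal(size=d)))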
- Provably Efficient Interactive-Grounded Learning with Personalized Reward [44.64476717773815]
Interactive-Grounded Learning (IGL) is a powerful framework in which a learner aims at maximizing unobservable rewards.
We provide the first provably efficient algorithms with sublinear regret under realizability.
We propose two algorithms, one based on explore-then-exploit and the other on inverse-gap weighting (sketched below).
arXiv Detail & Related papers (2024-05-31T08:21:09Z)
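Inverse-gap weighting is a standard exploration primitive (due to Abe and Long, and revived by Foster and Rakhlin). The sketch below shows only this generic action distribution, not the paper's full algorithm or its personalized-reward machinery.

```python
import numpy as np

def inverse_gap_weighting(f_values, gamma=100.0):
    """Inverse-gap-weighted exploration over a discrete action set.

    f_values: predicted rewards f(x, a) for each action under the current
    regressor. Larger gamma concentrates probability on the greedy action.
    """
    f_values = np.asarray(f_values, dtype=float)
    n = len(f_values)
    best = int(np.argmax(f_values))
    gaps = f_values[best] - f_values          # regret gap of each action
    probs = np.zeros(n)
    mask = np.arange(n) != best
    probs[mask] = 1.0 / (n + gamma * gaps[mask])
    probs[best] = 1.0 - probs.sum()           # remaining mass to the greedy arm
    return probs

# Example: sample an action from the exploration distribution.
p = inverse_gap_weighting([0.1, 0.5, 0.45, 0.2])
a = np.random.default_rng(2).choice(len(p), p=p)
```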
- Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning [79.38140606606126]
We propose an algorithmic framework that fine-tunes vision-language models (VLMs) with reinforcement learning (RL).
Our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning.
We demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks.
arXiv Detail & Related papers (2024-05-16T17:50:19Z)
- Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task [50.72283841720014]
We propose a novel learning strategy that can improve reasoning about the effects of actions.
We demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.
arXiv Detail & Related papers (2022-12-07T05:41:58Z)
- RLIP: Relational Language-Image Pre-training for Human-Object Interaction Detection [32.20132357830726]
Relational Language-Image Pre-training (RLIP) is a contrastive pre-training strategy that leverages both entity and relation descriptions (a generic contrastive objective is sketched below).
We show the benefits of these contributions, collectively termed RLIP-ParSe, for improved zero-shot, few-shot, and fine-tuning HOI detection, as well as increased robustness to noisy annotations.
arXiv Detail & Related papers (2022-09-05T07:50:54Z)
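The entry above describes contrastive pre-training over entity and relation descriptions; below is a generic CLIP-style symmetric InfoNCE loss as a sketch of that idea. RLIP-ParSe's actual objective and its parallel entity/relation decoding are more involved, so treat the pairing of region features with description embeddings here as an assumption.

```python
import numpy as np

def info_nce(vision_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE over matched vision/text pairs (CLIP-style).

    vision_feats, text_feats: (batch, d) arrays where row i of each is a
    matched pair, e.g. a region (or region-pair) feature and the embedding
    of its entity or relation description.
    """
    v = vision_feats / np.linalg.norm(vision_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = v @ t.T / temperature                          # pairwise similarities
    ls_v2t = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    ls_t2v = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    # Matched pairs sit on the diagonal; everything else is a negative.
    return -0.5 * (np.mean(np.diag(ls_v2t)) + np.mean(np.diag(ls_t2v)))
```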
- Interaction-Grounded Learning [24.472306647094253]
We propose Interaction-Grounded Learning, in which a learner's goal is to interact with the environment, with no grounding or explicit reward, to optimize its policies.
We show that in an Interaction-Grounded Learning setting, with certain natural assumptions, a learner can discover the latent reward and ground its policy for successful interaction.
arXiv Detail & Related papers (2021-06-09T08:13:29Z)
- Mining Implicit Entity Preference from User-Item Interaction Data for Knowledge Graph Completion via Adversarial Learning [82.46332224556257]
We propose a novel adversarial learning approach by leveraging user interaction data for the Knowledge Graph Completion task.
Our generator is isolated from user interaction data and serves to improve the performance of the discriminator (a generic sketch of this adversarial setup follows below).
To discover users' implicit entity preferences, we design an elaborate collaborative learning algorithm based on graph neural networks.
arXiv Detail & Related papers (2020-03-28T05:47:33Z)
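As a generic sketch of adversarial learning for knowledge graph completion (in the spirit of KBGAN-style negative sampling, not this paper's exact architecture), the generator below proposes hard negative tails and the discriminator learns to rank observed triples above them. All embeddings and helpers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n_entities, n_relations, d = 1000, 10, 50

# Illustrative embedding tables for a score-based KGC model (DistMult-style).
ent = rng.normal(scale=0.1, size=(n_entities, d))
rel = rng.normal(scale=0.1, size=(n_relations, d))

def score(h, r, t):
    """DistMult-style triple score used by the discriminator."""
    return float(np.sum(ent[h] * rel[r] * ent[t]))

def generator_sample(h, r, candidates):
    """Generator: pick a hard negative tail via softmax over candidate scores.

    In the paper the generator is isolated from user interaction data; here
    it only sees current scores, as a simplification."""
    s = np.array([score(h, r, t) for t in candidates])
    p = np.exp(s - s.max())
    p /= p.sum()
    return int(rng.choice(candidates, p=p))

def discriminator_margin_loss(h, r, t_pos, t_neg, margin=1.0):
    """Discriminator: rank the observed triple above the generated negative."""
    return max(0.0, margin - score(h, r, t_pos) + score(h, r, t_neg))

# Example: one adversarial step for an observed triple (h=0, r=1, t=2).
t_neg = generator_sample(0, 1, candidates=list(range(20)))
loss = discriminator_margin_loss(0, 1, t_pos=2, t_neg=t_neg)
```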
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action spaces is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine the critic's estimated action values to control the variance of gradient estimation (a generic variance-reduction sketch follows below).
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
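One standard way a critic over a discrete action set controls gradient variance is the all-action estimator sketched below, which integrates the critic's values over every action analytically instead of relying on a single sampled action. This is a generic illustration, not the paper's exact estimator.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def all_action_policy_gradient(logits, q_values):
    """Variance-reduced policy gradient for a discrete action space.

    Instead of the single-sample estimator (Q(a) - b) * d log pi(a)/d logits,
    sum the critic's action values over *all* actions analytically:
        g = sum_a Q(a) * d pi(a) / d logits = pi * (Q - V).
    With an exact critic this removes sampling variance over actions.
    """
    pi = softmax(np.asarray(logits, dtype=float))
    q = np.asarray(q_values, dtype=float)
    baseline = pi @ q                  # value estimate V under the policy
    return pi * (q - baseline)         # gradient w.r.t. the logits

# Example: gradient for a 3-action policy given critic estimates.
g = all_action_policy_gradient([1.0, 0.5, -0.2], [0.2, 0.8, 0.1])
```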
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.