Learning Value Functions from Undirected State-only Experience
- URL: http://arxiv.org/abs/2204.12458v1
- Date: Tue, 26 Apr 2022 17:24:36 GMT
- Title: Learning Value Functions from Undirected State-only Experience
- Authors: Matthew Chang, Arjun Gupta, Saurabh Gupta
- Abstract summary: We show that tabular Q-learning in discrete Markov decision processes (MDPs) learns the same value function under any arbitrary refinement of the action space.
This theoretical result motivates the design of Latent Action Q-learning or LAQ, an offline RL method that can learn effective value functions from state-only experience.
We show that LAQ can recover value functions that have high correlation with value functions learned using ground truth actions.
- Score: 17.76847333440422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the problem of learning value functions from undirected
state-only experience (state transitions without action labels, i.e., (s, s', r)
tuples). We first theoretically characterize the applicability of Q-learning in
this setting. We show that tabular Q-learning in discrete Markov decision
processes (MDPs) learns the same value function under any arbitrary refinement
of the action space. This theoretical result motivates the design of Latent
Action Q-learning or LAQ, an offline RL method that can learn effective value
functions from state-only experience. LAQ learns
value functions using Q-learning on discrete latent actions obtained through a
latent-variable future prediction model. We show that LAQ can recover value
functions that have high correlation with value functions learned using ground
truth actions. Value functions learned using LAQ lead to sample efficient
acquisition of goal-directed behavior, can be used with domain-specific
low-level controllers, and facilitate transfer across embodiments. Our
experiments in 5 environments ranging from 2D grid world to 3D visual
navigation in realistic environments demonstrate the benefits of LAQ over
simpler alternatives, imitation learning oracles, and competing methods.
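Because the method reduces to ordinary Q-learning once discrete latent actions are in hand, the recipe is easy to sketch. Below is a minimal tabular rendition on a toy 5-state chain; note that the hand-rolled latent_action bucketing of transitions by their state delta is a stand-in assumption for the paper's latent-variable future prediction model, and the toy environment is likewise illustrative.

```python
import numpy as np

# Minimal sketch of the LAQ recipe on a toy 5-state chain, assuming
# undirected (s, s', r) tuples with no action labels. The paper infers
# discrete latent actions with a latent-variable future prediction
# model; here a hand-rolled stand-in buckets transitions by their
# effect on the state, purely for illustration.

def latent_action(s, s_next):
    # Placeholder latent-action model: -1/0/+1 state delta -> {0, 1, 2}.
    return int(np.sign(s_next - s)) + 1

def laq_tabular(transitions, n_states, n_latent=3, gamma=0.9, alpha=0.5, iters=200):
    Q = np.zeros((n_states, n_latent))
    for _ in range(iters):
        for s, s_next, r in transitions:
            z = latent_action(s, s_next)          # assign a latent action
            target = r + gamma * Q[s_next].max()  # standard Q-learning target
            Q[s, z] += alpha * (target - Q[s, z])
    return Q.max(axis=1)                          # state-value estimate V(s)

# Undirected experience: random-walk transitions, reward on reaching state 4.
rng = np.random.default_rng(0)
transitions = []
for _ in range(500):
    s = int(rng.integers(0, 5))
    s_next = int(np.clip(s + rng.choice([-1, 0, 1]), 0, 4))
    transitions.append((s, s_next, float(s_next == 4)))

print(laq_tabular(transitions, n_states=5))  # values should rise toward state 4
```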
Related papers
- Towards Plastic and Stable Exemplar-Free Incremental Learning: A Dual-Learner Framework with Cumulative Parameter Averaging [12.168402195820649]
We propose a Dual-Learner framework with Cumulative Parameter Averaging (DLCPA).
We show that DLCPA outperforms several state-of-the-art exemplar-free baselines in both Task-IL and Class-IL settings.
arXiv Detail & Related papers (2023-10-28T08:48:44Z)
- Learning Reward for Physical Skills using Large Language Model [5.795405764196473]
Large Language Models contain valuable task-related knowledge that can aid in learning reward functions.
We aim to extract task knowledge from LLMs using environment feedback to create efficient reward functions for physical skills.
arXiv Detail & Related papers (2023-10-21T19:10:06Z)
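To make the loop in the summary above concrete, here is a hedged sketch of an LLM-in-the-loop reward design cycle: propose a reward function, train and evaluate, feed the resulting environment feedback into the next proposal. The llm_propose_reward stub, the feedback format, and the toy evaluation are hypothetical placeholders, not the paper's interface.

```python
# Hedged sketch of an LLM-in-the-loop reward design cycle. The
# `llm_propose_reward` stub stands in for a real LLM call; the
# feedback format is a hypothetical placeholder, not the paper's.

def llm_propose_reward(task_description, feedback_history):
    # Placeholder: a real system would prompt an LLM with the task
    # description plus past environment feedback and parse returned
    # reward code. Here we return a fixed distance-based shaping term.
    def reward(state, goal):
        return -abs(state - goal)
    return reward

def reward_design_loop(task_description, train_and_eval, n_rounds=3):
    feedback_history = []
    reward_fn = None
    for _ in range(n_rounds):
        reward_fn = llm_propose_reward(task_description, feedback_history)
        success_rate = train_and_eval(reward_fn)  # environment feedback
        feedback_history.append(f"success rate: {success_rate:.2f}")
    return reward_fn

# Toy stand-in for training: score the reward on a fixed rollout.
def toy_train_and_eval(reward_fn):
    rollout, goal = [0.0, 0.4, 0.8, 1.0], 1.0
    return 1.0 + sum(reward_fn(s, goal) for s in rollout) / len(rollout)

final_reward = reward_design_loop("reach the goal position", toy_train_and_eval)
```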
- Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z)
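The implicit multi-step model described in the summary above can be sketched as a contrastive classifier: score whether a future state is reachable from a state-action pair, with other futures in the batch serving as negatives. The architecture, dimensions, and random batch below are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Contrastive critic sketch: embed (s, a) and future states, then train
# with an InfoNCE objective where each (s_i, a_i) is paired with its own
# future s_future_i and the rest of the batch acts as negatives.

S_DIM, A_DIM, Z_DIM = 4, 2, 16

sa_encoder = nn.Sequential(nn.Linear(S_DIM + A_DIM, 64), nn.ReLU(), nn.Linear(64, Z_DIM))
f_encoder = nn.Sequential(nn.Linear(S_DIM, 64), nn.ReLU(), nn.Linear(64, Z_DIM))
opt = torch.optim.Adam(list(sa_encoder.parameters()) + list(f_encoder.parameters()), lr=1e-3)

def infonce_step(s, a, s_future):
    z_sa = sa_encoder(torch.cat([s, a], dim=-1))  # (B, Z)
    z_f = f_encoder(s_future)                     # (B, Z)
    logits = z_sa @ z_f.T                         # (B, B) similarity matrix
    labels = torch.arange(len(s))                 # diagonal entries are positives
    loss = nn.functional.cross_entropy(logits, labels)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Fake batch of offline transitions, for illustration only.
B = 32
loss = infonce_step(torch.randn(B, S_DIM), torch.randn(B, A_DIM), torch.randn(B, S_DIM))
```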
- VA-learning as a more efficient alternative to Q-learning [49.526579981437315]
We introduce VA-learning, which directly learns an advantage function and a value function using bootstrapping.
VA-learning learns off-policy and enjoys theoretical guarantees similar to those of Q-learning.
Thanks to directly learning the advantage and value functions, VA-learning improves sample efficiency over Q-learning.
arXiv Detail & Related papers (2023-05-29T15:44:47Z)
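As a hedged tabular rendition of the idea named in the summary above: keep separate V(s) and A(s, a) tables, bootstrap a shared TD target, and read off Q(s, a) = V(s) + A(s, a). This is one plausible sketch of "directly learns the advantage and value functions via bootstrapping"; the paper's exact update rule differs.

```python
import numpy as np

# Decomposed-Q sketch: a shared TD error updates both the value table
# and the advantage table, with an identifiability shift that keeps
# max_a A(s, a) = 0 while preserving Q(s, a) = V(s) + A(s, a).

def va_update(V, A, s, a, r, s_next, gamma=0.99, alpha=0.1):
    q_next = V[s_next] + A[s_next].max()        # greedy bootstrapped target
    td = r + gamma * q_next - (V[s] + A[s, a])
    V[s] += alpha * td                          # shared signal updates the value
    A[s, a] += alpha * td                       # and credits the taken action
    shift = A[s].max()                          # identifiability: max_a A(s, a) = 0
    V[s] += shift
    A[s] -= shift

n_states, n_actions = 5, 2
V, A = np.zeros(n_states), np.zeros((n_states, n_actions))
va_update(V, A, s=0, a=1, r=1.0, s_next=1)
print(V[0] + A[0])  # implied Q-values for state 0
```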
- Reinforcement Learning from Passive Data via Latent Intentions [86.4969514480008]
We show that passive data can still be used to learn features that accelerate downstream RL.
Our approach learns from passive data by modeling intentions.
Our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.
arXiv Detail & Related papers (2023-04-10T17:59:05Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs [50.75812033462294]
We bridge the gap between the practical success of Q-learning and pessimistic theoretical results.
We present novel methods Q-Rex and Q-RexDaRe.
We show that Q-Rex efficiently finds the optimal policy for linear MDPs.
arXiv Detail & Related papers (2021-10-16T01:47:41Z)
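The two ingredients in the title above are mechanical enough to sketch: a target table that is synced only periodically ("online target Q-learning") and replaying each trajectory in reverse order ("reverse experience replay"), which propagates reward information backward faster. Q-Rex itself is analyzed with linear function approximation; this tabular toy only illustrates the mechanics.

```python
import numpy as np

# Toy tabular sketch: Q-learning updates replayed in reverse trajectory
# order, bootstrapping against a periodically synced target table.

def reverse_replay_update(Q, Q_target, trajectory, gamma=0.99, alpha=0.5):
    for s, a, r, s_next in reversed(trajectory):  # reverse experience replay
        target = r + gamma * Q_target[s_next].max()
        Q[s, a] += alpha * (target - Q[s, a])

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
Q_target = Q.copy()
# One toy trajectory ending in a rewarding transition into state 4.
trajectory = [(0, 1, 0.0, 1), (1, 1, 0.0, 2), (2, 1, 0.0, 3), (3, 1, 1.0, 4)]
for step in range(20):
    reverse_replay_update(Q, Q_target, trajectory)
    if step % 5 == 0:
        Q_target = Q.copy()  # periodic target sync
print(Q[:, 1])  # values propagate back from the rewarding transition
```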
- Visual Transformer for Task-aware Active Learning [49.903358393660724]
We present a novel pipeline for pool-based Active Learning.
Our method exploits accessible unlabelled examples during training to estimate their correlation with the labelled examples.
Visual Transformer models non-local visual concept dependency between labelled and unlabelled examples.
arXiv Detail & Related papers (2021-06-07T17:13:59Z)
- Pre-trained Word Embeddings for Goal-conditional Transfer Learning in Reinforcement Learning [0.0]
We show how a pre-trained task-independent language model can make a goal-conditional RL agent more sample efficient.
We do this by facilitating transfer learning between different related tasks.
arXiv Detail & Related papers (2020-07-10T06:42:00Z)
- Transfer Reinforcement Learning under Unobserved Contextual Information [16.895704973433382]
We study a transfer reinforcement learning problem where the state transitions and rewards are affected by the environmental context.
We develop a method to obtain causal bounds on the transition and reward functions using the demonstrator's data.
We propose new Q-learning and UCB-Q-learning algorithms that converge to the true value function without bias.
arXiv Detail & Related papers (2020-03-09T22:00:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.