World Value Functions: Knowledge Representation for Learning and
Planning
- URL: http://arxiv.org/abs/2206.11940v1
- Date: Thu, 23 Jun 2022 18:49:54 GMT
- Title: World Value Functions: Knowledge Representation for Learning and
Planning
- Authors: Geraud Nangue Tasse, Benjamin Rosman, Steven James
- Abstract summary: We propose world value functions (WVFs), a type of goal-oriented general value function.
WVFs represent how to solve not just a given task, but any other goal-reaching task in an agent's environment.
We show that WVFs can be learned faster than regular value functions, while their ability to infer the environment's dynamics can be used to integrate learning and planning methods.
- Score: 14.731788603429774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose world value functions (WVFs), a type of goal-oriented general
value function that represents how to solve not just a given task, but any
other goal-reaching task in an agent's environment. This is achieved by
equipping an agent with an internal goal space defined as all the world states
where it experiences a terminal transition. The agent can then modify the
standard task rewards to define its own reward function, which provably drives
it to learn how to achieve all reachable internal goals, and the value of doing
so in the current task. We demonstrate two key benefits of WVFs in the context
of learning and planning. In particular, given a learned WVF, an agent can
compute the optimal policy in a new task by simply estimating the task's reward
function. Furthermore, we show that WVFs also implicitly encode the transition
dynamics of the environment, and so can be used to perform planning.
Experimental results show that WVFs can be learned faster than regular value
functions, while their ability to infer the environment's dynamics can be used
to integrate learning and planning methods to further improve sample
efficiency.
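To make the construction above concrete, here is a minimal tabular sketch of a WVF: a goal-conditioned Q-table trained with the task's rewards modified so that terminating anywhere other than the pursued internal goal incurs a large penalty, with the task policy then read off by maximising over goals. The gridworld, reward magnitudes, and hyperparameters are illustrative assumptions, not details from the paper's implementation.
```python
# Minimal tabular sketch of a world value function (WVF), following the
# construction in the abstract: internal goals are the states where terminal
# transitions are experienced, and the agent's own reward penalises ending
# anywhere other than the goal it is pursuing.  Environment and constants
# below are illustrative assumptions.
import random
from collections import defaultdict

SIZE = 5
TERMINALS = {(0, 0), (4, 4)}          # world states with terminal transitions
TASK_GOAL = (4, 4)                    # the current task rewards only this one
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
R_MIN = -10.0                         # penalty for terminating at the "wrong" goal
GAMMA, ALPHA, EPS = 0.95, 0.5, 0.2

def step(s, a):
    """Deterministic grid dynamics with the task's own reward."""
    ns = (min(max(s[0] + a[0], 0), SIZE - 1), min(max(s[1] + a[1], 0), SIZE - 1))
    done = ns in TERMINALS
    r = 1.0 if (done and ns == TASK_GOAL) else -0.1
    return ns, r, done

Q = defaultdict(float)                # Q[(s, g, a)] -> world value
goal_space = set()                    # internal goals discovered so far

def wvf_reward(task_r, next_s, g, done):
    # The agent's modified reward: the task reward, except that terminating
    # anywhere other than the pursued internal goal g is heavily penalised.
    return R_MIN if (done and next_s != g) else task_r

for episode in range(3000):
    s = (2, 2)
    # Pursue a random internal goal; fall back to the task goal until one is known.
    g = random.choice(sorted(goal_space)) if goal_space else TASK_GOAL
    for _ in range(50):
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, g, a_)])
        ns, task_r, done = step(s, a)
        if done:
            goal_space.add(ns)        # terminal transition => new internal goal
        r_bar = wvf_reward(task_r, ns, g, done)
        target = r_bar if done else r_bar + GAMMA * max(Q[(ns, g, a_)] for a_ in ACTIONS)
        Q[(s, g, a)] += ALPHA * (target - Q[(s, g, a)])
        s = ns
        if done:
            break

def task_policy(s):
    """Greedy task policy read off the WVF: maximise over goals and actions."""
    return max(ACTIONS, key=lambda a_: max(Q[(s, g, a_)] for g in goal_space))

print("discovered internal goals:", goal_space)
print("action at start state:", task_policy((2, 2)))
```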
Related papers
- A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards [29.923942622540356]
We introduce Iterative Keypoint Reward (IKER), a Python-based reward function that serves as a dynamic task specification.
We reconstruct real-world scenes in simulation and use the generated rewards to train reinforcement learning policies.
The results highlight IKER's effectiveness in enabling robots to perform multi-step tasks in dynamic environments.
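As a rough illustration of the kind of Python reward function IKER generates, the snippet below scores a hypothetical pick-and-place step from distances between named scene keypoints; the keypoint names, weights, and threshold are invented for illustration and are not outputs of the actual system.
```python
# Hypothetical example of a keypoint-based reward in the spirit of IKER:
# dense shaping from distances between named scene keypoints.  The keypoint
# names ("gripper", "mug_handle", "target_slot") and weights are illustrative
# assumptions, not generated by the real system.
import numpy as np

def reward(keypoints: dict[str, np.ndarray]) -> float:
    grasp_dist = np.linalg.norm(keypoints["gripper"] - keypoints["mug_handle"])
    place_dist = np.linalg.norm(keypoints["mug_handle"] - keypoints["target_slot"])
    # Encourage reaching the handle first, then moving it toward the slot.
    return -grasp_dist - 0.5 * place_dist + (5.0 if place_dist < 0.02 else 0.0)

example = {
    "gripper": np.array([0.40, 0.10, 0.30]),
    "mug_handle": np.array([0.55, 0.05, 0.02]),
    "target_slot": np.array([0.20, -0.15, 0.02]),
}
print(reward(example))
```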
arXiv Detail & Related papers (2025-02-12T18:57:22Z)
- VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks [100.3234156027118]
We present VLABench, an open-source benchmark for evaluating universal LCM task learning.
VLABench provides 100 carefully designed categories of tasks, with strong randomization within each category and a total of 2000+ objects.
The benchmark assesses multiple competencies, including understanding of mesh & texture, spatial relationships, semantic instructions, physical laws, knowledge transfer, and reasoning.
arXiv Detail & Related papers (2024-12-24T06:03:42Z)
- Vision Language Models are In-Context Value Learners [89.29486557646624]
We present Generative Value Learning (GVL), a universal value function estimator that leverages the world knowledge embedded in vision-language models (VLMs) to predict task progress.
Without any robot- or task-specific training, GVL predicts effective values in context, zero-shot and few-shot, for more than 300 distinct real-world tasks.
arXiv Detail & Related papers (2024-11-07T09:17:50Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Task Aware Dreamer for Task Generalization in Reinforcement Learning [31.364276322513447]
We show that training a general world model can exploit structure shared across tasks and help train more generalizable agents.
We introduce a novel method named Task Aware Dreamer (TAD), which integrates reward-informed features to identify latent characteristics across tasks.
Experiments in both image-based and state-based tasks show that TAD can significantly improve the performance of handling different tasks simultaneously.
arXiv Detail & Related papers (2023-03-09T08:04:16Z)
- Toward Efficient Automated Feature Engineering [27.47868891738917]
Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks.
Current AFE methods mainly focus on improving the effectiveness of the produced features, but ignore the low-efficiency issue that hinders large-scale deployment.
We construct the AFE pipeline in a reinforcement learning setting, where each feature is assigned an agent to perform feature transformation.
We conduct comprehensive experiments on 36 datasets covering both classification and regression tasks.
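As a loose illustration of the per-feature agents described above, the sketch below replaces the paper's RL agents with simple epsilon-greedy bandits that pick a unary transformation for each feature and are rewarded by a cheap proxy for downstream usefulness; the transformations, proxy score, and data are illustrative assumptions, not the paper's pipeline.
```python
# Much-simplified sketch of RL-style automated feature engineering: one small
# agent per feature chooses a transformation and is reinforced by how useful
# the transformed feature looks downstream.  Here the agent is an
# epsilon-greedy bandit and usefulness is absolute correlation with the
# target -- stand-ins for the paper's full RL setup.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                       # three raw features
y = np.exp(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

TRANSFORMS = {"identity": lambda c: c, "square": lambda c: c ** 2,
              "exp": lambda c: np.exp(np.clip(c, -5, 5)), "abs": np.abs}

def usefulness(col):
    return abs(np.corrcoef(col, y)[0, 1])           # proxy for a model score

value = {(f, t): 0.0 for f in range(X.shape[1]) for t in TRANSFORMS}
counts = {k: 0 for k in value}

for step in range(200):
    for f in range(X.shape[1]):                     # one bandit agent per feature
        if rng.random() < 0.2:
            t = str(rng.choice(list(TRANSFORMS)))
        else:
            t = max(TRANSFORMS, key=lambda t_: value[(f, t_)])
        reward = usefulness(TRANSFORMS[t](X[:, f]))
        counts[(f, t)] += 1
        value[(f, t)] += (reward - value[(f, t)]) / counts[(f, t)]

best = {f: max(TRANSFORMS, key=lambda t_: value[(f, t_)]) for f in range(X.shape[1])}
print("chosen transformation per feature:", best)
```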
arXiv Detail & Related papers (2022-12-26T13:18:51Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- World Value Functions: Knowledge Representation for Multitask Reinforcement Learning [14.731788603429774]
We propose world value functions (WVFs), which are a type of general value function with mastery of the world.
We equip the agent with an internal goal space defined as all the world states where it experiences a terminal transition.
We show that for tasks in the same world, a pretrained agent that has learned any WVF can then infer the policy and value function for any new task directly from its rewards.
arXiv Detail & Related papers (2022-05-18T09:45:14Z)
- Universal Successor Features for Transfer Reinforcement Learning [77.27304854836645]
We propose Universal Successor Features (USFs) to capture the underlying dynamics of the environment.
We show that USFs are compatible with any RL algorithm that learns state values using a temporal difference method.
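A minimal tabular sketch of the successor-feature identity that USFs build on, Q(s, a; g) = psi(s, g, a) · w_g, learned with a temporal-difference update in a toy chain world; the environment, one-hot features, and hyperparameters are illustrative assumptions, and this shows only a tabular simplification of the paper's function-approximation setting.
```python
# Tabular sketch of goal-conditioned successor features: learn psi(s, g, a)
# once, then read off Q(s, a; g) = psi(s, g, a) . w_g for any goal whose
# reward is linear in the state features.  Chain world and constants are
# illustrative assumptions.
import numpy as np

N, ACTIONS = 7, (-1, +1)              # chain of 7 states, move left / right
GAMMA, ALPHA, EPS = 0.9, 0.3, 0.2
rng = np.random.default_rng(0)

def phi(s):
    # One-hot state features, so the reward weights for goal g are phi(g).
    v = np.zeros(N)
    v[s] = 1.0
    return v

psi = np.zeros((N, N, len(ACTIONS), N))   # psi[s, g, a] is an N-vector

def q(s, g, a):
    return psi[s, g, a] @ phi(g)          # w_g = phi(g): reward 1 only at the goal

for episode in range(4000):
    g = int(rng.integers(N))               # train toward a randomly chosen goal
    s = int(rng.integers(N))
    for _ in range(30):
        if rng.random() < EPS:
            a = int(rng.integers(2))
        else:
            a = int(np.argmax([q(s, g, i) for i in range(2)]))
        ns = int(np.clip(s + ACTIONS[a], 0, N - 1))
        done = ns == g
        a_next = int(np.argmax([q(ns, g, i) for i in range(2)]))
        # TD update on the features themselves, not on scalar rewards.
        target = phi(ns) + (0.0 if done else GAMMA) * psi[ns, g, a_next]
        psi[s, g, a] += ALPHA * (target - psi[s, g, a])
        s = ns
        if done:
            break

# The same psi now yields values for every goal, e.g. from state 1 toward goal 5.
print([round(q(1, 5, a), 3) for a in range(2)])
```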
arXiv Detail & Related papers (2020-01-05T03:41:06Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)