World Value Functions: Knowledge Representation for Learning and
Planning
- URL: http://arxiv.org/abs/2206.11940v1
- Date: Thu, 23 Jun 2022 18:49:54 GMT
- Title: World Value Functions: Knowledge Representation for Learning and
Planning
- Authors: Geraud Nangue Tasse, Benjamin Rosman, Steven James
- Abstract summary: We propose world value functions (WVFs), a type of goal-oriented general value function.
WVFs represent how to solve not just a given task, but any other goal-reaching task in an agent's environment.
We show that WVFs can be learned faster than regular value functions, while their ability to infer the environment's dynamics can be used to integrate learning and planning methods.
- Score: 14.731788603429774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose world value functions (WVFs), a type of goal-oriented general
value function that represents how to solve not just a given task, but any
other goal-reaching task in an agent's environment. This is achieved by
equipping an agent with an internal goal space defined as all the world states
where it experiences a terminal transition. The agent can then modify the
standard task rewards to define its own reward function, which provably drives
it to learn how to achieve all reachable internal goals, and the value of doing
so in the current task. We demonstrate two key benefits of WVFs in the context
of learning and planning. In particular, given a learned WVF, an agent can
compute the optimal policy in a new task by simply estimating the task's reward
function. Furthermore, we show that WVFs also implicitly encode the transition
dynamics of the environment, and so can be used to perform planning.
Experimental results show that WVFs can be learned faster than regular value
functions, while their ability to infer the environment's dynamics can be used
to integrate learning and planning methods to further improve sample
efficiency.
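To make the construction above concrete, here is a minimal tabular sketch of a WVF: a goal-conditioned Q-table trained with the task's rewards modified so that terminating anywhere other than the pursued internal goal incurs a large penalty, with the task policy then read off by maximising over goals. The gridworld, reward magnitudes, and hyperparameters are illustrative assumptions, not details from the paper's implementation.
```python
# Minimal tabular sketch of a world value function (WVF), following the
# construction in the abstract: internal goals are the states where terminal
# transitions are experienced, and the agent's own reward penalises ending
# anywhere other than the goal it is pursuing.  Environment and constants
# below are illustrative assumptions.
import random
from collections import defaultdict

SIZE = 5
TERMINALS = {(0, 0), (4, 4)}          # world states with terminal transitions
TASK_GOAL = (4, 4)                    # the current task rewards only this one
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]
R_MIN = -10.0                         # penalty for terminating at the "wrong" goal
GAMMA, ALPHA, EPS = 0.95, 0.5, 0.2

def step(s, a):
    """Deterministic grid dynamics with the task's own reward."""
    ns = (min(max(s[0] + a[0], 0), SIZE - 1), min(max(s[1] + a[1], 0), SIZE - 1))
    done = ns in TERMINALS
    r = 1.0 if (done and ns == TASK_GOAL) else -0.1
    return ns, r, done

Q = defaultdict(float)                # Q[(s, g, a)] -> world value
goal_space = set()                    # internal goals discovered so far

def wvf_reward(task_r, next_s, g, done):
    # The agent's modified reward: the task reward, except that terminating
    # anywhere other than the pursued internal goal g is heavily penalised.
    return R_MIN if (done and next_s != g) else task_r

for episode in range(3000):
    s = (2, 2)
    # Pursue a random internal goal; fall back to the task goal until one is known.
    g = random.choice(sorted(goal_space)) if goal_space else TASK_GOAL
    for _ in range(50):
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, g, a_)])
        ns, task_r, done = step(s, a)
        if done:
            goal_space.add(ns)        # terminal transition => new internal goal
        r_bar = wvf_reward(task_r, ns, g, done)
        target = r_bar if done else r_bar + GAMMA * max(Q[(ns, g, a_)] for a_ in ACTIONS)
        Q[(s, g, a)] += ALPHA * (target - Q[(s, g, a)])
        s = ns
        if done:
            break

def task_policy(s):
    """Greedy task policy read off the WVF: maximise over goals and actions."""
    return max(ACTIONS, key=lambda a_: max(Q[(s, g, a_)] for g in goal_space))

print("discovered internal goals:", goal_space)
print("action at start state:", task_policy((2, 2)))
```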
Related papers
- A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards [29.923942622540356]
We introduce Iterative Keypoint Reward (IKER), a Python-based reward function that serves as a dynamic task specification.
We reconstruct real-world scenes in simulation and use the generated rewards to train reinforcement learning policies.
The results highlight IKER's effectiveness in enabling robots to perform multi-step tasks in dynamic environments.
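As a rough illustration of the kind of Python reward function IKER generates, the snippet below scores a hypothetical pick-and-place step from distances between named scene keypoints; the keypoint names, weights, and threshold are invented for illustration and are not outputs of the actual system.
```python
# Hypothetical example of a keypoint-based reward in the spirit of IKER:
# dense shaping from distances between named scene keypoints.  The keypoint
# names ("gripper", "mug_handle", "target_slot") and weights are illustrative
# assumptions, not generated by the real system.
import numpy as np

def reward(keypoints: dict[str, np.ndarray]) -> float:
    grasp_dist = np.linalg.norm(keypoints["gripper"] - keypoints["mug_handle"])
    place_dist = np.linalg.norm(keypoints["mug_handle"] - keypoints["target_slot"])
    # Encourage reaching the handle first, then moving it toward the slot.
    return -grasp_dist - 0.5 * place_dist + (5.0 if place_dist < 0.02 else 0.0)

example = {
    "gripper": np.array([0.40, 0.10, 0.30]),
    "mug_handle": np.array([0.55, 0.05, 0.02]),
    "target_slot": np.array([0.20, -0.15, 0.02]),
}
print(reward(example))
```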
arXiv Detail & Related papers (2025-02-12T18:57:22Z)
- VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks [100.3234156027118]
We present VLABench, an open-source benchmark for evaluating universal LCM task learning.
VLABench provides 100 carefully designed categories of tasks, with strong randomization within each category and a total of 2000+ objects.
The benchmark assesses multiple competencies, including understanding of mesh & texture, spatial relationships, semantic instructions, physical laws, knowledge transfer, and reasoning.
arXiv Detail & Related papers (2024-12-24T06:03:42Z)
- Vision Language Models are In-Context Value Learners [89.29486557646624]
We present Generative Value Learning (GVL), a universal value function estimator that leverages the world knowledge embedded in vision-language models (VLMs) to predict task progress.
Without any robot- or task-specific training, GVL predicts effective values in context, zero-shot and few-shot, for more than 300 distinct real-world tasks.
arXiv Detail & Related papers (2024-11-07T09:17:50Z)
- Localizing Active Objects from Egocentric Vision with Symbolic World Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z)
- Task Aware Dreamer for Task Generalization in Reinforcement Learning [31.364276322513447]
We show that training a general world model can exploit structure shared across tasks and help train more generalizable agents.
We introduce a novel method named Task Aware Dreamer (TAD), which integrates reward-informed features to identify latent characteristics across tasks.
Experiments in both image-based and state-based tasks show that TAD can significantly improve the performance of handling different tasks simultaneously.
arXiv Detail & Related papers (2023-03-09T08:04:16Z)
- Toward Efficient Automated Feature Engineering [27.47868891738917]
Automated Feature Engineering (AFE) refers to automatically generating and selecting optimal feature sets for downstream tasks.
Current AFE methods mainly focus on improving the effectiveness of the produced features, but ignore the low-efficiency issue that hinders large-scale deployment.
We construct the AFE pipeline in a reinforcement learning setting, where each feature is assigned an agent to perform feature transformation.
We conduct comprehensive experiments on 36 datasets covering both classification and regression tasks.
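As a loose illustration of the per-feature agents described above, the sketch below replaces the paper's RL agents with simple epsilon-greedy bandits that pick a unary transformation for each feature and are rewarded by a cheap proxy for downstream usefulness; the transformations, proxy score, and data are illustrative assumptions, not the paper's pipeline.
```python
# Much-simplified sketch of RL-style automated feature engineering: one small
# agent per feature chooses a transformation and is reinforced by how useful
# the transformed feature looks downstream.  Here the agent is an
# epsilon-greedy bandit and usefulness is absolute correlation with the
# target -- stand-ins for the paper's full RL setup.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                       # three raw features
y = np.exp(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)

TRANSFORMS = {"identity": lambda c: c, "square": lambda c: c ** 2,
              "exp": lambda c: np.exp(np.clip(c, -5, 5)), "abs": np.abs}

def usefulness(col):
    return abs(np.corrcoef(col, y)[0, 1])           # proxy for a model score

value = {(f, t): 0.0 for f in range(X.shape[1]) for t in TRANSFORMS}
counts = {k: 0 for k in value}

for step in range(200):
    for f in range(X.shape[1]):                     # one bandit agent per feature
        if rng.random() < 0.2:
            t = str(rng.choice(list(TRANSFORMS)))
        else:
            t = max(TRANSFORMS, key=lambda t_: value[(f, t_)])
        reward = usefulness(TRANSFORMS[t](X[:, f]))
        counts[(f, t)] += 1
        value[(f, t)] += (reward - value[(f, t)]) / counts[(f, t)]

best = {f: max(TRANSFORMS, key=lambda t_: value[(f, t_)]) for f in range(X.shape[1])}
print("chosen transformation per feature:", best)
```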
arXiv Detail & Related papers (2022-12-26T13:18:51Z)
- Goal-Conditioned Q-Learning as Knowledge Distillation [136.79415677706612]
We explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation.
We empirically show that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional.
We also show that this technique can be adapted to allow for efficient learning in the case of multiple simultaneous sparse goals.
arXiv Detail & Related papers (2022-08-28T22:01:10Z)
- World Value Functions: Knowledge Representation for Multitask Reinforcement Learning [14.731788603429774]
We propose world value functions (WVFs), which are a type of general value function with mastery of the world.
We equip the agent with an internal goal space defined as all the world states where it experiences a terminal transition.
We show that for tasks in the same world, a pretrained agent that has learned any WVF can then infer the policy and value function for any new task directly from its rewards.
arXiv Detail & Related papers (2022-05-18T09:45:14Z)
- Universal Successor Features for Transfer Reinforcement Learning [77.27304854836645]
We propose Universal Successor Features (USFs) to capture the underlying dynamics of the environment.
We show that USFs are compatible with any RL algorithm that learns state values using a temporal difference method.
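A minimal tabular sketch of the successor-feature identity that USFs build on, Q(s, a; g) = psi(s, g, a) · w_g, learned with a temporal-difference update in a toy chain world; the environment, one-hot features, and hyperparameters are illustrative assumptions, and this shows only a tabular simplification of the paper's function-approximation setting.
```python
# Tabular sketch of goal-conditioned successor features: learn psi(s, g, a)
# once, then read off Q(s, a; g) = psi(s, g, a) . w_g for any goal whose
# reward is linear in the state features.  Chain world and constants are
# illustrative assumptions.
import numpy as np

N, ACTIONS = 7, (-1, +1)              # chain of 7 states, move left / right
GAMMA, ALPHA, EPS = 0.9, 0.3, 0.2
rng = np.random.default_rng(0)

def phi(s):
    # One-hot state features, so the reward weights for goal g are phi(g).
    v = np.zeros(N)
    v[s] = 1.0
    return v

psi = np.zeros((N, N, len(ACTIONS), N))   # psi[s, g, a] is an N-vector

def q(s, g, a):
    return psi[s, g, a] @ phi(g)          # w_g = phi(g): reward 1 only at the goal

for episode in range(4000):
    g = int(rng.integers(N))               # train toward a randomly chosen goal
    s = int(rng.integers(N))
    for _ in range(30):
        if rng.random() < EPS:
            a = int(rng.integers(2))
        else:
            a = int(np.argmax([q(s, g, i) for i in range(2)]))
        ns = int(np.clip(s + ACTIONS[a], 0, N - 1))
        done = ns == g
        a_next = int(np.argmax([q(ns, g, i) for i in range(2)]))
        # TD update on the features themselves, not on scalar rewards.
        target = phi(ns) + (0.0 if done else GAMMA) * psi[ns, g, a_next]
        psi[s, g, a] += ALPHA * (target - psi[s, g, a])
        s = ns
        if done:
            break

# The same psi now yields values for every goal, e.g. from state 1 toward goal 5.
print([round(q(1, 5, a), 3) for a in range(2)])
```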
arXiv Detail & Related papers (2020-01-05T03:41:06Z)
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies [57.27944046925876]
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.
arXiv Detail & Related papers (2020-01-01T17:34:00Z)