Hyper-Universal Policy Approximation: Learning to Generate Actions from a Single Image using Hypernets
- URL: http://arxiv.org/abs/2207.03593v1
- Date: Thu, 7 Jul 2022 21:42:54 GMT
- Title: Hyper-Universal Policy Approximation: Learning to Generate Actions from a Single Image using Hypernets
- Authors: Dimitrios C. Gklezakos, Rishi Jha, Rajesh P. N. Rao
- Abstract summary: Universal Policy Functions (UPFs) are state-to-action mappings that generalize to novel, unseen environments.
We propose the Hyper-Universal Policy Approximator (HUPA), a hypernetwork-based model to generate small task- and environment-conditional policy networks from a single image.
Our results show that HUPAs significantly outperform an embedding-based alternative when the generated policies are size-constrained.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by Gibson's notion of object affordances in human vision, we ask the
question: how can an agent learn to predict an entire action policy for a novel
object or environment given only a single glimpse? To tackle this problem, we
introduce the concept of Universal Policy Functions (UPFs) which are
state-to-action mappings that generalize not only to new goals but most
importantly to novel, unseen environments. Specifically, we consider the
problem of efficiently learning such policies for agents with limited
computational and communication capacity, constraints that are frequently
encountered in edge devices. We propose the Hyper-Universal Policy Approximator
(HUPA), a hypernetwork-based model to generate small task- and
environment-conditional policy networks from a single image, with good
generalization properties. Our results show that HUPAs significantly outperform
an embedding-based alternative when the generated policies are
size-constrained. Although this work is restricted to a simple map-based
navigation task, future work includes applying the principles behind HUPAs to
learning more general affordances for objects and environments.
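The core idea behind HUPA, a hypernetwork that maps a single image to the full parameter vector of a small policy network, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the single-linear-layer hypernetwork, and the two-layer policy net are all placeholder assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper)
IMG_DIM = 64      # flattened image embedding of the environment
STATE_DIM = 4     # agent state (e.g. position on the map)
HIDDEN = 8        # hidden width of the generated policy net
N_ACTIONS = 4     # e.g. N/S/E/W navigation actions

# Parameter count of the small policy network:
# W1 (STATE_DIM x HIDDEN) + b1 + W2 (HIDDEN x N_ACTIONS) + b2
N_POLICY_PARAMS = STATE_DIM * HIDDEN + HIDDEN + HIDDEN * N_ACTIONS + N_ACTIONS

# Hypernetwork: here a single linear map from the image embedding to the
# policy-net parameter vector; in practice this would be a trained deep net.
W_hyper = rng.normal(0, 0.1, size=(IMG_DIM, N_POLICY_PARAMS))

def generate_policy(image_embedding):
    """Generate a complete state-to-action policy from one image embedding."""
    theta = image_embedding @ W_hyper
    i = 0
    W1 = theta[i:i + STATE_DIM * HIDDEN].reshape(STATE_DIM, HIDDEN)
    i += STATE_DIM * HIDDEN
    b1 = theta[i:i + HIDDEN]
    i += HIDDEN
    W2 = theta[i:i + HIDDEN * N_ACTIONS].reshape(HIDDEN, N_ACTIONS)
    i += HIDDEN * N_ACTIONS
    b2 = theta[i:i + N_ACTIONS]

    def policy(state):
        h = np.tanh(state @ W1 + b1)
        logits = h @ W2 + b2
        return int(np.argmax(logits))  # greedy action index
    return policy

# One glimpse of a novel environment yields an entire policy network,
# which can then be deployed on its own (e.g. on an edge device).
policy = generate_policy(rng.normal(size=IMG_DIM))
action = policy(rng.normal(size=STATE_DIM))  # an index in [0, N_ACTIONS)
```

Note that only the small generated network needs to run on the constrained device; the hypernetwork itself can live elsewhere, which is why the size of the generated policy is the binding constraint in the comparison above.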
Related papers
- Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion [41.52811286996212]
We present Make-An-Agent, a novel policy parameter generator for behavior-to-policy generation.
Our generation model demonstrates remarkable versatility and scalability on multiple tasks.
We showcase its efficacy and efficiency on various domains and tasks, including varying objectives, behaviors, and even across different robot manipulators.
arXiv Detail & Related papers (2024-07-15T17:59:57Z)
- Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach [1.7205106391379026]
Foundation models are a promising path toward general-purpose and user-friendly robots.
However, the lack of modularity between tasks means that when model weights are updated, the behavior in other, unrelated tasks may be affected.
We present an alternative approach to the design of robot foundation models, which generates stand-alone, task-specific policies.
arXiv Detail & Related papers (2024-07-10T21:55:44Z)
- Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies [25.760946763103483]
We propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks.
Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states which are then translated to actions using rigid action estimation.
arXiv Detail & Related papers (2024-06-17T17:00:41Z)
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning [49.65958529941962]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z)
- Residual Q-Learning: Offline and Online Policy Customization without Value [53.47311900133564]
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations.
We formulate a new problem setting called policy customization.
We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy.
arXiv Detail & Related papers (2023-06-15T22:01:19Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Stronger Generalization Guarantees for Robot Learning by Combining Generative Models and Real-World Data [5.935761705025763]
We present a framework for establishing generalization guarantees by leveraging a finite dataset of real-world environments.
We demonstrate our approach on two simulated systems with nonlinear/hybrid dynamics and rich sensing modalities.
arXiv Detail & Related papers (2021-11-16T20:13:10Z)
- DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies [116.12670064963625]
We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn contextual policies.
We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
arXiv Detail & Related papers (2021-04-23T16:51:58Z)
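The DisCo RL summary above describes policies conditioned on an entire goal *distribution* rather than a single goal point. A minimal sketch of that conditioning idea, with purely illustrative dimensions and random placeholder weights standing in for a trained policy:

```python
import numpy as np

rng = np.random.default_rng(1)

STATE_DIM = 3
GOAL_DIM = 3
ACTION_DIM = 2

def encode_distribution(goal_samples):
    """Summarize a goal distribution by its mean and per-dimension variance;
    these parameters form the conditioning context for the policy."""
    mu = goal_samples.mean(axis=0)
    var = goal_samples.var(axis=0)
    return np.concatenate([mu, var])

def contextual_policy(state, context, W):
    """A distribution-conditioned policy: acts on the state together with
    the goal-distribution parameters, not a single goal point."""
    x = np.concatenate([state, context])
    return np.tanh(W @ x)  # continuous action in [-1, 1]^ACTION_DIM

# Build the context from samples of a new goal distribution, then act.
goal_samples = rng.normal(loc=1.0, scale=0.5, size=(128, GOAL_DIM))
context = encode_distribution(goal_samples)
W = rng.normal(0, 0.1, size=(ACTION_DIM, STATE_DIM + 2 * GOAL_DIM))
action = contextual_policy(rng.normal(size=STATE_DIM), context, W)
```

Swapping in a different goal distribution changes only the context vector, which is what lets a single policy generalize across goal distributions at test time.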
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.