Hyper-Universal Policy Approximation: Learning to Generate Actions from a Single Image using Hypernets
- URL: http://arxiv.org/abs/2207.03593v1
- Date: Thu, 7 Jul 2022 21:42:54 GMT
- Title: Hyper-Universal Policy Approximation: Learning to Generate Actions from a Single Image using Hypernets
- Authors: Dimitrios C. Gklezakos, Rishi Jha, Rajesh P. N. Rao
- Abstract summary: Universal Policy Functions (UPFs) are state-to-action mappings that generalize to novel, unseen environments.
We propose the Hyper-Universal Policy Approximator (HUPA), a hypernetwork-based model to generate small task- and environment-conditional policy networks from a single image.
Our results show that HUPAs significantly outperform an embedding-based alternative when the generated policies are size-constrained.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inspired by Gibson's notion of object affordances in human vision, we ask the
question: how can an agent learn to predict an entire action policy for a novel
object or environment given only a single glimpse? To tackle this problem, we
introduce the concept of Universal Policy Functions (UPFs) which are
state-to-action mappings that generalize not only to new goals but most
importantly to novel, unseen environments. Specifically, we consider the
problem of efficiently learning such policies for agents with limited
computational and communication capacity, constraints that are frequently
encountered in edge devices. We propose the Hyper-Universal Policy Approximator
(HUPA), a hypernetwork-based model to generate small task- and
environment-conditional policy networks from a single image, with good
generalization properties. Our results show that HUPAs significantly outperform
an embedding-based alternative when the generated policies are
size-constrained. Although this work is restricted to a simple map-based
navigation task, future work includes applying the principles behind HUPAs to
learning more general affordances for objects and environments.
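The core idea behind HUPA, a hypernetwork that maps a single image to the full parameter vector of a small policy network, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the single-linear-layer hypernetwork, and the two-layer policy net are all placeholder assumptions, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper)
IMG_DIM = 64      # flattened image embedding of the environment
STATE_DIM = 4     # agent state (e.g. position on the map)
HIDDEN = 8        # hidden width of the generated policy net
N_ACTIONS = 4     # e.g. N/S/E/W navigation actions

# Parameter count of the small policy network:
# W1 (STATE_DIM x HIDDEN) + b1 + W2 (HIDDEN x N_ACTIONS) + b2
N_POLICY_PARAMS = STATE_DIM * HIDDEN + HIDDEN + HIDDEN * N_ACTIONS + N_ACTIONS

# Hypernetwork: here a single linear map from the image embedding to the
# policy-net parameter vector; in practice this would be a trained deep net.
W_hyper = rng.normal(0, 0.1, size=(IMG_DIM, N_POLICY_PARAMS))

def generate_policy(image_embedding):
    """Generate a complete state-to-action policy from one image embedding."""
    theta = image_embedding @ W_hyper
    i = 0
    W1 = theta[i:i + STATE_DIM * HIDDEN].reshape(STATE_DIM, HIDDEN)
    i += STATE_DIM * HIDDEN
    b1 = theta[i:i + HIDDEN]
    i += HIDDEN
    W2 = theta[i:i + HIDDEN * N_ACTIONS].reshape(HIDDEN, N_ACTIONS)
    i += HIDDEN * N_ACTIONS
    b2 = theta[i:i + N_ACTIONS]

    def policy(state):
        h = np.tanh(state @ W1 + b1)
        logits = h @ W2 + b2
        return int(np.argmax(logits))  # greedy action index
    return policy

# One glimpse of a novel environment yields an entire policy network,
# which can then be deployed on its own (e.g. on an edge device).
policy = generate_policy(rng.normal(size=IMG_DIM))
action = policy(rng.normal(size=STATE_DIM))  # an index in [0, N_ACTIONS)
```

Note that only the small generated network needs to run on the constrained device; the hypernetwork itself can live elsewhere, which is why the size of the generated policy is the binding constraint in the comparison above.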
Related papers
- Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion [41.52811286996212]
We present Make-An-Agent, a novel policy parameter generator for behavior-to-policy generation.
Our generation model demonstrates remarkable versatility and scalability on multiple tasks.
We showcase its efficacy and efficiency on various domains and tasks, including varying objectives, behaviors, and even across different robot manipulators.
arXiv Detail & Related papers (2024-07-15T17:59:57Z)
- Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach [1.7205106391379026]
Foundation models are a promising path toward general-purpose and user-friendly robots.
However, the lack of modularity between tasks means that when model weights are updated, the behavior in other, unrelated tasks may be affected.
We present an alternative approach to the design of robot foundation models, which generates stand-alone, task-specific policies.
arXiv Detail & Related papers (2024-07-10T21:55:44Z)
- Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies [25.760946763103483]
We propose Imagination Policy, a novel multi-task key-frame policy network for solving high-precision pick and place tasks.
Instead of learning actions directly, Imagination Policy generates point clouds to imagine desired states which are then translated to actions using rigid action estimation.
arXiv Detail & Related papers (2024-06-17T17:00:41Z)
- Foundation Policies with Hilbert Representations [54.44869979017766]
We propose an unsupervised framework to pre-train generalist policies from unlabeled offline data.
Our key insight is to learn a structured representation that preserves the temporal structure of the underlying environment.
Our experiments show that our unsupervised policies can solve goal-conditioned and general RL tasks in a zero-shot fashion.
arXiv Detail & Related papers (2024-02-23T19:09:10Z)
- Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning [49.65958529941962]
Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories.
In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks.
We learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner.
arXiv Detail & Related papers (2023-11-01T00:15:18Z)
- Residual Q-Learning: Offline and Online Policy Customization without Value [53.47311900133564]
Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations.
We formulate a new problem setting called policy customization.
We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy.
arXiv Detail & Related papers (2023-06-15T22:01:19Z)
- Planning to Practice: Efficient Online Fine-Tuning by Composing Goals in Latent Space [76.46113138484947]
General-purpose robots require diverse repertoires of behaviors to complete challenging tasks in real-world unstructured environments.
To address this issue, goal-conditioned reinforcement learning aims to acquire policies that can reach goals for a wide range of tasks on command.
We propose Planning to Practice, a method that makes it practical to train goal-conditioned policies for long-horizon tasks.
arXiv Detail & Related papers (2022-05-17T06:58:17Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Stronger Generalization Guarantees for Robot Learning by Combining Generative Models and Real-World Data [5.935761705025763]
We present a framework for establishing generalization guarantees by leveraging a finite dataset of real-world environments.
We demonstrate our approach on two simulated systems with nonlinear/hybrid dynamics and rich sensing modalities.
arXiv Detail & Related papers (2021-11-16T20:13:10Z)
- DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies [116.12670064963625]
We develop an off-policy algorithm called distribution-conditioned reinforcement learning (DisCo RL) to efficiently learn contextual policies.
We evaluate DisCo RL on a variety of robot manipulation tasks and find that it significantly outperforms prior methods on tasks that require generalization to new goal distributions.
arXiv Detail & Related papers (2021-04-23T16:51:58Z)
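The DisCo RL summary above describes policies conditioned on an entire goal *distribution* rather than a single goal point. A minimal sketch of that conditioning idea, with purely illustrative dimensions and random placeholder weights standing in for a trained policy:

```python
import numpy as np

rng = np.random.default_rng(1)

STATE_DIM = 3
GOAL_DIM = 3
ACTION_DIM = 2

def encode_distribution(goal_samples):
    """Summarize a goal distribution by its mean and per-dimension variance;
    these parameters form the conditioning context for the policy."""
    mu = goal_samples.mean(axis=0)
    var = goal_samples.var(axis=0)
    return np.concatenate([mu, var])

def contextual_policy(state, context, W):
    """A distribution-conditioned policy: acts on the state together with
    the goal-distribution parameters, not a single goal point."""
    x = np.concatenate([state, context])
    return np.tanh(W @ x)  # continuous action in [-1, 1]^ACTION_DIM

# Build the context from samples of a new goal distribution, then act.
goal_samples = rng.normal(loc=1.0, scale=0.5, size=(128, GOAL_DIM))
context = encode_distribution(goal_samples)
W = rng.normal(0, 0.1, size=(ACTION_DIM, STATE_DIM + 2 * GOAL_DIM))
action = contextual_policy(rng.normal(size=STATE_DIM), context, W)
```

Swapping in a different goal distribution changes only the context vector, which is what lets a single policy generalize across goal distributions at test time.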
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.