Related papers: Learning in ImaginationLand: Omnidirectional Policies through 3D Generative Models (OP-Gen)

Learning in ImaginationLand: Omnidirectional Policies through 3D Generative Models (OP-Gen)

URL: http://arxiv.org/abs/2509.06191v1
Date: Sun, 07 Sep 2025 20:00:59 GMT
Title: Learning in ImaginationLand: Omnidirectional Policies through 3D Generative Models (OP-Gen)
Authors: Yifei Ren, Edward Johns,
Abstract summary: We show that 3D generative models can be used to augment a dataset from a single real-world demonstration.<n>We found that this enables a robot to perform a task when initialised from states very far from those observed during the demonstration.
Score: 12.546786671203646
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent 3D generative models, which are capable of generating full object shapes from just a few images, now open up new opportunities in robotics. In this work, we show that 3D generative models can be used to augment a dataset from a single real-world demonstration, after which an omnidirectional policy can be learned within this imagined dataset. We found that this enables a robot to perform a task when initialised from states very far from those observed during the demonstration, including starting from the opposite side of the object relative to the real-world demonstration, significantly reducing the number of demonstrations required for policy learning. Through several real-world experiments across tasks such as grasping objects, opening a drawer, and placing trash into a bin, we study these omnidirectional policies by investigating the effect of various design choices on policy behaviour, and we show superior performance to recent baselines which use alternative methods for data augmentation.

Related papers

DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation [52.136378691610524]
We present DynaRend, a representation learning framework that learns 3D-aware and dynamics-informed triplane features.<n>By pretraining on multi-view RGB-D video data, DynaRend jointly captures spatial geometry, future dynamics, and task semantics in a unified triplane representation.<n>We evaluate DynaRend on two challenging benchmarks, RLBench and Colosseum, demonstrating substantial improvements in policy success rate, generalization to environmental perturbations, and real-world applicability across diverse manipulation tasks.
arXiv Detail & Related papers (2025-10-28T10:17:11Z)
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation [53.51622803589185]
We propose a novel branch of world model named Gaussian World Model (GWM) for robotic manipulation.<n>At its core is a latent Diffusion Transformer (DiT) combined with a 3D variational autoencoder, enabling fine-grained scene-level future state reconstruction.<n>Both simulated and real-world experiments depict that GWM can precisely predict future scenes conditioned on diverse robot actions.
arXiv Detail & Related papers (2025-08-25T02:01:09Z)
Learning 3D Persistent Embodied World Models [84.40585374179037]
We introduce a new persistent embodied world model with an explicit memory of previously generated content.<n>During generation time, our video diffusion model predicts RGB-D video of the future observations of the agent.<n>This generation is then aggregated into a persistent 3D map of the environment.
arXiv Detail & Related papers (2025-05-05T17:59:17Z)
ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation [22.43711565969091]
Articubot is a system that learns a policy to open diverse categories of unseen articulated objects in the real world.<n>We show that our learned policy can zero-shot transfer to three different real robot settings.
arXiv Detail & Related papers (2025-03-04T22:51:50Z)
View-Invariant Policy Learning via Zero-Shot Novel View Synthesis [26.231630397802785]
We investigate how knowledge from large-scale visual data of the world may be used to address one axis of variation for generalizable manipulation: observational viewpoint.<n>We study single-image novel view synthesis models, which learn 3D-aware scene-level priors by rendering images of the same scene from alternate camera viewpoints.<n>For practical application to diverse robotic data, these models must operate zero-shot, performing view synthesis on unseen tasks and environments.
arXiv Detail & Related papers (2024-09-05T16:39:21Z)
Learning Generalizable Manipulation Policies with Object-Centric 3D Representations [65.55352131167213]
GROOT is an imitation learning method for learning robust policies with object-centric and 3D priors. It builds policies that generalize beyond their initial training conditions for vision-based manipulation. GROOT's performance excels in generalization over background changes, camera viewpoint shifts, and the presence of new object instances.
arXiv Detail & Related papers (2023-10-22T18:51:45Z)
Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.<n>Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.<n>Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
Uncertainty Guided Policy for Active Robotic 3D Reconstruction using Neural Radiance Fields [82.21033337949757]
This paper introduces a ray-based volumetric uncertainty estimator, which computes the entropy of the weight distribution of the color samples along each ray of the object's implicit neural representation. We show that it is possible to infer the uncertainty of the underlying 3D geometry given a novel view with the proposed estimator. We present a next-best-view selection policy guided by the ray-based volumetric uncertainty in neural radiance fields-based representations.
arXiv Detail & Related papers (2022-09-17T21:28:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.