Mobi-$π$: Mobilizing Your Robot Learning Policy
- URL: http://arxiv.org/abs/2505.23692v2
- Date: Fri, 26 Sep 2025 16:17:30 GMT
- Title: Mobi-$π$: Mobilizing Your Robot Learning Policy
- Authors: Jingyun Yang, Isabella Huang, Brandon Vu, Max Bajracharya, Rika Antonova, Jeannette Bohg,
- Abstract summary: We propose a novel approach for policy mobilization that bridges navigation and manipulation by optimizing the robot's base pose to align with an in-distribution base pose for a learned policy.<n>Our approach utilizes 3D Gaussian Splatting for novel view synthesis, a score function to evaluate pose suitability, and sampling-based optimization to identify optimal robot poses.
- Score: 13.718887275978092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned visuomotor policies are capable of performing increasingly complex manipulation tasks. However, most of these policies are trained on data collected from limited robot positions and camera viewpoints. This leads to poor generalization to novel robot positions, which limits the use of these policies on mobile platforms, especially for precise tasks like pressing buttons or turning faucets. In this work, we formulate the policy mobilization problem: find a mobile robot base pose in a novel environment that is in distribution with respect to a manipulation policy trained on a limited set of camera viewpoints. Compared to retraining the policy itself to be more robust to unseen robot base pose initializations, policy mobilization decouples navigation from manipulation and thus does not require additional demonstrations. Crucially, this problem formulation complements existing efforts to improve manipulation policy robustness to novel viewpoints and remains compatible with them. We propose a novel approach for policy mobilization that bridges navigation and manipulation by optimizing the robot's base pose to align with an in-distribution base pose for a learned policy. Our approach utilizes 3D Gaussian Splatting for novel view synthesis, a score function to evaluate pose suitability, and sampling-based optimization to identify optimal robot poses. To understand policy mobilization in more depth, we also introduce the Mobi-$\pi$ framework, which includes: (1) metrics that quantify the difficulty of mobilizing a given policy, (2) a suite of simulated mobile manipulation tasks based on RoboCasa to evaluate policy mobilization, and (3) visualization tools for analysis. In both our developed simulation task suite and the real world, we show that our approach outperforms baselines, demonstrating its effectiveness for policy mobilization.
Related papers
- Flow Policy Gradients for Robot Control [67.61978635211048]
Flow matching policy gradients can be made effective for training and fine-tuning more expressive policies.<n>We show how policies can exploit the flow representation for exploration when training from scratch, as well as improved fine-tuning robustness over baselines.
arXiv Detail & Related papers (2026-02-02T18:56:49Z) - FLEX: A Framework for Learning Robot-Agnostic Force-based Skills Involving Sustained Contact Object Manipulation [9.292150395779332]
We propose a novel framework for learning object-centric manipulation policies in force space.<n>Our method simplifies the action space, reduces unnecessary exploration, and decreases simulation overhead.<n>Our evaluations demonstrate that the method significantly outperforms baselines.
arXiv Detail & Related papers (2025-03-17T17:49:47Z) - Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids [56.892520712892804]
We introduce a practical sim-to-real RL recipe that trains a humanoid robot to perform three dexterous manipulation tasks.<n>We demonstrate high success rates on unseen objects and robust, adaptive policy behaviors.
arXiv Detail & Related papers (2025-02-27T18:59:52Z) - P3-PO: Prescriptive Point Priors for Visuo-Spatial Generalization of Robot Policies [19.12762500264209]
Prescriptive Point Priors for Policies or P3-PO is a novel framework that constructs a unique state representation of the environment.<n>P3-PO exhibits 58% and 80% gains across tasks for new object instances and more cluttered environments respectively.
arXiv Detail & Related papers (2024-12-09T18:59:42Z) - GRAPPA: Generalizing and Adapting Robot Policies via Online Agentic Guidance [15.774237279917594]
We propose an agentic framework for robot self-guidance and self-improvement.<n>Our framework iteratively grounds a base robot policy to relevant objects in the environment.<n>We demonstrate that our approach can effectively guide manipulation policies to achieve significantly higher success rates.
arXiv Detail & Related papers (2024-10-09T02:00:37Z) - Learning Vision-based Pursuit-Evasion Robot Policies [54.52536214251999]
We develop a fully-observable robot policy that generates supervision for a partially-observable one.
We deploy our policy on a physical quadruped robot with an RGB-D camera on pursuit-evasion interactions in the wild.
arXiv Detail & Related papers (2023-08-30T17:59:05Z) - Polybot: Training One Policy Across Robots While Embracing Variability [70.74462430582163]
We propose a set of key design decisions to train a single policy for deployment on multiple robotic platforms.
Our framework first aligns the observation and action spaces of our policy across embodiments via utilizing wrist cameras.
We evaluate our method on a dataset collected over 60 hours spanning 6 tasks and 3 robots with varying joint configurations and sizes.
arXiv Detail & Related papers (2023-07-07T17:21:16Z) - Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models.<n>Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.<n>Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z) - DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to
Reality [64.51295032956118]
We train a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand.
Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups.
arXiv Detail & Related papers (2022-10-25T01:51:36Z) - Affordance Learning from Play for Sample-Efficient Policy Learning [30.701546777177555]
We use a self-supervised visual affordance model from human teleoperated play data to enable efficient policy learning and motion planning.
We combine model-based planning with model-free deep reinforcement learning to learn policies that favor the same object regions favored by people.
We find that our policies train 4x faster than the baselines and generalize better to novel objects because our visual affordance model can anticipate their affordance regions.
arXiv Detail & Related papers (2022-03-01T11:00:35Z) - Distilling Motion Planner Augmented Policies into Visual Control
Policies for Robot Manipulation [26.47544415550067]
We propose to distill a state-based motion planner augmented policy to a visual control policy.
We evaluate our method on three manipulation tasks in obstructed environments.
Our framework is highly sample-efficient and outperforms the state-of-the-art algorithms.
arXiv Detail & Related papers (2021-11-11T18:52:00Z) - Bayesian Meta-Learning for Few-Shot Policy Adaptation Across Robotic
Platforms [60.59764170868101]
Reinforcement learning methods can achieve significant performance but require a large amount of training data collected on the same robotic platform.
We formulate it as a few-shot meta-learning problem where the goal is to find a model that captures the common structure shared across different robotic platforms.
We experimentally evaluate our framework on a simulated reaching and a real-robot picking task using 400 simulated robots.
arXiv Detail & Related papers (2021-03-05T14:16:20Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and
Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.