SPIN: Simultaneous Perception, Interaction and Navigation
- URL: http://arxiv.org/abs/2405.07991v1
- Date: Mon, 13 May 2024 17:59:36 GMT
- Title: SPIN: Simultaneous Perception, Interaction and Navigation
- Authors: Shagun Uppal, Ananye Agarwal, Haoyu Xiong, Kenneth Shaw, Deepak Pathak
- Abstract summary: We present a reactive mobile manipulation framework that uses an active visual system to consciously perceive and react to its environment.
Similar to how humans leverage whole-body and hand-eye coordination, we develop a mobile manipulator that exploits its ability to move and see.
- Score: 33.408010508592824
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While there has been remarkable progress recently in the fields of manipulation and locomotion, mobile manipulation remains a long-standing challenge. Compared to locomotion or static manipulation, a mobile system must make a diverse range of long-horizon tasks feasible in unstructured and dynamic environments. While the applications are broad and interesting, there are a plethora of challenges in developing these systems such as coordination between the base and arm, reliance on onboard perception for perceiving and interacting with the environment, and most importantly, simultaneously integrating all these parts together. Prior works approach the problem using disentangled modular skills for mobility and manipulation that are trivially tied together. This causes several limitations such as compounding errors, delays in decision-making, and no whole-body coordination. In this work, we present a reactive mobile manipulation framework that uses an active visual system to consciously perceive and react to its environment. Similar to how humans leverage whole-body and hand-eye coordination, we develop a mobile manipulator that exploits its ability to move and see, more specifically -- to move in order to see and to see in order to move. This allows it to not only move around and interact with its environment but also, choose "when" to perceive "what" using an active visual system. We observe that such an agent learns to navigate around complex cluttered scenarios while displaying agile whole-body coordination using only ego-vision without needing to create environment maps. Results visualizations and videos at https://spin-robot.github.io/
Related papers
- Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes [83.55301458112672]
Sitcom-Crafter is a system for human motion generation in 3D space.
Central to the function generation modules is our novel 3D scene-aware human-human interaction module.
Augmentation modules encompass plot comprehension for command generation and motion synchronization for seamless integration of different motion types.
(arXiv, 2024-10-14T17:56:19Z)
- HYPERmotion: Learning Hybrid Behavior Planning for Autonomous Loco-manipulation [7.01404330241523]
HYPERmotion is a framework that learns, selects and plans behaviors based on tasks in different scenarios.
We combine reinforcement learning with whole-body optimization to generate motion for 38 actuated joints.
Experiments in simulation and real-world show that learned motions can efficiently adapt to new tasks.
(arXiv, 2024-06-20T18:21:24Z)
- Harmonic Mobile Manipulation [35.82197562695662]
HarmonicMM is an end-to-end learning method that optimizes both navigation and manipulation.
Our contributions include a new benchmark for mobile manipulation and the successful deployment with only RGB visual observation.
(arXiv, 2023-12-11T18:54:42Z)
- Revisit Human-Scene Interaction via Space Occupancy [55.67657438543008]
Human-scene Interaction (HSI) generation is a challenging task and crucial for various downstream tasks.
In this work, we argue that interaction with a scene is essentially interacting with the space occupancy of the scene from an abstract physical perspective.
By treating pure motion sequences as records of humans interacting with invisible scene occupancy, we can aggregate motion-only data into a large-scale paired human-occupancy interaction database.
(arXiv, 2023-12-05T12:03:00Z)
- QuestEnvSim: Environment-Aware Simulated Motion Tracking from Sparse Sensors [69.75711933065378]
We show that headset and controller pose can generate realistic full-body poses even in highly constrained environments.
We discuss three features, the environment representation, the contact reward and scene randomization, crucial to the performance of the method.
(arXiv, 2023-06-09T04:40:38Z)
- Synthesizing Physical Character-Scene Interactions [64.26035523518846]
It is necessary to synthesize such interactions between virtual characters and their surroundings.
We present a system that uses adversarial imitation learning and reinforcement learning to train physically-simulated characters.
Our approach takes physics-based character motion generation a step closer to broad applicability.
(arXiv, 2023-02-02T05:21:32Z)
- N$^2$M$^2$: Learning Navigation for Arbitrary Mobile Manipulation Motions in Unseen and Dynamic Environments [9.079709086741987]
We introduce Neural Navigation for Mobile Manipulation (N$^2$M$^2$) which extends this decomposition to complex obstacle environments.
The resulting approach can perform unseen, long-horizon tasks in unexplored environments while instantly reacting to dynamic obstacles and environmental changes.
We demonstrate the capabilities of our proposed approach in extensive simulation and real-world experiments on multiple kinematically diverse mobile manipulators.
(arXiv, 2022-06-17T12:52:41Z)
- A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation [63.1610540170754]
We focus on the problem of visual non-prehensile planar manipulation.
We propose a novel architecture that combines video decoding neural models with priors from contact mechanics.
We find that our modular and fully differentiable architecture performs better than learning-only methods on unseen objects and motions.
(arXiv, 2021-11-09T18:39:45Z)
- Articulated Object Interaction in Unknown Scenes with Whole-Body Mobile Manipulation [16.79185733369416]
We propose a two-stage architecture for autonomous interaction with large articulated objects in unknown environments.
The first stage uses a learned model to estimate the articulated model of a target object from an RGB-D input and predicts an action-conditional sequence of states for interaction.
The second stage comprises a whole-body motion controller that manipulates the object along the generated kinematic plan.
(arXiv, 2021-03-18T21:32:18Z)
- Modeling Long-horizon Tasks as Sequential Interaction Landscapes [75.5824586200507]
We present a deep learning network that learns dependencies and transitions across subtasks solely from a set of demonstration videos.
We show that these symbols can be learned and predicted directly from image observations.
We evaluate our framework on two long horizon tasks: (1) block stacking of puzzle pieces being executed by humans, and (2) a robot manipulation task involving pick and place of objects and sliding a cabinet door with a 7-DoF robot arm.
(arXiv, 2020-06-08T18:07:18Z)