Spatial Action Maps for Mobile Manipulation
- URL: http://arxiv.org/abs/2004.09141v2
- Date: Thu, 4 Jun 2020 10:56:49 GMT
- Title: Spatial Action Maps for Mobile Manipulation
- Authors: Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon
Rusinkiewicz, Thomas Funkhouser
- Abstract summary: We show that it can be advantageous to learn with dense action representations defined in the same domain as the state.
We present "spatial action maps," in which the set of possible actions is represented by a pixel map.
We find that policies learned with spatial action maps achieve much better performance than traditional alternatives.
- Score: 30.018835572458844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Typical end-to-end formulations for learning robotic navigation involve
predicting a small set of steering command actions (e.g., step forward, turn
left, turn right, etc.) from images of the current state (e.g., a bird's-eye
view of a SLAM reconstruction). Instead, we show that it can be advantageous to
learn with dense action representations defined in the same domain as the
state. In this work, we present "spatial action maps," in which the set of
possible actions is represented by a pixel map (aligned with the input image of
the current state), where each pixel represents a local navigational endpoint
at the corresponding scene location. Using ConvNets to infer spatial action
maps from state images, action predictions are thereby spatially anchored on
local visual features in the scene, enabling significantly faster learning of
complex behaviors for mobile manipulation tasks with reinforcement learning. In
our experiments, we task a robot with pushing objects to a goal location, and
find that policies learned with spatial action maps achieve much better
performance than traditional alternatives.
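The core idea lends itself to a compact illustration: a fully convolutional network maps the bird's-eye-view state image to a Q-value for every pixel, and the greedy action is the pixel with the highest value, interpreted as the local navigational endpoint to move toward. Below is a minimal PyTorch-style sketch of this dense Q-map formulation; the layer sizes, tensor shapes, and the SpatialQNetwork/select_action names are illustrative assumptions, not the authors' exact architecture or training setup.

# Minimal sketch of a "spatial action map" Q-network (illustrative only).
# Assumptions: PyTorch, a single-channel overhead state image, and a plain
# convolutional stack; the paper's actual network and DQN-style training
# loop are not reproduced here.
import torch
import torch.nn as nn

class SpatialQNetwork(nn.Module):
    """Fully convolutional net mapping a state image to a per-pixel Q map."""
    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),  # one Q-value per pixel, aligned with the input image
        )

    def forward(self, state_image: torch.Tensor) -> torch.Tensor:
        # state_image: (B, C, H, W) -> q_map: (B, 1, H, W)
        return self.net(state_image)

def select_action(q_map: torch.Tensor) -> tuple:
    """Greedy action: the pixel with the highest Q-value is the local
    navigational endpoint the robot should move toward."""
    q = q_map[0, 0]                        # (H, W)
    flat_idx = torch.argmax(q).item()
    row, col = divmod(flat_idx, q.shape[1])
    return row, col                        # target location in image coordinates

# Usage: infer a dense action map from an overhead state image and pick a target.
if __name__ == "__main__":
    net = SpatialQNetwork()
    state = torch.rand(1, 1, 96, 96)       # placeholder bird's-eye-view image
    target = select_action(net(state))
    print("navigate toward pixel", target)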
Related papers
- Interactive Semantic Map Representation for Skill-based Visual Object Navigation [43.71312386938849]
This paper introduces a new representation of a scene semantic map formed during the embodied agent's interaction with the indoor environment.
We have implemented this representation into a full-fledged navigation approach called SkillTron.
The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation.
arXiv Detail & Related papers (2023-11-07T16:30:12Z)
- Object Goal Navigation with Recursive Implicit Maps [92.6347010295396]
We propose an implicit spatial map for object goal navigation.
Our method significantly outperforms the state of the art on the challenging MP3D dataset.
We deploy our model on a real robot and achieve encouraging object goal navigation results in real scenes.
arXiv Detail & Related papers (2023-08-10T14:21:33Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego²-Map learning transfers the compact and rich information from a map, such as objects, structure and transition, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
- Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation [87.52136927091712]
We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by some language instructions.
To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both the spatial locations and the semantic information of objects in the environment.
We propose a multi-granularity map, which contains both object fine-grained details (e.g., color, texture) and semantic classes, to represent objects more comprehensively.
arXiv Detail & Related papers (2022-10-14T04:23:27Z)
- Sparse Image based Navigation Architecture to Mitigate the need of precise Localization in Mobile Robots [3.1556608426768324]
This paper focuses on mitigating the need for exact localization of a mobile robot to pursue autonomous navigation using a sparse set of images.
The proposed method consists of a model architecture, RoomNet, for unsupervised learning that yields a coarse identification of the environment.
It then uses sparse image matching to characterise the similarity of the frames observed by the robot to those viewed during the mapping and training stage.
arXiv Detail & Related papers (2022-03-29T06:38:18Z)
- Semantic Image Alignment for Vehicle Localization [111.59616433224662]
We present a novel approach to vehicle localization in dense semantic maps using semantic segmentation from a monocular camera.
In contrast to existing visual localization approaches, the system does not require additional keypoint features, handcrafted localization landmark extractors or expensive LiDAR sensors.
arXiv Detail & Related papers (2021-10-08T14:40:15Z)
- Learning Synthetic to Real Transfer for Localization and Navigational Tasks [7.019683407682642]
Navigation lies at the crossroads of multiple disciplines: it combines notions of computer vision, robotics, and control.
This work aims to create, in simulation, a navigation pipeline whose transfer to the real world requires as little effort as possible.
Designing the navigation pipeline raises four main challenges: environment, localization, navigation, and planning.
arXiv Detail & Related papers (2020-11-20T08:37:03Z)
- Unsupervised Domain Adaptation for Visual Navigation [115.85181329193092]
We propose an unsupervised domain adaptation method for visual navigation.
Our method translates the images in the target domain to the source domain such that the translation is consistent with the representations learned by the navigation policy.
arXiv Detail & Related papers (2020-10-27T18:22:43Z)
- Latent Space Roadmap for Visual Action Planning of Deformable and Rigid Object Manipulation [74.88956115580388]
Planning is performed in a low-dimensional latent state space that embeds images.
Our framework consists of two main components: a Visual Foresight Module (VFM) that generates a visual plan as a sequence of images, and an Action Proposal Network (APN) that predicts the actions between them.
arXiv Detail & Related papers (2020-03-19T18:43:26Z)