Pre-Trained Masked Image Model for Mobile Robot Navigation
- URL: http://arxiv.org/abs/2310.07021v2
- Date: Mon, 25 Mar 2024 19:46:25 GMT
- Title: Pre-Trained Masked Image Model for Mobile Robot Navigation
- Authors: Vishnu Dutt Sharma, Anukriti Singh, Pratap Tokekar
- Abstract summary: 2D top-down maps are commonly used for the navigation and exploration of mobile robots through unknown areas.
Recent works have shown that predicting the structural patterns in the environment through learning-based approaches can greatly enhance task efficiency.
We show that existing foundational vision networks can accomplish the same without any fine-tuning.
- Score: 16.330708552384053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 2D top-down maps are commonly used for the navigation and exploration of mobile robots through unknown areas. Typically, the robot builds the navigation maps incrementally from local observations using onboard sensors. Recent works have shown that predicting the structural patterns in the environment through learning-based approaches can greatly enhance task efficiency. While many such works build task-specific networks using limited datasets, we show that existing foundational vision networks can accomplish the same without any fine-tuning. Specifically, we use Masked Autoencoders, pre-trained on street images, to present novel applications for field-of-view expansion, single-agent topological exploration, and multi-agent exploration for indoor mapping, across different input modalities. Our work motivates the use of foundational vision models for generalized structure prediction-driven applications, especially where training data is scarce. For more qualitative results see https://raaslab.org/projects/MIM4Robots.
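To make the core idea concrete, here is a minimal sketch of how a frozen, pre-trained Masked Autoencoder could be applied to a partially observed top-down map for field-of-view expansion. The `reconstruct` interface, tensor shapes, and mask convention are assumptions for illustration, not the paper's actual API; the compositing step simply keeps real observations and fills only unseen cells with the MAE's prediction.

```python
# Minimal sketch: field-of-view expansion via MAE inpainting (hypothetical API).
# The partially observed top-down map is treated as an image; unseen regions
# are presented to a frozen, pre-trained MAE as masked patches to reconstruct.
import torch

def expand_fov(mae, partial_map, observed):
    """
    mae:         a frozen, pre-trained MAE (stand-in for any checkpoint).
    partial_map: (1, 3, 224, 224) tensor; unknown cells may hold any value.
    observed:    (1, 1, 224, 224) binary tensor, 1 = actually sensed.
    Returns a map where unseen cells are filled with the MAE's prediction.
    """
    mae.eval()
    with torch.no_grad():
        # Hypothetical interface: patches outside `observed` are masked out
        # and the model returns a full reconstruction.
        recon = mae.reconstruct(partial_map, keep_mask=observed)
    # Trust real observations; use predictions only where nothing was sensed.
    return observed * partial_map + (1 - observed) * recon
```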
Related papers
- Feudal Networks for Visual Navigation [6.1190419149081245]
We introduce a new approach to visual navigation using feudal learning.
Agents at each level see a different aspect of the task and operate at different spatial and temporal scales.
The resulting feudal navigation network achieves near-SOTA performance.
arXiv Detail & Related papers (2024-02-19T20:05:41Z)
- Interactive Semantic Map Representation for Skill-based Visual Object Navigation [43.71312386938849]
This paper introduces a new representation of a scene semantic map formed during the embodied agent's interaction with the indoor environment.
We implement this representation in a full-fledged navigation approach called SkillTron.
The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation.
arXiv Detail & Related papers (2023-11-07T16:30:12Z)
- NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration [57.15811390835294]
This paper describes how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration.
We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments.
Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods.
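A minimal sketch of the goal-masking idea described above, assuming a single policy conditioned on an observation embedding, a goal embedding, and a binary mask: when the mask is set, the goal signal is zeroed, so the same weights drive goal-agnostic exploration. NoMaD's actual head is a diffusion model that denoises action sequences; the plain MLP here is a deliberate simplification.

```python
# Sketch of goal masking in a unified navigation policy (not NoMaD's code).
import torch
import torch.nn as nn

class GoalMaskedPolicy(nn.Module):
    def __init__(self, obs_dim=512, goal_dim=512, act_horizon=8, act_dim=2):
        super().__init__()
        # Predicts a short sequence of future actions in one shot.
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 512), nn.ReLU(),
            nn.Linear(512, act_horizon * act_dim),
        )

    def forward(self, obs_emb, goal_emb, goal_mask):
        # goal_mask: (B, 1), 1 = hide the goal (exploration mode).
        goal_emb = goal_emb * (1 - goal_mask)
        return self.net(torch.cat([obs_emb, goal_emb], dim=-1))
```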
arXiv Detail & Related papers (2023-10-11T21:07:14Z)
- MEM: Multi-Modal Elevation Mapping for Robotics and Learning [10.476978089902818]
We extend a 2.5D robot-centric elevation mapping framework by fusing multi-modal information from multiple sources into a popular map representation.
Our system is designed to run on the GPU, making it real-time capable for various robotic and learning tasks.
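A toy illustration of the multi-layer 2.5D map structure described above, written in NumPy rather than as GPU code; the layer names, grid parameters, and max-fusion rule are placeholders, not the framework's API.

```python
# Toy sketch: fusing per-point measurements into a 2.5D robot-centric grid
# with one elevation layer plus arbitrary extra (e.g., semantic) layers.
import numpy as np

class ElevationMap:
    def __init__(self, size=200, res=0.05, layers=("elevation", "semantics")):
        self.res = res
        self.data = {l: np.full((size, size), np.nan) for l in layers}

    def fuse(self, points, values, layer):
        # points: (N, 2) metric x,y in the map frame; values: (N,) per-point.
        # No bounds checking in this toy version.
        idx = (points / self.res).astype(int) + self.data[layer].shape[0] // 2
        for (i, j), v in zip(idx, values):
            cur = self.data[layer][i, j]
            # Simple fusion rule: keep the maximum (e.g., highest hit per cell).
            self.data[layer][i, j] = v if np.isnan(cur) else max(cur, v)
```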
arXiv Detail & Related papers (2023-09-28T19:55:29Z)
- ViNT: A Foundation Model for Visual Navigation [52.2571739391896]
Visual Navigation Transformer (ViNT) is a foundation model for vision-based robotic navigation.
ViNT is trained with a general goal-reaching objective that can be used with any navigation dataset.
It exhibits positive transfer, outperforming specialist models trained on singular datasets.
arXiv Detail & Related papers (2023-06-26T16:57:03Z)
- Predicting Dense and Context-aware Cost Maps for Semantic Robot Navigation [35.45993685414002]
We investigate the task of object goal navigation in unknown environments where the target is specified by a semantic label.
We propose a deep neural network architecture and loss function to predict dense cost maps that implicitly contain semantic context.
We also present a novel way of fusing mid-level visual representations in our architecture to provide additional semantic cues for cost map prediction.
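An illustrative sketch of dense cost-map prediction under generic assumptions: an encoder-decoder maps a stack of semantic input channels to a one-channel cost map and is supervised with a pixelwise regression loss. The architecture and loss here are stand-ins, not the paper's network.

```python
# Generic encoder-decoder for dense cost-map prediction (illustrative only).
import torch
import torch.nn as nn

class CostMapNet(nn.Module):
    def __init__(self, in_ch=16):
        super().__init__()
        # Two stride-2 convs downsample; two transposed convs restore size.
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 32, 3, 2, 1), nn.ReLU(),
                                 nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(32, 1, 4, 2, 1))

    def forward(self, x):
        return self.dec(self.enc(x))

def loss_fn(pred, target):
    # Dense supervision: every cell of the predicted cost map is penalized.
    return nn.functional.mse_loss(pred, target)
```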
arXiv Detail & Related papers (2022-10-17T11:43:19Z)
- GNM: A General Navigation Model to Drive Any Robot [67.40225397212717]
A general goal-conditioned model for vision-based navigation can be trained on data obtained from many distinct but structurally similar robots.
We analyze the necessary design decisions for effective data sharing across robots.
We deploy the trained GNM on a range of new robots, including an underactuated quadrotor.
arXiv Detail & Related papers (2022-10-07T07:26:41Z)
- Rapid Exploration for Open-World Navigation with Latent Goal Models [78.45339342966196]
We describe a robotic learning system for autonomous exploration and navigation in diverse, open-world environments.
At the core of our method is a learned latent variable model of distances and actions, along with a non-parametric topological memory of images.
We use an information bottleneck to regularize the learned policy, giving us (i) a compact visual representation of goals, (ii) improved generalization capabilities, and (iii) a mechanism for sampling feasible goals for exploration.
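A sketch of the information-bottleneck component described above, assuming a VAE-style goal encoder: a KL term toward a standard normal prior compacts the goal representation, and sampling from that prior yields candidate goals for exploration. Shapes and layer choices are illustrative, not the paper's.

```python
# VAE-style latent goal encoder with an information-bottleneck KL term.
import torch
import torch.nn as nn

class LatentGoalEncoder(nn.Module):
    def __init__(self, feat_dim=512, z_dim=32):
        super().__init__()
        self.mu = nn.Linear(feat_dim, z_dim)
        self.logvar = nn.Linear(feat_dim, z_dim)

    def forward(self, goal_feat):
        mu, logvar = self.mu(goal_feat), self.logvar(goal_feat)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # KL(q(z|goal) || N(0, I)) -- the bottleneck term added to the loss.
        kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()
        return z, kl

# Exploration: sample z ~ N(0, I) to propose feasible goals without an image.
```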
arXiv Detail & Related papers (2021-04-12T23:14:41Z)
- Occupancy Anticipation for Efficient Exploration and Navigation [97.17517060585875]
We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions.
By exploiting context in both the egocentric views and top-down maps, our model successfully anticipates a broader map of the environment.
Our approach is the winning entry in the 2020 Habitat PointNav Challenge.
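A conceptual stand-in for occupancy anticipation: a small convolutional net takes a two-channel partial map (occupied, explored) projected from egocentric RGB-D and outputs per-cell occupancy beliefs for the full local crop, including never-seen cells. The layer stack here is a placeholder, not the winning model.

```python
# Conceptual occupancy anticipation: predict occupancy beyond visible regions.
import torch
import torch.nn as nn

anticipator = nn.Sequential(
    nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),   # ch 0: occupied, ch 1: explored
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 2, 1), nn.Sigmoid(),           # per-cell occupancy beliefs
)

partial = torch.rand(1, 2, 101, 101)             # V x V local crop of the map
full_belief = anticipator(partial)               # includes unseen regions
# Training would use binary cross-entropy against the ground-truth full map.
```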
arXiv Detail & Related papers (2020-08-21T03:16:51Z)
- Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation [143.6144560164782]
We introduce a learning-based approach for room navigation using semantic maps.
We train a model to generate amodal semantic top-down maps indicating beliefs of location, size, and shape of rooms.
Next, we use these maps to predict a point that lies in the target room and train a policy to navigate to the point.
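A toy sketch of the two-stage idea above: a hypothetical `room_model` produces amodal per-room belief maps, and a confident cell in the target room becomes the navigation point (training the policy that reaches it is omitted).

```python
# Toy sketch: pick a navigation target from amodal room-belief maps.
# `room_model` is a hypothetical predictor, not the paper's network.
import torch

def pick_target_point(room_model, obs, target_room_id, threshold=0.9):
    beliefs = room_model(obs)                 # (R, H, W) amodal room beliefs
    belief = beliefs[target_room_id]
    ij = torch.nonzero(belief > threshold)    # (N, 2) confident cells
    if len(ij) == 0:                          # no confident cell yet: explore
        return None
    return ij[belief[ij[:, 0], ij[:, 1]].argmax()]  # most confident cell
```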
arXiv Detail & Related papers (2020-07-20T02:19:26Z)