MEM: Multi-Modal Elevation Mapping for Robotics and Learning
- URL: http://arxiv.org/abs/2309.16818v1
- Date: Thu, 28 Sep 2023 19:55:29 GMT
- Title: MEM: Multi-Modal Elevation Mapping for Robotics and Learning
- Authors: Gian Erni, Jonas Frey, Takahiro Miki, Matias Mattamala, Marco Hutter
- Abstract summary: We extend a 2.5D robot-centric elevation mapping framework by fusing multi-modal information from multiple sources into a popular map representation.
Our system is designed to run on the GPU, making it real-time capable for various robotic and learning tasks.
- Score: 10.476978089902818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Elevation maps are commonly used to represent the environment of mobile
robots and are instrumental for locomotion and navigation tasks. However, pure
geometric information is insufficient for many field applications that require
appearance or semantic information, which limits their applicability to other
platforms or domains. In this work, we extend a 2.5D robot-centric elevation
mapping framework by fusing multi-modal information from multiple sources into
a popular map representation. The framework allows inputting data contained in
point clouds or images in a unified manner. To manage the different nature of
the data, we also present a set of fusion algorithms that can be selected based
on the information type and user requirements. Our system is designed to run on
the GPU, making it real-time capable for various robotic and learning tasks. We
demonstrate the capabilities of our framework by deploying it on multiple
robots with varying sensor configurations and showcasing a range of
applications that utilize multi-modal layers, including line detection, human
detection, and colorization.
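To make the layered 2.5D representation and per-layer fusion concrete, below is a minimal Python/NumPy sketch of a robot-centric multi-modal elevation map. The grid parameters, layer names, fusion modes, and the integrate_points helper are illustrative assumptions made for this summary, not the paper's actual GPU implementation.

```python
import numpy as np

# Sketch of a multi-layer 2.5D elevation map with per-layer fusion rules.
# Layer names, fusion choices, and the point format are assumptions, not the
# MEM codebase; the real system performs these updates on the GPU.

CELL_SIZE = 0.1          # meters per grid cell
GRID_DIM = 200           # 20 m x 20 m robot-centric map
LAYERS = ["elevation", "color_r", "color_g", "color_b", "person_prob"]

grid = {name: np.full((GRID_DIM, GRID_DIM), np.nan) for name in LAYERS}

def to_cell(xy):
    """Map robot-centric (x, y) coordinates to grid indices."""
    idx = np.floor(xy / CELL_SIZE).astype(int) + GRID_DIM // 2
    return np.clip(idx, 0, GRID_DIM - 1)

def fuse(layer, i, j, value, mode):
    """Per-cell fusion rule selected by information type."""
    old = grid[layer][i, j]
    if np.isnan(old) or mode == "latest":
        grid[layer][i, j] = value                     # initialize / overwrite
    elif mode == "max":
        grid[layer][i, j] = max(old, value)           # keep highest return (elevation)
    elif mode == "exp_avg":
        grid[layer][i, j] = 0.8 * old + 0.2 * value   # smooth noisy colors or scores

def integrate_points(points, channels, modes):
    """points: (N, 3) xyz; channels: dict layer -> (N,) per-point values."""
    ij = to_cell(points[:, :2])
    for n, (i, j) in enumerate(ij):
        fuse("elevation", i, j, points[n, 2], "max")
        for layer, values in channels.items():
            fuse(layer, i, j, values[n], modes[layer])

# Example: fuse a point cloud carrying a color channel and a per-point
# person-detection score into the corresponding map layers.
pts = np.random.uniform(-5, 5, (1000, 3))
extra = {"color_r": np.random.rand(1000), "person_prob": np.random.rand(1000)}
integrate_points(pts, extra, {"color_r": "exp_avg", "person_prob": "exp_avg"})
```

In the actual framework these per-cell updates run in parallel on the GPU, and image data is presumably projected into the same grid before fusion; the sketch only illustrates how a fusion rule can be chosen per layer depending on the information type.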
Related papers
- MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics [41.94295877935867]
We study the impact of leveraging a multi-camera setup and integrating diverse data sources for multimodal place recognition.
Our proposed method named MSSPlace utilizes images from multiple cameras, LiDAR point clouds, semantic segmentation masks, and text annotations to generate comprehensive place descriptors.
arXiv Detail & Related papers (2024-07-22T14:24:56Z)
- LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning [50.99807031490589]
We introduce LLARVA, a model trained with a novel instruction tuning method to unify a range of robotic learning tasks, scenarios, and environments.
We generate 8.5M image-visual trace pairs from the Open X-Embodiment dataset in order to pre-train our model.
Experiments demonstrate that LLARVA performs well compared to several contemporary baselines.
arXiv Detail & Related papers (2024-06-17T17:55:29Z)
- An Interactive Agent Foundation Model [49.77861810045509]
We propose an Interactive Agent Foundation Model that uses a novel multi-task agent training paradigm for training AI agents.
Our training paradigm unifies diverse pre-training strategies, including visual masked auto-encoders, language modeling, and next-action prediction.
We demonstrate the performance of our framework across three separate domains -- Robotics, Gaming AI, and Healthcare.
arXiv Detail & Related papers (2024-02-08T18:58:02Z)
- Pre-Trained Masked Image Model for Mobile Robot Navigation [16.330708552384053]
2D top-down maps are commonly used for the navigation and exploration of mobile robots through unknown areas.
Recent works have shown that predicting the structural patterns in the environment through learning-based approaches can greatly enhance task efficiency.
We show that existing foundational vision networks can accomplish the same without any fine-tuning.
arXiv Detail & Related papers (2023-10-10T21:16:29Z)
- Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model [63.66204449776262]
Instruct2Act is a framework that maps multi-modal instructions to sequential actions for robotic manipulation tasks.
Our approach is adjustable and flexible in accommodating various instruction modalities and input types.
Our zero-shot method outperformed many state-of-the-art learning-based policies in several tasks.
arXiv Detail & Related papers (2023-05-18T17:59:49Z)
- HabitatDyn Dataset: Dynamic Object Detection to Kinematics Estimation [16.36110033895749]
We propose HabitatDyn, a dataset that contains synthetic RGB videos, semantic labels, and depth information, as well as kinematics information.
HabitatDyn was created from the perspective of a mobile robot with a moving camera, and contains 30 scenes featuring six different types of moving objects with varying velocities.
arXiv Detail & Related papers (2023-04-21T09:57:35Z)
- ExAug: Robot-Conditioned Navigation Policies via Geometric Experience Augmentation [73.63212031963843]
We propose a novel framework, ExAug, to augment the experiences of different robot platforms from multiple datasets in diverse environments.
The trained policy is evaluated on two new robot platforms with three different cameras in indoor and outdoor environments with obstacles.
arXiv Detail & Related papers (2022-10-14T01:32:15Z)
- GNM: A General Navigation Model to Drive Any Robot [67.40225397212717]
A general goal-conditioned model for vision-based navigation can be trained on data obtained from many distinct but structurally similar robots.
We analyze the necessary design decisions for effective data sharing across robots.
We deploy the trained GNM on a range of new robots, including an underactuated quadrotor.
arXiv Detail & Related papers (2022-10-07T07:26:41Z)
- MetaGraspNet: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis [78.26022688167133]
We present a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis.
The proposed dataset contains 100,000 images and 25 different object types.
We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance.
arXiv Detail & Related papers (2021-12-29T17:23:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.