GraphMapper: Efficient Visual Navigation by Scene Graph Generation
- URL: http://arxiv.org/abs/2205.08325v1
- Date: Tue, 17 May 2022 13:21:20 GMT
- Title: GraphMapper: Efficient Visual Navigation by Scene Graph Generation
- Authors: Zachary Seymour, Niluthpol Chowdhury Mithun, Han-Pang Chiu, Supun
Samarasekera, Rakesh Kumar
- Abstract summary: We propose a method to train an autonomous agent to learn to accumulate a 3D scene graph representation of its environment.
We show that our approach, GraphMapper, can act as a modular scene encoder to operate alongside existing learning-based solutions.
- Score: 13.095640044666348
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding the geometric relationships between objects in a scene is a
core capability in enabling both humans and autonomous agents to navigate in
new environments. A sparse, unified representation of the scene topology will
allow agents to act efficiently to move through their environment, communicate
the environment state with others, and utilize the representation for diverse
downstream tasks. To this end, we propose a method to train an autonomous agent
to learn to accumulate a 3D scene graph representation of its environment by
simultaneously learning to navigate through said environment. We demonstrate
that our approach, GraphMapper, enables the learning of effective navigation
policies through fewer interactions with the environment than vision-based
systems alone. Further, we show that GraphMapper can act as a modular scene
encoder to operate alongside existing learning-based solutions to not only
increase navigational efficiency but also generate intermediate scene
representations that are useful for other future tasks.
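GraphMapper's code is not reproduced here; as a rough, hypothetical sketch of the representation the abstract describes, per-step object detections could be fused into a sparse graph of 3D object nodes and pairwise geometric edges. All names below (SceneGraph, ObjectNode, integrate_observation) and the duplicate-merging rule are illustrative assumptions, not the paper's implementation.

```python
import math
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    """A detected object anchored at an estimated 3D position (meters)."""
    label: str
    position: tuple  # (x, y, z) in the global frame

@dataclass
class SceneGraph:
    """Sparse scene topology: object nodes plus pairwise geometric edges."""
    nodes: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)  # (i, j) -> Euclidean distance

    def integrate_observation(self, detections, merge_radius=0.5):
        """Fuse one step's detections into the graph, merging near-duplicates."""
        for label, pos in detections:
            if any(n.label == label and _dist(n.position, pos) < merge_radius
                   for n in self.nodes):
                continue  # already mapped; a real system would refine the estimate
            i = len(self.nodes)
            self.nodes.append(ObjectNode(label, pos))
            for j, other in enumerate(self.nodes[:-1]):
                self.edges[(j, i)] = _dist(other.position, pos)

def _dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

g = SceneGraph()
g.integrate_observation([("chair", (1.0, 0.0, 0.0)), ("table", (2.0, 0.5, 0.0))])
g.integrate_observation([("chair", (1.1, 0.1, 0.0))])  # re-detection, merged
print(f"{len(g.nodes)} nodes, {len(g.edges)} edges")   # 2 nodes, 1 edges
```

A real agent would also refine merged position estimates and attach richer relation labels to edges; the sketch keeps only what the abstract states: sparse object nodes plus geometric relationships, accumulated while moving.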
Related papers
- Mapping High-level Semantic Regions in Indoor Environments without Object Recognition [50.624970503498226]
The present work proposes a method for semantic region mapping via embodied navigation in indoor environments.
To enable region identification, the method uses a vision-to-language model to provide scene information for mapping.
By projecting egocentric scene understanding into the global frame, the proposed method generates a semantic map as a distribution over possible region labels at each location.
arXiv Detail & Related papers (2024-03-11T18:09:50Z)
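As a rough illustration of the mapping step this summary describes (egocentric region scores projected into a global frame, with a per-cell distribution over region labels), here is a minimal hypothetical sketch. The RegionMap class, grid resolution, and label set are assumptions; pose estimation and the vision-to-language scoring are taken as given.

```python
import numpy as np

REGIONS = ["kitchen", "bedroom", "bathroom", "hallway"]  # assumed label set

class RegionMap:
    """Global top-down grid holding a distribution over region labels per cell."""

    def __init__(self, size=64, cell=0.25):
        self.cell = cell                                    # meters per cell
        self.scores = np.zeros((size, size, len(REGIONS)))  # accumulated evidence

    def update(self, world_xy, label_scores):
        """Accumulate one egocentric estimate already projected to world coords."""
        i, j = int(world_xy[0] / self.cell), int(world_xy[1] / self.cell)
        self.scores[i, j] += label_scores

    def distribution(self, world_xy):
        """Normalize accumulated scores into a label distribution for a cell."""
        i, j = int(world_xy[0] / self.cell), int(world_xy[1] / self.cell)
        s = self.scores[i, j]
        return s / s.sum() if s.sum() > 0 else np.full(len(REGIONS), 1 / len(REGIONS))

m = RegionMap()
m.update((2.0, 3.0), np.array([0.7, 0.1, 0.1, 0.1]))  # vision-language scores
m.update((2.0, 3.0), np.array([0.6, 0.2, 0.1, 0.1]))
print(dict(zip(REGIONS, m.distribution((2.0, 3.0)).round(2))))
```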
- Aligning Knowledge Graph with Visual Perception for Object-goal Navigation [16.32780793344835]
We propose the Aligning Knowledge Graph with Visual Perception (AKGVP) method for object-goal navigation.
Our approach introduces continuous modeling of the hierarchical scene architecture and leverages visual-language pre-training to align natural language descriptions with visual perception.
The integration of a continuous knowledge graph architecture and multimodal feature alignment empowers the navigator with a remarkable zero-shot navigation capability.
arXiv Detail & Related papers (2024-02-29T06:31:18Z)
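A hedged sketch of what multimodal feature alignment can look like in this setting: knowledge-graph node embeddings and a visual embedding are compared in a shared space, and the best-matching node informs navigation. The shapes, encoders, and scoring below are illustrative placeholders, not AKGVP's actual architecture.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Hypothetical embeddings: rows encode knowledge-graph nodes (e.g., language
# descriptions of object categories and rooms); the vector encodes the current
# visual observation. Real encoders come from visual-language pre-training.
graph_node_emb = l2_normalize(rng.normal(size=(10, 512)))  # 10 graph nodes
visual_emb = l2_normalize(rng.normal(size=512))            # current view

# Cosine similarity in the shared space scores how well each graph node
# matches the view; the best match can steer the policy toward the goal.
similarity = graph_node_emb @ visual_emb
best = int(np.argmax(similarity))
print(f"most compatible node: {best}, score: {similarity[best]:.3f}")
```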
- Feudal Networks for Visual Navigation [6.1190419149081245]
We introduce a new approach to visual navigation using feudal learning.
Agents at each level see a different aspect of the task and operate at different spatial and temporal scales.
The resulting feudal navigation network achieves near state-of-the-art performance.
arXiv Detail & Related papers (2024-02-19T20:05:41Z)
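To make the hierarchy concrete, here is a minimal sketch of a feudal decomposition in which a manager policy selects subgoals at a coarse timescale while a worker policy emits primitive actions every step. Both policies are random placeholders; the paper's agents are learned networks and its exact levels and scales differ.

```python
import random

SUBGOAL_HORIZON = 8  # manager acts once every 8 worker steps (assumed value)

def manager_policy(observation):
    """High level: choose a subgoal at a coarse temporal/spatial scale."""
    return random.choice(["explore_left", "explore_right", "approach_goal"])

def worker_policy(observation, subgoal):
    """Low level: emit a primitive action toward the current subgoal."""
    return random.choice(["forward", "turn_left", "turn_right"])

subgoal = None
for t in range(24):
    obs = {"t": t}  # stand-in for the agent's visual observation
    if t % SUBGOAL_HORIZON == 0:
        subgoal = manager_policy(obs)     # coarse scale: new subgoal
    action = worker_policy(obs, subgoal)  # fine scale: every step
    print(t, subgoal, action)
```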
- Interactive Semantic Map Representation for Skill-based Visual Object Navigation [43.71312386938849]
This paper introduces a new representation of a scene semantic map formed during the embodied agent interaction with the indoor environment.
We implement this representation in a full-fledged navigation approach called SkillTron.
The proposed approach makes it possible to form both intermediate goals for robot exploration and the final goal for object navigation.
arXiv Detail & Related papers (2023-11-07T16:30:12Z)
- Enhancing Graph Representation of the Environment through Local and Cloud Computation [2.9465623430708905]
We propose a graph-based representation that provides a semantic representation of robot environments from multiple sources.
To acquire information from the environment, the framework combines classical computer vision tools with modern computer vision cloud services.
The proposed approach also allows us to handle small objects and integrate them into the semantic representation of the environment.
arXiv Detail & Related papers (2023-09-22T08:05:32Z)
- Learning Navigational Visual Representations with Semantic Map Supervision [85.91625020847358]
We propose a navigational-specific visual representation learning method by contrasting the agent's egocentric views and semantic maps.
Ego²-Map learning transfers the compact and rich information from a map, such as objects, structure, and transitions, to the agent's egocentric representations for navigation.
arXiv Detail & Related papers (2023-07-23T14:01:05Z)
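The contrast between egocentric views and semantic maps can be sketched with a generic InfoNCE objective over paired embeddings; this is a standard formulation assumed for illustration, not the paper's released training code.

```python
import numpy as np

def info_nce(view_emb, map_emb, temperature=0.07):
    """Generic InfoNCE: each egocentric view should match its own semantic map
    (diagonal pairs) and repel the other maps in the batch."""
    logits = view_emb @ map_emb.T / temperature  # (B, B) pairwise similarities
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(view_emb))
    return -log_probs[idx, idx].mean()

rng = np.random.default_rng(0)
views = rng.normal(size=(4, 128))
views /= np.linalg.norm(views, axis=1, keepdims=True)
maps = views + 0.1 * rng.normal(size=(4, 128))  # toy paired map embeddings
maps /= np.linalg.norm(maps, axis=1, keepdims=True)
print("contrastive loss:", round(float(info_nce(views, maps)), 4))
```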
- Lifelong Topological Visual Navigation [16.41858724205884]
We propose a learning-based visual navigation method with graph update strategies that improve lifelong navigation performance over time.
We take inspiration from sampling-based planning algorithms to build image-based topological graphs, resulting in sparser graphs yet with higher navigation performance compared to baseline methods.
Unlike controllers that learn from fixed training environments, we show that our model can be finetuned using a relatively small dataset from the real-world environment where the robot is deployed.
arXiv Detail & Related papers (2021-10-16T06:16:14Z)
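A minimal sketch of a sparse image-based topological graph in the spirit of this summary: a new observation becomes a node only when no existing node is similar enough, which keeps the graph sparse. The cosine-similarity test below is an assumption standing in for the paper's learned reachability estimates.

```python
import numpy as np

class TopologicalGraph:
    """Sparse image-based topological map: nodes are view embeddings, edges
    record traversals between them."""

    def __init__(self, similarity_threshold=0.9):
        self.nodes = []   # L2-normalized image embeddings
        self.edges = []   # (i, j) traversals observed while navigating
        self.tau = similarity_threshold

    def add_observation(self, emb, prev_node=None):
        """Reuse an existing node when one is similar enough; else add a node."""
        sims = [float(emb @ n) for n in self.nodes]
        if sims and max(sims) > self.tau:
            return int(np.argmax(sims))
        self.nodes.append(emb)
        node = len(self.nodes) - 1
        if prev_node is not None:
            self.edges.append((prev_node, node))
        return node

rng = np.random.default_rng(1)
v0 = rng.normal(size=16); v0 /= np.linalg.norm(v0)
v1 = v0 + 0.01 * rng.normal(size=16); v1 /= np.linalg.norm(v1)
g = TopologicalGraph()
a = g.add_observation(v0)
b = g.add_observation(v1, prev_node=a)  # near-duplicate view: node reused
print(a == b, len(g.nodes))             # True 1
```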
- Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning [126.57680291438128]
We study whether scalability can be achieved via a disentangled representation.
We evaluate semantic tracklets on the visual multi-agent particle environment (VMPE) and on the challenging visual multi-agent GFootball environment.
Notably, this method is the first to successfully learn a strategy for five players in the GFootball environment using only visual data.
arXiv Detail & Related papers (2021-08-06T22:19:09Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
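The refine-by-perturbation loop can be sketched generically: sample perturbed segmentations, score each candidate, and keep the best. Below, score_fn stands in for the paper's relation-encoding graph neural network and sample_fn for its learned perturbation sampler; both are hypothetical placeholders.

```python
import numpy as np

def refine_masks(masks, score_fn, sample_fn, num_samples=8):
    """Perturb-and-score refinement: propose perturbed segmentations and keep
    the highest-scoring candidate seen so far."""
    best, best_score = masks, score_fn(masks)
    for _ in range(num_samples):
        candidate = sample_fn(best)
        s = score_fn(candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

# Toy usage: "masks" is a label image; the perturbation reassigns one pixel.
rng = np.random.default_rng(0)
masks = rng.integers(0, 3, size=(8, 8))

def perturb(m):
    m = m.copy()
    i, j = rng.integers(0, 8, size=2)
    m[i, j] = rng.integers(0, 3)
    return m

score = lambda m: -float(np.count_nonzero(m == 0))  # toy: prefer less background
refined, s = refine_masks(masks, score, perturb)
print("score improved to", s)
```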
- Pushing it out of the Way: Interactive Visual Navigation [62.296686176988125]
We study the problem of interactive navigation where agents learn to change the environment to navigate more efficiently to their goals.
We introduce the Neural Interaction Engine (NIE) to explicitly predict the change in the environment caused by the agent's actions.
By modeling the changes while planning, we find that agents exhibit significant improvements in their navigational capabilities.
arXiv Detail & Related papers (2021-04-28T22:46:41Z)
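A toy sketch of the forward prediction this summary describes: given an object's state and a candidate action, predict how the environment changes, then plan by imagining each action's effect. The linear model and action set below are illustrative assumptions standing in for the learned Neural Interaction Engine.

```python
import numpy as np

ACTIONS = ["move_forward", "push_left", "push_right"]  # assumed action set
# Toy "interaction engine": each column is the (dx, dy) an action causes on
# an object it touches. A linear map replaces the learned network.
EFFECTS = np.array([[0.25, 0.00, 0.00],    # dx per action
                    [0.00, -0.20, 0.20]])  # dy per action

def predict_change(object_state, action):
    """Predict the object's next state under an action (the NIE's role)."""
    one_hot = np.eye(len(ACTIONS))[ACTIONS.index(action)]
    return object_state + EFFECTS @ one_hot

# Planning with the model: imagine each action's effect on a blocking object
# and choose the one that clears the path.
obstacle = np.array([1.0, 0.0])  # object ahead, blocking the corridor
for a in ACTIONS:
    print(a, "->", predict_change(obstacle, a).round(2))
```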
- iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes [54.04456391489063]
iGibson is a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes.
Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects.
We show that iGibson's features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human-demonstrated behaviors.
arXiv Detail & Related papers (2020-12-05T02:14:17Z)