Enhancing Graph Representation of the Environment through Local and Cloud Computation
- URL: http://arxiv.org/abs/2309.12692v1
- Date: Fri, 22 Sep 2023 08:05:32 GMT
- Title: Enhancing Graph Representation of the Environment through Local and Cloud Computation
- Authors: Francesco Argenziano, Vincenzo Suriani and Daniele Nardi
- Abstract summary: We propose a graph-based representation that provides a semantic representation of robot environments from multiple sources.
To acquire information from the environment, the framework combines classical computer vision tools with modern computer vision cloud services.
The proposed approach also allows us to handle small objects and integrate them into the semantic representation of the environment.
- Score: 2.9465623430708905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Enriching the robot representation of the operational environment is a
challenging task that aims at bridging the gap between low-level sensor
readings and high-level semantic understanding. Having a rich representation
often requires computationally demanding architectures and purely
point-cloud-based detection systems that struggle with everyday objects that
have to be handled by the robot. To overcome these issues, we propose a
graph-based representation that addresses this gap by providing a semantic
representation of robot environments from multiple sources. To acquire
information from the environment, the framework combines classical computer
vision tools with modern computer vision cloud services, ensuring computational
feasibility on onboard hardware. By incorporating an ontology hierarchy with
over 800 object classes, the framework achieves cross-domain adaptability,
eliminating the need for environment-specific tools. The proposed approach
also allows us to handle small objects and integrate them into the semantic
representation of the environment. The approach is implemented in the Robot
Operating System (ROS) using the RViz visualizer for environment
representation. This work is a first step towards the development of a
general-purpose framework that facilitates intuitive interaction and navigation
across different domains.
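To make the pipeline concrete, below is a minimal Python sketch of such a multi-source semantic graph. It is an illustration under assumptions, not the authors' implementation: `detect_local`, `detect_cloud`, and the toy `ONTOLOGY` dictionary are hypothetical stand-ins for the onboard detector, the cloud vision service, and the 800+ class ontology hierarchy described in the abstract.

```python
# Illustrative sketch (not the authors' code): fuse local and cloud
# detections into one semantic graph of the environment with networkx.
import networkx as nx

# Toy ontology: each class points to its parent category (None = root).
ONTOLOGY = {"mug": "container", "table": "furniture",
            "container": "object", "furniture": "object", "object": None}

def detect_local(image):
    """Stand-in for a lightweight onboard detector (large objects)."""
    return [{"label": "table", "position": (1.0, 0.5), "source": "local"}]

def detect_cloud(image):
    """Stand-in for a cloud vision service (small, fine-grained objects)."""
    return [{"label": "mug", "position": (1.1, 0.6), "source": "cloud"}]

def ancestors(label):
    """Walk up the ontology from a detected class to the root."""
    chain = []
    while label is not None:
        chain.append(label)
        label = ONTOLOGY.get(label)
    return chain

def build_scene_graph(image):
    g = nx.DiGraph()
    for det in detect_local(image) + detect_cloud(image):
        g.add_node(det["label"], position=det["position"],
                   source=det["source"])
        # Link each detection into the class hierarchy with "is_a" edges.
        chain = ancestors(det["label"])
        for child, parent in zip(chain, chain[1:]):
            g.add_edge(child, parent, relation="is_a")
    return g

if __name__ == "__main__":
    graph = build_scene_graph(image=None)
    print(graph.nodes(data=True))
    print(graph.edges(data=True))
```

The design point being illustrated: the graph is agnostic to where a detection came from, so cheap onboard perception and slower cloud calls can feed the same representation.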
Related papers
- Time is on my sight: scene graph filtering for dynamic environment perception in an LLM-driven robot [0.8515309662618664]
This paper presents a robot control architecture that addresses key challenges in human-robot interaction.
The architecture uses Large Language Models to integrate diverse information sources, including natural language commands.
The architecture enhances adaptability, task efficiency, and human-robot collaboration in dynamic environments.
arXiv Detail & Related papers (2024-11-22T15:58:26Z)
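A minimal sketch of the scene-graph filtering idea named in the title above. This is an assumption-laden illustration (time-stamped nodes pruned by a freshness window), not the paper's method:

```python
# Hypothetical sketch of time-based scene graph filtering: drop nodes
# whose last observation is older than a freshness window, so the graph
# tracks a dynamic environment. Not code from the paper above.
import time
import networkx as nx

def filter_stale_nodes(graph: nx.Graph, max_age_s: float = 30.0) -> nx.Graph:
    """Return a copy of the graph keeping only recently observed nodes."""
    now = time.time()
    fresh = [n for n, data in graph.nodes(data=True)
             if now - data.get("last_seen", 0.0) <= max_age_s]
    return graph.subgraph(fresh).copy()

g = nx.Graph()
g.add_node("person_1", last_seen=time.time())          # just observed
g.add_node("chair_3", last_seen=time.time() - 120.0)   # stale observation
g.add_edge("person_1", "chair_3", relation="near")
print(filter_stale_nodes(g).nodes)  # only person_1 survives
```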
- ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments [13.988804095409133]
We propose the ReALFRED benchmark that employs real-world scenes, objects, and room layouts to train agents to complete household tasks.
Specifically, we extend the ALFRED benchmark with updates for larger environmental spaces with smaller visual domain gaps.
With ReALFRED, we analyze previously crafted methods for the ALFRED benchmark and observe that they consistently yield lower performance in all metrics.
arXiv Detail & Related papers (2024-07-26T07:00:27Z)
- Cognitive Planning for Object Goal Navigation using Generative AI Models [0.979851640406258]
We present a novel framework for solving the object goal navigation problem that generates efficient exploration strategies.
Our approach enables a robot to navigate unfamiliar environments by leveraging Large Language Models (LLMs) and Large Vision-Language Models (LVLMs).
arXiv Detail & Related papers (2024-03-30T10:54:59Z)
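A toy sketch of how the LLM-driven exploration step above might look. Everything here is hypothetical: `build_exploration_prompt` and `query_llm` are invented stand-ins, not the paper's interface.

```python
# Hypothetical sketch of LLM-based object-goal exploration: turn the
# robot's current observations into a prompt and ask a language model
# which region to explore next. `query_llm` stands in for a real LLM
# client; the prompt format is invented for illustration.

def build_exploration_prompt(goal: str, visible_objects: list[str],
                             frontier_rooms: list[str]) -> str:
    return (
        f"You are guiding a robot searching for a {goal}.\n"
        f"Visible objects: {', '.join(visible_objects)}.\n"
        f"Unexplored rooms: {', '.join(frontier_rooms)}.\n"
        "Answer with the single most promising room to explore next."
    )

def query_llm(prompt: str) -> str:
    """Stand-in for an actual LLM call; returns a canned answer here."""
    return "kitchen"

prompt = build_exploration_prompt(
    goal="mug",
    visible_objects=["sofa", "tv"],
    frontier_rooms=["kitchen", "bathroom"],
)
print(query_llm(prompt))  # -> "kitchen"
```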
"Agent AI" is a class of interactive systems that can perceive visual stimuli, language inputs, and other environmentally-grounded data.
We envision a future where people can easily create any virtual reality or simulated scene and interact with agents embodied within the virtual environment.
arXiv Detail & Related papers (2024-01-07T19:11:18Z)
- Graphical Object-Centric Actor-Critic [55.2480439325792]
We propose a novel object-centric reinforcement learning algorithm combining actor-critic and model-based approaches.
We use a transformer encoder to extract object representations and graph neural networks to approximate the dynamics of an environment.
Our algorithm performs better in a visually complex 3D robotic environment and a 2D environment with compositional structure than the state-of-the-art model-free actor-critic algorithm.
arXiv Detail & Related papers (2023-10-26T06:05:12Z)
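To give the flavor of the architecture above, here is a hand-rolled sketch of one graph message-passing step over object features, standing in for the learned dynamics model (not the paper's code; the transformer encoder that would produce the object slots is omitted).

```python
# Illustrative sketch (not the paper's architecture): a single graph
# message-passing step that predicts next-step object states from
# per-object feature vectors, as a stand-in for a learned dynamics model.
import torch
import torch.nn as nn

class ObjectDynamicsGNN(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.message = nn.Linear(2 * dim, dim)  # edge message from (src, dst)
        self.update = nn.Linear(2 * dim, dim)   # node update from (state, agg)

    def forward(self, objs: torch.Tensor) -> torch.Tensor:
        # objs: (num_objects, dim), fully connected interaction graph.
        n, d = objs.shape
        src = objs.unsqueeze(1).expand(n, n, d)  # sender features
        dst = objs.unsqueeze(0).expand(n, n, d)  # receiver features
        msgs = torch.relu(self.message(torch.cat([src, dst], dim=-1)))
        agg = msgs.sum(dim=0)                    # aggregate incoming messages
        return self.update(torch.cat([objs, agg], dim=-1))  # next states

slots = torch.randn(4, 32)   # e.g. 4 object slots from a transformer encoder
model = ObjectDynamicsGNN(dim=32)
print(model(slots).shape)    # -> torch.Size([4, 32])
```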
- SCIM: Simultaneous Clustering, Inference, and Mapping for Open-World Semantic Scene Understanding [34.19666841489646]
We show how a robot can autonomously discover novel semantic classes and improve accuracy on known classes when exploring an unknown environment.
We develop a general framework for mapping and clustering that we then use to generate a self-supervised learning signal to update a semantic segmentation model.
In particular, we show how clustering parameters can be optimized during deployment and that fusion of multiple observation modalities improves novel object discovery compared to prior work.
arXiv Detail & Related papers (2022-06-21T18:41:51Z)
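A toy illustration of the clustering step above, sketched with scikit-learn rather than SCIM's actual pipeline: embeddings of segments the known-class model cannot explain are grouped into candidate novel classes that could serve as pseudo-labels.

```python
# Toy sketch of novel-class discovery by clustering (not SCIM itself):
# feature embeddings of unrecognized segments are grouped into candidate
# classes, which could then serve as pseudo-labels for self-supervision.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for embeddings of segments the known-class model rejected:
# two synthetic blobs around different centers.
unknown_feats = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 16)),
    rng.normal(loc=1.0, scale=0.1, size=(50, 16)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
pseudo_labels = kmeans.fit_predict(unknown_feats)
print(np.bincount(pseudo_labels))  # two discovered clusters, ~50 each
```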
- GraphMapper: Efficient Visual Navigation by Scene Graph Generation [13.095640044666348]
We propose a method to train an autonomous agent to accumulate a 3D scene graph representation of its environment.
We show that our approach, GraphMapper, can act as a modular scene encoder that operates alongside existing learning-based solutions.
arXiv Detail & Related papers (2022-05-17T13:21:20Z)
- Optical flow-based branch segmentation for complex orchard environments [73.11023209243326]
We train a neural network system in simulation using only simulated RGB data and optical flow.
The resulting neural network performs foreground segmentation of branches in a busy orchard environment without additional real-world training or any special setup or equipment beyond a standard camera.
Our results show that our system is highly accurate and, when compared to a network using manually labeled RGBD data, achieves significantly more consistent and robust performance across environments that differ from the training set.
arXiv Detail & Related papers (2022-02-26T03:38:20Z)
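A sketch of the kind of input pipeline described above, assuming OpenCV's Farneback flow; the 5-channel stacking is illustrative, not the paper's exact formulation.

```python
# Hypothetical sketch of an RGB + optical-flow input pipeline (not the
# paper's code): compute dense Farneback flow between consecutive frames
# with OpenCV and stack it with the RGB image as segmentation input.
import cv2
import numpy as np

def rgb_flow_input(prev_bgr: np.ndarray, curr_bgr: np.ndarray) -> np.ndarray:
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # (H, W, 3) RGB + (H, W, 2) flow -> (H, W, 5) network input.
    return np.concatenate([curr_bgr.astype(np.float32), flow], axis=-1)

prev = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
curr = np.random.randint(0, 255, (64, 64, 3), dtype=np.uint8)
print(rgb_flow_input(prev, curr).shape)  # -> (64, 64, 5)
```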
- OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics [124.08684545010664]
Scene graph generation from images is a task of great interest to applications such as robotics.
We propose an initial approximation to a framework called Ontology-Guided Scene Graph Generation (OG-SGG).
arXiv Detail & Related papers (2022-02-21T13:23:15Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
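An illustrative sketch of the graph-of-masks representation above (not RICE's code): instance masks become nodes, and masks that touch are connected, giving a graph neural network relational structure to score.

```python
# Illustrative sketch of the graph-of-masks representation (not RICE's
# code): instance masks become nodes, and overlapping masks are linked.
import numpy as np
import networkx as nx

def masks_to_graph(masks: list[np.ndarray]) -> nx.Graph:
    g = nx.Graph()
    for i, m in enumerate(masks):
        g.add_node(i, area=int(m.sum()))
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            # Connect masks whose pixels overlap; a real system might
            # dilate masks first to also catch near-adjacent pairs.
            if np.logical_and(masks[i], masks[j]).any():
                g.add_edge(i, j, relation="touching")
    return g

a = np.zeros((8, 8), dtype=bool); a[0:4, 0:4] = True
b = np.zeros((8, 8), dtype=bool); b[3:6, 3:6] = True   # overlaps a
c = np.zeros((8, 8), dtype=bool); c[6:8, 6:8] = True   # isolated
g = masks_to_graph([a, b, c])
print(g.edges)  # -> [(0, 1)]
```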
- SAPIEN: A SimulAted Part-based Interactive ENvironment [77.4739790629284]
SAPIEN is a realistic and physics-rich simulated environment that hosts a large-scale set of articulated objects.
We evaluate state-of-the-art vision algorithms for part detection and motion attribute recognition as well as demonstrate robotic interaction tasks.
arXiv Detail & Related papers (2020-03-19T00:11:34Z)