ExPrIS: Knowledge-Level Expectations as Priors for Object Interpretation from Sensor Data
- URL: http://arxiv.org/abs/2601.15025v1
- Date: Wed, 21 Jan 2026 14:27:38 GMT
- Title: ExPrIS: Knowledge-Level Expectations as Priors for Object Interpretation from Sensor Data
- Authors: Marian Renz, Martin Günther, Felix Igelbrink, Oscar Lima, Martin Atzmueller,
- Abstract summary: The ExPrIS project investigates how knowledge-level expectations can serve as priors to improve object interpretation from sensor data. We integrate expectations from two sources: contextual priors from past observations and semantic knowledge from external graphs like ConceptNet. This method moves beyond static, frame-by-frame analysis to enhance the robustness and consistency of scene understanding over time.
- Score: 1.0801606421449652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While deep learning has significantly advanced robotic object recognition, purely data-driven approaches often lack semantic consistency and fail to leverage valuable, pre-existing knowledge about the environment. This report presents the ExPrIS project, which addresses this challenge by investigating how knowledge-level expectations can serve as priors to improve object interpretation from sensor data. Our approach is based on the incremental construction of a 3D Semantic Scene Graph (3DSSG). We integrate expectations from two sources: contextual priors from past observations and semantic knowledge from external graphs like ConceptNet. These are embedded into a heterogeneous Graph Neural Network (GNN) to create an expectation-biased inference process. This method moves beyond static, frame-by-frame analysis to enhance the robustness and consistency of scene understanding over time. The report details this architecture and its evaluation, and outlines its planned integration on a mobile robotic platform.
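The abstract describes the biasing mechanism only at the knowledge level. As a concrete illustration, the sketch below shows one plausible form of expectation-biased inference, assuming the priors enter as additive log-terms on per-object class logits and that ConceptNet relatedness scores (fetched from its public REST API) are normalized into prior probabilities. The `ExpectationBiasedClassifier` module, the class list, the temperature, and the `prior_weight` parameter are all illustrative assumptions, not the interface published in the report.

```python
# Minimal sketch of expectation-biased object classification (illustrative,
# not the published ExPrIS implementation). Priors are assumed to enter as
# additive log-terms on the class logits, a naive-Bayes-style combination.
import requests
import torch
import torch.nn as nn
import torch.nn.functional as F

def conceptnet_relatedness(term_a: str, term_b: str) -> float:
    """Fetch a relatedness score in [-1, 1] from the public ConceptNet API."""
    resp = requests.get(
        "https://api.conceptnet.io/relatedness",
        params={"node1": f"/c/en/{term_a}", "node2": f"/c/en/{term_b}"},
        timeout=10,
    )
    return resp.json()["value"]

class ExpectationBiasedClassifier(nn.Module):
    """Combine per-object embeddings (e.g. a GNN readout) with expectation priors."""

    def __init__(self, feature_dim: int, num_classes: int, prior_weight: float = 1.0):
        super().__init__()
        self.head = nn.Linear(feature_dim, num_classes)  # stand-in for the GNN readout
        self.prior_weight = prior_weight

    def forward(self, node_features: torch.Tensor, priors: torch.Tensor) -> torch.Tensor:
        # node_features: (N, feature_dim) object embeddings from the scene graph.
        # priors: (N, num_classes) expectation probabilities per object, e.g. from
        # past observations of this scene and ConceptNet-derived semantic knowledge.
        logits = self.head(node_features)
        # Bias the evidence with scaled log-priors:
        # p(class | evidence, context) ∝ p(evidence | class) * p(class | context)^w
        return F.log_softmax(
            logits + self.prior_weight * torch.log(priors.clamp_min(1e-6)), dim=-1
        )

# Usage: candidate classes for a kitchen scene; priors built from ConceptNet
# relatedness to "kitchen", sharpened with a temperature and shared by 3 objects.
classes = ["cup", "monitor", "kettle", "keyboard"]
scores = torch.tensor([conceptnet_relatedness(c, "kitchen") for c in classes])
priors = F.softmax(scores / 0.1, dim=0).expand(3, -1)
model = ExpectationBiasedClassifier(feature_dim=16, num_classes=len(classes))
print(model(torch.randn(3, 16), priors).argmax(dim=-1))
```

Under this reading, setting `prior_weight` to zero recovers an unbiased, frame-by-frame classifier, which would make the contribution of the expectations straightforward to ablate.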
Related papers
- Online Segment Any 3D Thing as Instance Tracking [60.20416622842975]
We reconceptualize online 3D segmentation as an instance tracking problem (AutoSeg3D). We introduce spatial consistency learning to mitigate the fragmentation problem inherent in Vision Foundation Models. Our method establishes a new state-of-the-art, surpassing ESAM by 2.8 AP on ScanNet200.
arXiv Detail & Related papers (2025-12-08T14:48:51Z)
- Object-Centric Representation Learning for Enhanced 3D Scene Graph Prediction [3.7471945679132594]
3D Semantic Scene Graph Prediction aims to detect objects and their semantic relationships in 3D scenes. Previous research has addressed dataset limitations and explored various approaches, including Open-Vocabulary settings. We demonstrate through extensive analysis that the quality of object features plays a critical role in determining overall scene graph accuracy.
arXiv Detail & Related papers (2025-10-06T11:33:09Z)
- IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering [7.247417417159471]
Vision-language models (VLMs) excel at descriptive tasks, but whether they truly understand scenes from visual observations remains uncertain. We introduce IR3D-Bench, a benchmark challenging VLMs to demonstrate understanding through active creation rather than passive recognition.
arXiv Detail & Related papers (2025-06-29T17:02:57Z)
- IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments [56.85804719947]
We present IAAO, a framework that builds an explicit 3D model for intelligent agents to gain understanding of articulated objects in their environment through interaction. We first build hierarchical features and label fields for each object state using 3D Gaussian Splatting (3DGS) by distilling mask features and view-consistent labels from multi-view images. We then perform object- and part-level queries on the 3D Gaussian primitives to identify static and articulated elements, estimating global transformations and local articulation parameters along with affordances.
arXiv Detail & Related papers (2025-04-09T12:36:48Z)
- Are We Ready for Real-Time LiDAR Semantic Segmentation in Autonomous Driving? [42.348499880894686]
Scene semantic segmentation can be achieved by directly integrating 3D spatial data with specialized deep neural networks.
We investigate various 3D semantic segmentation methodologies and analyze their performance and capabilities for resource-constrained inference on embedded NVIDIA Jetson platforms.
arXiv Detail & Related papers (2024-10-10T20:47:33Z)
- FusionSense: Bridging Common Sense, Vision, and Touch for Robust Sparse-View Reconstruction [17.367277970910813]
Humans effortlessly integrate common-sense knowledge with sensory input from vision and touch to understand their surroundings.
We introduce FusionSense, a novel 3D reconstruction framework that enables robots to fuse priors from foundation models with highly sparse observations from vision and tactile sensors.
arXiv Detail & Related papers (2024-10-10T18:07:07Z)
- On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey [82.49623756124357]
Zero-shot image recognition (ZSIR) aims to recognize and reason in unseen domains by learning generalized knowledge from limited data. This paper thoroughly investigates recent advances in element-wise ZSIR and provides a basis for its future development.
arXiv Detail & Related papers (2024-08-09T05:49:21Z)
- ScanERU: Interactive 3D Visual Grounding based on Embodied Reference Understanding [67.21613160846299]
Embodied Reference Understanding (ERU) is first proposed to address this concern.
A new dataset, ScanERU, is constructed to evaluate the effectiveness of this idea.
arXiv Detail & Related papers (2023-03-23T11:36:14Z)
- A Threefold Review on Deep Semantic Segmentation: Efficiency-oriented, Temporal and Depth-aware design [77.34726150561087]
We conduct a survey on the most relevant and recent advances in deep semantic segmentation in the context of vision for autonomous vehicles.
Our main objective is to provide a comprehensive discussion on the main methods, advantages, limitations, results and challenges faced from each perspective.
arXiv Detail & Related papers (2023-03-08T01:29:55Z)
- Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.