UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene
- URL: http://arxiv.org/abs/2510.06754v1
- Date: Wed, 08 Oct 2025 08:30:26 GMT
- Title: UniFField: A Generalizable Unified Neural Feature Field for Visual, Semantic, and Spatial Uncertainties in Any Scene
- Authors: Christian Maurer, Snehal Jauhri, Sophie Lueth, Georgia Chalvatzaki
- Abstract summary: We present UniFField, a unified uncertainty-aware neural feature field that combines visual, semantic, and geometric features in a single generalizable representation. We show that our uncertainty estimates accurately describe the model's prediction errors in scene reconstruction and semantic feature prediction.
- Score: 11.224584333257338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Comprehensive visual, geometric, and semantic understanding of a 3D scene is crucial for the successful execution of robotic tasks, especially in unstructured and complex environments. To make robust decisions, the robot must also evaluate the reliability of perceived information. While recent advances in 3D neural feature fields have enabled robots to leverage features from pretrained foundation models for tasks such as language-guided manipulation and navigation, existing methods suffer from two critical limitations: (i) they are typically scene-specific, and (ii) they lack the ability to model uncertainty in their predictions. We present UniFField, a unified uncertainty-aware neural feature field that combines visual, semantic, and geometric features in a single generalizable representation while also predicting uncertainty in each modality. Our approach, which can be applied zero-shot to any new environment, incrementally integrates RGB-D images into our voxel-based feature representation as the robot explores the scene, simultaneously updating its uncertainty estimates. We show that our uncertainty estimates accurately describe the model's prediction errors in scene reconstruction and semantic feature prediction. Furthermore, we successfully leverage our feature predictions and their respective uncertainties for an active object search task with a mobile manipulator robot, demonstrating the capability for robust decision-making.
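The abstract describes an incremental, uncertainty-tracking voxel feature field but gives no implementation details here. As a rough illustration of that idea (not the authors' method), the sketch below fuses per-view voxel features with a Welford running mean and uses the running variance as an uncertainty proxy; the class name, grid shape, and feature dimension are all assumptions.

```python
import numpy as np

class UncertainVoxelGrid:
    """Toy incremental feature field: fuse per-view voxel features with
    a running (Welford) mean and expose the running variance as an
    uncertainty proxy. Names and shapes are illustrative assumptions."""

    def __init__(self, grid_shape=(32, 32, 32), feat_dim=16):
        self.count = np.zeros(grid_shape, dtype=np.int64)           # observations per voxel
        self.mean = np.zeros(grid_shape + (feat_dim,), np.float32)  # fused feature
        self.m2 = np.zeros(grid_shape + (feat_dim,), np.float32)    # sum of squared deviations

    def integrate(self, idx, feature):
        """Online update for one voxel observed in a new RGB-D view."""
        self.count[idx] += 1
        n = self.count[idx]
        delta = feature - self.mean[idx]
        self.mean[idx] += delta / n
        self.m2[idx] += delta * (feature - self.mean[idx])

    def uncertainty(self, idx):
        """Per-voxel feature variance; unobserved voxels stay maximally uncertain."""
        n = self.count[idx]
        if n < 2:
            return np.full(self.mean.shape[-1], np.inf, np.float32)
        return self.m2[idx] / (n - 1)

# Repeated noisy observations of the same voxel shrink its uncertainty,
# which is the behavior an active object search would exploit.
grid = UncertainVoxelGrid()
rng = np.random.default_rng(0)
for _ in range(5):
    grid.integrate((4, 7, 2), rng.normal(1.0, 0.1, size=16).astype(np.float32))
print(grid.uncertainty((4, 7, 2)).mean())
```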
Related papers
- Differentiable Inverse Graphics for Zero-shot Scene Reconstruction and Robot Grasping [0.820984376071696]
We introduce a differentiable neuro-graphics model that combines neural foundation models with physics-based differentiable rendering to perform zero-shot scene reconstruction and robot grasping. Our approach offers a pathway towards more data-efficient, interpretable, and generalizable robot autonomy in novel environments.
arXiv Detail & Related papers (2026-02-04T20:33:50Z) - Generalizable Geometric Prior and Recurrent Spiking Feature Learning for Humanoid Robot Manipulation [90.90219129619344]
This paper presents R-prior-S, a novel Recurrent Geometric-priormodal Policy with Spiking features. To ground high-level reasoning in physical reality, we leverage lightweight 2D geometric inductive biases. To address the data-efficiency issue in robotic action generation, we introduce a Recursive Adaptive Spiking Network.
arXiv Detail & Related papers (2026-01-13T23:36:30Z) - Probabilistic Interactive 3D Segmentation with Hierarchical Neural Processes [71.2827490406779]
We propose NPISeg3D, a novel probabilistic framework that builds upon Neural Processes (NPs) to address these challenges. NPISeg3D introduces a hierarchical latent variable structure with scene-specific and object-specific latent variables to enhance few-shot generalization. We design a prototype modulator that adaptively modulates click prototypes with object-specific latent variables, improving the model's ability to capture object-aware context.
arXiv Detail & Related papers (2025-05-03T07:43:23Z) - Uncertainty Estimation for 3D Object Detection via Evidential Learning [63.61283174146648]
We introduce a framework for quantifying uncertainty in 3D object detection by leveraging an evidential learning loss on Bird's Eye View representations in the 3D detector (a generic sketch of such a loss appears after this list).
We demonstrate both the efficacy and importance of these uncertainty estimates for identifying out-of-distribution scenes, poorly localized objects, and missing (false negative) detections.
arXiv Detail & Related papers (2024-10-31T13:13:32Z) - Distributional Instance Segmentation: Modeling Uncertainty and High Confidence Predictions with Latent-MaskRCNN [77.0623472106488]
In this paper, we explore a class of distributional instance segmentation models using latent codes.
For robotic picking applications, we propose a confidence mask method that achieves the necessary high precision.
We show that our method can significantly reduce critical errors in robotic systems, including our newly released dataset of ambiguous scenes.
arXiv Detail & Related papers (2023-05-03T05:57:29Z) - Autoregressive Uncertainty Modeling for 3D Bounding Box Prediction [63.3021778885906]
3D bounding boxes are a widespread intermediate representation in many computer vision applications.
We propose methods for leveraging our autoregressive model to make high confidence predictions and meaningful uncertainty measures.
We release a simulated dataset, COB-3D, which highlights new types of ambiguity that arise in real-world robotics applications.
arXiv Detail & Related papers (2022-10-13T23:57:40Z) - Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z) - Maintaining a Reliable World Model using Action-aware Perceptual Anchoring [4.971403153199917]
Robots need to maintain a model of their surroundings even when objects go out of view and are no longer visible.
This requires anchoring perceptual information onto symbols that represent the objects in the environment.
We present a model for action-aware perceptual anchoring that enables robots to track objects in a persistent manner.
arXiv Detail & Related papers (2021-07-07T06:35:14Z) - Few-Shot Visual Grounding for Natural Human-Robot Interaction [0.0]
We propose a software architecture that segments a target object, indicated verbally by a human user, from a crowded scene.
At the core of our system, we employ a multi-modal deep neural network for visual grounding.
We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets.
arXiv Detail & Related papers (2021-03-17T15:24:02Z)
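As noted in the evidential 3D object detection entry above, evidential learning attaches closed-form uncertainty to a single forward pass. Below is a generic sketch in the style of deep evidential regression (Amini et al., 2020); the head layout, the lam weight, and the softplus constraints are assumptions for illustration, and the cited paper's exact BEV formulation may differ.

```python
import torch
import torch.nn.functional as F

def evidential_regression_loss(y, gamma, nu, alpha, beta, lam=0.01):
    """Deep evidential regression loss (Amini et al., 2020), generic form.
    The head predicts Normal-Inverse-Gamma parameters per target:
    gamma (mean), nu > 0, alpha > 1, beta > 0."""
    omega = 2.0 * beta * (1.0 + nu)
    nll = (0.5 * torch.log(torch.pi / nu)
           - alpha * torch.log(omega)
           + (alpha + 0.5) * torch.log(nu * (y - gamma) ** 2 + omega)
           + torch.lgamma(alpha) - torch.lgamma(alpha + 0.5))
    reg = torch.abs(y - gamma) * (2.0 * nu + alpha)  # penalize confident errors
    return (nll + lam * reg).mean()

# Hypothetical 4-channel regression head output, constrained to valid ranges.
raw = torch.randn(8, 4)
gamma = raw[:, 0]
nu = F.softplus(raw[:, 1]) + 1e-6
alpha = F.softplus(raw[:, 2]) + 1.0 + 1e-6  # keep alpha > 1
beta = F.softplus(raw[:, 3]) + 1e-6
y = torch.randn(8)
loss = evidential_regression_loss(y, gamma, nu, alpha, beta)

# Closed-form uncertainties from the same parameters:
aleatoric = beta / (alpha - 1.0)
epistemic = beta / (nu * (alpha - 1.0))
```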