On Extending Semantic Abstraction for Efficient Search of Hidden Objects
- URL: http://arxiv.org/abs/2512.22220v1
- Date: Mon, 22 Dec 2025 20:25:19 GMT
- Title: On Extending Semantic Abstraction for Efficient Search of Hidden Objects
- Authors: Tasha Pais, Nikhilesh Belulkar,
- Abstract summary: We use this framework for learning 3D localization and completion for the exclusive domain of hidden objects.<n>Our model can accurately identify the complete 3D location of a hidden object on the first try significantly faster than a naive random search.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic Abstraction's key observation is that 2D VLMs' relevancy activations roughly correspond to their confidence of whether and where an object is in the scene. Thus, relevancy maps are treated as "abstract object" representations. We use this framework for learning 3D localization and completion for the exclusive domain of hidden objects, defined as objects that cannot be directly identified by a VLM because they are at least partially occluded. This process of localizing hidden objects is a form of unstructured search that can be performed more efficiently using historical data of where an object is frequently placed. Our model can accurately identify the complete 3D location of a hidden object on the first try significantly faster than a naive random search. These extensions to semantic abstraction hope to provide household robots with the skills necessary to save time and effort when looking for lost objects.
Related papers
- GrabS: Generative Embodied Agent for 3D Object Segmentation without Scene Supervision [7.511342491529451]
We study the hard problem of 3D object segmentation in complex point clouds without requiring human labels of 3D scenes for supervision.<n>By relying on the similarity of pretrained 2D features or external signals such as motion to group 3D points as objects, existing unsupervised methods are usually limited to identifying simple objects like cars or their segmented objects are often inferior due to the lack of objectness in pretrained features.
arXiv Detail & Related papers (2025-04-16T04:13:53Z) - Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection [54.78470057491049]
Occupancy has emerged as a promising alternative for 3D scene perception.<n>We introduce object-centric occupancy as a supplement to object bboxes.<n>We show that our occupancy features significantly enhance the detection results of state-of-the-art 3D object detectors.
arXiv Detail & Related papers (2024-12-06T16:12:38Z) - Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers [65.51132104404051]
We introduce the use of object identifiers and object-centric representations to interact with scenes at the object level.
Our model significantly outperforms existing methods on benchmarks including ScanRefer, Multi3DRefer, Scan2Cap, ScanQA, and SQA3D.
arXiv Detail & Related papers (2023-12-13T14:27:45Z) - Object-Centric Multiple Object Tracking [124.30650395969126]
This paper proposes a video object-centric model for multiple-object tracking pipelines.
It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module.
Benefited from object-centric learning, we only require sparse detection labels for object localization and feature binding.
arXiv Detail & Related papers (2023-09-01T03:34:12Z) - Source-free Depth for Object Pop-out [113.24407776545652]
Modern learning-based methods offer promising depth maps by inference in the wild.
We adapt such depth inference models for object segmentation using the objects' "pop-out" prior in 3D.
Our experiments on eight datasets consistently demonstrate the benefit of our method in terms of both performance and generalizability.
arXiv Detail & Related papers (2022-12-10T21:57:11Z) - 4D Unsupervised Object Discovery [53.561750858325915]
We propose 4D unsupervised object discovery, jointly discovering objects from 4D data -- 3D point clouds and 2D RGB images with temporal information.
We present the first practical approach for this task by proposing a ClusterNet on 3D point clouds, which is jointly optimized with a 2D localization network.
arXiv Detail & Related papers (2022-10-10T16:05:53Z) - Object Priors for Classifying and Localizing Unseen Actions [45.91275361696107]
We propose three spatial object priors, which encode local person and object detectors along with their spatial relations.
On top we introduce three semantic object priors, which extend semantic matching through word embeddings.
A video embedding combines the spatial and semantic object priors.
arXiv Detail & Related papers (2021-04-10T08:56:58Z) - Objects are Different: Flexible Monocular 3D Object Detection [87.82253067302561]
We propose a flexible framework for monocular 3D object detection which explicitly decouples the truncated objects and adaptively combines multiple approaches for object depth estimation.
Experiments demonstrate that our method outperforms the state-of-the-art method by relatively 27% for the moderate level and 30% for the hard level in the test set of KITTI benchmark.
arXiv Detail & Related papers (2021-04-06T07:01:28Z) - Learning Object Permanence from Video [46.34427538905761]
This paper introduces the setup of learning Object Permanence from data.
We explain why this learning problem should be dissected into four components, where objects are visible, (2) occluded, (3) contained by another object and (4) carried by a containing object.
We then present a unified deep architecture that learns to predict object location under these four scenarios.
arXiv Detail & Related papers (2020-03-23T18:03:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.