Related papers: OBSER: Object-Based Sub-Environment Recognition for Zero-Shot Environmental Inference

URL: http://arxiv.org/abs/2507.02929v1
Date: Thu, 26 Jun 2025 05:57:06 GMT
Title: OBSER: Object-Based Sub-Environment Recognition for Zero-Shot Environmental Inference
Authors: Won-Seok Choi, Dong-Sig Han, Suhyung Choi, Hyeonseo Yang, Byoung-Tak Zhang
Abstract summary: We present the Object-Based Sub-Environment Recognition (OBSER) framework, a novel Bayesian framework that infers three fundamental relationships between sub-environments and their constituent objects. We validate the proposed framework by introducing the ($\epsilon,\delta$) statistically separable (EDS) function which indicates the alignment of the representation.
Score: 18.514809279438914
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present the Object-Based Sub-Environment Recognition (OBSER) framework, a novel Bayesian framework that infers three fundamental relationships between sub-environments and their constituent objects. In the OBSER framework, metric and self-supervised learning models estimate the object distributions of sub-environments on the latent space to compute these measures. Both theoretically and empirically, we validate the proposed framework by introducing the ($\epsilon,\delta$) statistically separable (EDS) function which indicates the alignment of the representation. Our framework reliably performs inference in open-world and photorealistic environments and outperforms scene-based methods in chained retrieval tasks. The OBSER framework enables zero-shot recognition of environments to achieve autonomous environment understanding.

Related papers

Collaborative Perceiver: Elevating Vision-based 3D Object Detection via Local Density-Aware Spatial Occupancy [7.570294108494611]
Vision-based bird's-eye-view (BEV) 3D object detection has advanced significantly in autonomous driving. Existing methods often construct 3D BEV representations by collapsing extracted object features. We introduce a multi-task learning framework, Collaborative Perceiver, to bridge gaps in spatial representations and feature refinement.
arXiv Detail & Related papers (2025-07-28T21:56:43Z)
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World? [52.99661576320663]
Multimodal large language models (MLLMs) have driven breakthroughs in egocentric vision applications. EOC-Bench is an innovative benchmark designed to systematically evaluate object-centric embodied cognition in dynamic egocentric scenarios. We conduct comprehensive evaluations of various proprietary, open-source, and object-level MLLMs based on EOC-Bench.
arXiv Detail & Related papers (2025-06-05T17:44:12Z)
BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation [58.14071520415005]
This paper presents a general RGB-based approach for object pose estimation, specifically designed to address challenges in sparse-view settings. To overcome these limitations, we introduce corner points of the object bounding box as an intermediate representation of the object pose. The 3D object corners can be reliably recovered from sparse input views, while the 2D corner points in the target view are estimated through a novel reference-based point datasets.
arXiv Detail & Related papers (2025-04-10T17:58:35Z)
Can foundation models actively gather information in interactive environments to test hypotheses? [56.651636971591536]
We introduce a framework in which a model must determine the factors influencing a hidden reward function. We investigate whether approaches such as self- throughput and increased inference time improve information gathering efficiency.
arXiv Detail & Related papers (2024-12-09T12:27:21Z)
Certifiably Robust Policies for Uncertain Parametric Environments [57.2416302384766]
We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters. We learn and analyse interval MDPs (IMDPs) for a set of unknown sample environments induced by parameters. We show that our approach produces tight bounds on a policy's performance with high confidence.
arXiv Detail & Related papers (2024-08-06T10:48:15Z)
Analysis of the Memorization and Generalization Capabilities of AI Agents: Are Continual Learners Robust? [91.682459306359]
In continual learning (CL), an AI agent learns from non-stationary data streams under dynamic environments. In this paper, a novel CL framework is proposed to achieve robust generalization to dynamic environments while retaining past knowledge. The generalization and memorization performance of the proposed framework are theoretically analyzed.
arXiv Detail & Related papers (2023-09-18T21:00:01Z)
Learning Environment-Aware Affordance for 3D Articulated Object Manipulation under Occlusions [9.400505355134728]
We propose an environment-aware affordance framework that incorporates both object-level actionable priors and environment constraints. We introduce a novel contrastive affordance learning framework capable of training on scenes containing a single occluder and generalizing to scenes with complex occluder combinations.
arXiv Detail & Related papers (2023-09-14T08:24:32Z)
Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner. We design a semantic-guided self-supervised learning model to extract high-level semantic features from images. We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
Persistent Homology Meets Object Unity: Object Recognition in Clutter [2.356908851188234]
Recognition of occluded objects in unseen and unstructured indoor environments is a challenging problem for mobile robots. We propose a new descriptor, TOPS, for point clouds generated from depth images and an accompanying recognition framework, THOR, inspired by human reasoning. THOR outperforms state-of-the-art methods on both the datasets and achieves substantially higher recognition accuracy for all the scenarios of the UW-IS Occluded dataset.
arXiv Detail & Related papers (2023-05-05T19:42:39Z)
CPPF++: Uncertainty-Aware Sim2Real Object Pose Estimation by Vote Aggregation [67.12857074801731]
We introduce a novel method, CPPF++, designed for sim-to-real pose estimation. To address the challenge posed by vote collision, we propose a novel approach that involves modeling the voting uncertainty. We incorporate several innovative modules, including noisy pair filtering, online alignment optimization, and a feature ensemble.
arXiv Detail & Related papers (2022-11-24T03:27:00Z)
Action-Sufficient State Representation Learning for Control with Structural Constraints [21.47086290736692]
In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making. We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs. Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning.
arXiv Detail & Related papers (2021-10-12T03:16:26Z)
Inter-class Discrepancy Alignment for Face Recognition [55.578063356210144]
We propose a unified framework called Inter-class Discrepancy Alignment (IDA). IDA-DAO is used to align the similarity scores considering the discrepancy between the images and its neighbors. IDA-SSE can provide convincing inter-class neighbors by introducing virtual candidate images generated with GAN.
arXiv Detail & Related papers (2021-03-02T08:20:08Z)