Related papers: Object Instance Identification in Dynamic Environments

URL: http://arxiv.org/abs/2206.05319v1
Date: Fri, 10 Jun 2022 18:38:10 GMT
Title: Object Instance Identification in Dynamic Environments
Authors: Takuma Yagi, Md Tasnimul Hasan, Yoichi Sato
Abstract summary: We study the problem of identifying object instances in a dynamic environment where people interact with the objects. We build a benchmark of more than 1,500 instances on the EPIC-KITCHENS dataset. Experimental results suggest that (i) robustness against instance-specific appearance change and (ii) integration of low-level (e.g., color, texture) and high-level (e.g., object category) features are required for further improvement.
Score: 19.009931116468294
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study the problem of identifying object instances in a dynamic environment where people interact with the objects. In such an environment, objects' appearance changes dynamically through interaction with other entities, occlusion by hands, background change, etc. This leads to a larger intra-instance variation of appearance than in static environments. To discover the challenges in this setting, we built a new benchmark of more than 1,500 instances on the EPIC-KITCHENS dataset, which includes natural activities, and conducted an extensive analysis of it. Experimental results suggest that (i) robustness against instance-specific appearance change, (ii) integration of low-level (e.g., color, texture) and high-level (e.g., object category) features, and (iii) foreground feature selection on overlapping objects are required for further improvement.

Related papers

ADUGS-VINS: Generalized Visual-Inertial Odometry for Robust Navigation in Highly Dynamic and Complex Environments [7.07379964916809]
We introduce ADUGS-VINS, which integrates an enhanced SORT algorithm along with a promptable foundation model into VIO. We evaluate our proposed method using multiple public datasets representing various scenes, as well as in a real-world scenario involving diverse dynamic objects.
arXiv Detail & Related papers (2024-11-28T17:41:33Z)
Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments [44.6372390798904]
We propose a new task denominated Personalized Instance-based Navigation (PIN), in which an embodied agent is tasked with locating and reaching a specific personal object. In each episode, the target object is presented to the agent using two modalities: a set of visual reference images on a neutral background and manually annotated textual descriptions.
arXiv Detail & Related papers (2024-10-23T18:01:09Z)
DynaVINS++: Robust Visual-Inertial State Estimator in Dynamic Environments by Adaptive Truncated Least Squares and Stable State Recovery [11.37707868611451]
We propose a robust VINS framework called DynaVINS++. Our approach shows promising performance in dynamic environments, including scenes with abruptly dynamic objects.
arXiv Detail & Related papers (2024-10-20T12:13:45Z)
CLOVER: Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning [7.376512548629663]
We introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects of 8 classes under diverse lighting conditions and viewpoints. We also propose CLOVER, a representation learning method for object observations that can distinguish between static object instances.
arXiv Detail & Related papers (2024-07-12T23:16:48Z)
1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations. We trained our model on a large-scale video object segmentation dataset. Our model achieves first place (84.45%) in the test set of Complex Video Object Challenge.
arXiv Detail & Related papers (2024-06-07T03:13:46Z)
DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments [28.23284296418962]
Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments. Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object diversity, and scene texts. We propose a dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE). DOZE comprises ten high-fidelity 3D scenes with over 18k tasks, aiming to mimic complex, dynamic real-world scenarios.
arXiv Detail & Related papers (2024-02-29T10:03:57Z)
Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization [61.64304227831361]
Single-domain generalization aims to learn a model from single source domain data to achieve generalized performance on other unseen target domains. We propose a dynamic object-centric perception network based on prompt learning, aiming to adapt to the variations in image complexity.
arXiv Detail & Related papers (2024-02-28T16:16:51Z)
OSCaR: Object State Captioning and State Change Representation [52.13461424520107]
This paper introduces the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark. OSCaR consists of 14,084 annotated video segments with nearly 1,000 unique objects from various egocentric video collections. It sets a new testbed for evaluating multimodal large language models (MLLMs).
arXiv Detail & Related papers (2024-02-27T01:48:19Z)
Tracking through Containers and Occluders in the Wild [32.86030395660071]
We introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment. We create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance. We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.
arXiv Detail & Related papers (2023-05-04T17:59:58Z)
Finding Fallen Objects Via Asynchronous Audio-Visual Integration [89.75296559813437]
This paper introduces a setting in which to study multi-modal object localization in 3D virtual environments. An embodied robot agent, equipped with a camera and microphone, must determine what object has been dropped -- and where -- by combining audio and visual signals with knowledge of the underlying physics. The dataset uses the ThreeDWorld platform which can simulate physics-based impact sounds and complex physical interactions between objects in a photorealistic setting.
arXiv Detail & Related papers (2022-07-07T17:59:59Z)
Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels. Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions. We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
Robust Object Detection via Instance-Level Temporal Cycle Confusion [89.1027433760578]
We study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors. Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf). For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision.
arXiv Detail & Related papers (2021-04-16T21:35:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.