Object Instance Identification in Dynamic Environments
- URL: http://arxiv.org/abs/2206.05319v1
- Date: Fri, 10 Jun 2022 18:38:10 GMT
- Title: Object Instance Identification in Dynamic Environments
- Authors: Takuma Yagi, Md Tasnimul Hasan, Yoichi Sato
- Abstract summary: We study the problem of identifying object instances in a dynamic environment where people interact with the objects.
We build a benchmark of more than 1,500 instances on top of the EPIC-KITCHENS dataset.
Experimental results suggest that (i) robustness against instance-specific appearance change and (ii) integration of low-level (e.g., color, texture) and high-level (e.g., object category) features are required for further improvement.
- Score: 19.009931116468294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of identifying object instances in a dynamic environment
where people interact with the objects. In such an environment, objects'
appearance changes dynamically by interaction with other entities, occlusion by
hands, background change, etc. This leads to a larger intra-instance variation
of appearance than in static environments. To discover the challenges in this
setting, we built a new benchmark of more than 1,500 instances on the
EPIC-KITCHENS dataset, which includes natural activities, and conducted an
extensive analysis of it. Experimental results suggest that (i) robustness
against instance-specific appearance change, (ii) integration of low-level
(e.g., color, texture) and high-level (e.g., object category) features, and (iii)
foreground feature selection on overlapping objects are required for further
improvement.
Related papers
- CLOVER: Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning [7.376512548629663]
We introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects of 8 classes under diverse lighting conditions and viewpoints.
We also propose CLOVER, a representation learning method for object observations that can distinguish between static object instances.
arXiv Detail & Related papers (2024-07-12T23:16:48Z)
- eMoE-Tracker: Environmental MoE-based Transformer for Robust Event-guided Object Tracking [9.282504639411163]
This paper proposes a novel and effective Transformer-based event-guided tracking framework, called eMoE-Tracker.
Our key idea is to disentangle the environment into several learnable attributes to dynamically learn the attribute-specific features.
Experiments on diverse event-based benchmark datasets showcase the superior performance of our eMoE-Tracker compared to the prior arts.
arXiv Detail & Related papers (2024-06-28T16:13:55Z)
- 1st Place Solution for MOSE Track in CVPR 2024 PVUW Workshop: Complex Video Object Segmentation [72.54357831350762]
We propose a semantic embedding video object segmentation model and use the salient features of objects as query representations.
We trained our model on a large-scale video object segmentation dataset.
Our model achieves first place (84.45%) on the test set of the Complex Video Object Challenge.
arXiv Detail & Related papers (2024-06-07T03:13:46Z)
- DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments [28.23284296418962]
Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments.
Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object diversity, and scene texts.
We propose a dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments (DOZE).
DOZE comprises ten high-fidelity 3D scenes with over 18k tasks, aiming to mimic complex, dynamic real-world scenarios.
arXiv Detail & Related papers (2024-02-29T10:03:57Z)
- Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization [61.64304227831361]
Single-domain generalization aims to learn a model from single source domain data to achieve generalized performance on other unseen target domains.
We propose a dynamic object-centric perception network based on prompt learning, aiming to adapt to the variations in image complexity.
arXiv Detail & Related papers (2024-02-28T16:16:51Z)
- OSCaR: Object State Captioning and State Change Representation [52.13461424520107]
This paper introduces the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark.
OSCaR consists of 14,084 annotated video segments with nearly 1,000 unique objects from various egocentric video collections.
It sets a new testbed for evaluating multimodal large language models (MLLMs).
arXiv Detail & Related papers (2024-02-27T01:48:19Z)
- Tracking through Containers and Occluders in the Wild [32.86030395660071]
We introduce TCOW, a new benchmark and model for visual tracking through heavy occlusion and containment.
We create a mixture of synthetic and annotated real datasets to support both supervised learning and structured evaluation of model performance.
We evaluate two recent transformer-based video models and find that while they can be surprisingly capable of tracking targets under certain settings of task variation, there remains a considerable performance gap before we can claim a tracking model to have acquired a true notion of object permanence.
arXiv Detail & Related papers (2023-05-04T17:59:58Z)
- Finding Fallen Objects Via Asynchronous Audio-Visual Integration [89.75296559813437]
This paper introduces a setting in which to study multi-modal object localization in 3D virtual environments.
An embodied robot agent, equipped with a camera and microphone, must determine what object has been dropped -- and where -- by combining audio and visual signals with knowledge of the underlying physics.
The dataset uses the ThreeDWorld platform which can simulate physics-based impact sounds and complex physical interactions between objects in a photorealistic setting.
arXiv Detail & Related papers (2022-07-07T17:59:59Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- Robust Object Detection via Instance-Level Temporal Cycle Confusion [89.1027433760578]
We study the effectiveness of auxiliary self-supervised tasks to improve the out-of-distribution generalization of object detectors.
Inspired by the principle of maximum entropy, we introduce a novel self-supervised task, instance-level temporal cycle confusion (CycConf).
For each object, the task is to find the most different object proposals in the adjacent frame in a video and then cycle back to itself for self-supervision.
arXiv Detail & Related papers (2021-04-16T21:35:08Z)
- Visual Object Recognition in Indoor Environments Using Topologically Persistent Features [2.2344764434954256]
Object recognition in unseen indoor environments remains a challenging problem for visual perception of mobile robots.
We propose the use of topologically persistent features, which rely on the objects' shape information, to address this challenge.
We implement the proposed method on a real-world robot to demonstrate its usefulness.
arXiv Detail & Related papers (2020-10-07T06:04:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.