CLOVER: Context-aware Long-term Object Viewpoint- and Environment-Invariant Representation Learning
- URL: http://arxiv.org/abs/2407.09718v1
- Date: Fri, 12 Jul 2024 23:16:48 GMT
- Title: CLOVER: Context-aware Long-term Object Viewpoint- and Environment-Invariant Representation Learning
- Authors: Dongmyeong Lee, Amanda Adkins, Joydeep Biswas
- Abstract summary: We introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects of 8 classes under diverse lighting conditions and viewpoints.
We also propose CLOVER, a representation learning method for object observations that can distinguish between static object instances.
- Score: 7.376512548629663
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In many applications, robots can benefit from object-level understanding of their environments, including the ability to distinguish object instances and re-identify previously seen instances. Object re-identification is challenging across different viewpoints and in scenes with significant appearance variation arising from weather or lighting changes. Most works on object re-identification focus on specific classes; approaches that address general object re-identification require foreground segmentation and have limited consideration of challenges such as occlusions, outdoor scenes, and illumination changes. To address this problem, we introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects of 8 classes under diverse lighting conditions and viewpoints. Further, we propose CLOVER, a representation learning method for object observations that can distinguish between static object instances. Our results show that CLOVER achieves superior performance in static object re-identification under varying lighting conditions and viewpoint changes, and can generalize to unseen instances and classes.
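The abstract does not specify CLOVER's exact training objective, but a common way to learn embeddings that distinguish static object instances across viewpoint and lighting changes is a metric-learning loss such as the triplet margin loss. A minimal sketch (the embeddings, margin value, and function names below are illustrative assumptions, not CLOVER's actual implementation):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss for instance re-identification: pull two
    observations of the same object instance together in embedding
    space, push observations of different instances at least `margin`
    farther apart. Returns 0 once the constraint is satisfied."""
    d_pos = np.linalg.norm(anchor - positive)  # same-instance distance
    d_neg = np.linalg.norm(anchor - negative)  # cross-instance distance
    return max(0.0, d_pos - d_neg + margin)

# Toy 4-D embeddings of three object observations (hypothetical values).
a = np.array([1.0, 0.0, 0.0, 0.0])  # anchor observation
p = np.array([0.9, 0.1, 0.0, 0.0])  # same instance, different viewpoint
n = np.array([0.0, 1.0, 0.0, 0.0])  # a different object instance
loss = triplet_loss(a, p, n)  # constraint already satisfied here
```

In practice such a loss would be applied to embeddings produced by a trained encoder over many observation triplets; the toy vectors above only show the mechanics.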
Related papers
- Learning Global Object-Centric Representations via Disentangled Slot Attention [38.78205074748021]
This paper introduces a novel object-centric learning method that empowers AI systems with human-like capabilities: identifying objects across scenes and generating diverse scenes containing specific objects by learning a set of global object-centric representations.
Experimental results substantiate the efficacy of the proposed method, demonstrating remarkable proficiency in global object-centric representation learning, object identification, scene generation with specific objects and scene decomposition.
arXiv Detail & Related papers (2024-10-24T14:57:00Z)
- Towards Reflected Object Detection: A Benchmark [5.981658448641905]
This paper introduces a benchmark specifically designed for Reflected Object Detection.
Our Reflected Object Detection dataset (RODD) features a diverse collection of images showcasing reflected objects in various contexts.
RODD encompasses 10 categories and includes 21,059 images of real and reflected objects across different backgrounds.
arXiv Detail & Related papers (2024-07-08T03:16:05Z)
- Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange [50.45953583802282]
We introduce a novel self-supervised learning (SSL) strategy for point cloud scene understanding.
Our approach leverages both object patterns and contextual cues to produce robust features.
Our experiments demonstrate the superiority of our method over existing SSL techniques.
arXiv Detail & Related papers (2024-04-11T06:39:53Z)
- Learning State-Invariant Representations of Objects from Image Collections with State, Pose, and Viewpoint Changes [0.6577148087211809]
We present a novel dataset, ObjectsWithStateChange, that captures state and pose variations in the object images recorded from arbitrary viewpoints.
The goal of such research would be to train models capable of generating object embeddings that remain invariant to state changes.
We propose a curriculum learning strategy that uses the similarity relationships in the learned embedding space after each epoch to guide the training process.
arXiv Detail & Related papers (2024-04-09T17:17:48Z)
- OSCaR: Object State Captioning and State Change Representation [52.13461424520107]
This paper introduces the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark.
OSCaR consists of 14,084 annotated video segments with nearly 1,000 unique objects from various egocentric video collections.
It sets a new testbed for evaluating multimodal large language models (MLLMs).
arXiv Detail & Related papers (2024-02-27T01:48:19Z)
- AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z)
- Universal Instance Perception as Object Discovery and Retrieval [90.96031157557806]
UNI reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm.
It can flexibly perceive different types of objects by simply changing the input prompts.
UNI shows superior performance on 20 challenging benchmarks from 10 instance-level tasks.
arXiv Detail & Related papers (2023-03-12T14:28:24Z)
- Object Instance Identification in Dynamic Environments [19.009931116468294]
We study the problem of identifying object instances in a dynamic environment where people interact with the objects.
We build a benchmark of more than 1,500 instances built on the EPIC-KITCHENS dataset.
Experimental results suggest that (i) robustness against instance-specific appearance change and (ii) integration of low-level (e.g., color, texture) and high-level (e.g., object category) features are required for further improvement.
arXiv Detail & Related papers (2022-06-10T18:38:10Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- ObjectFolder: A Dataset of Objects with Implicit Visual, Auditory, and Tactile Representations [52.226947570070784]
We present ObjectFolder, a dataset of 100 objects that addresses both challenges with two key innovations.
First, ObjectFolder encodes the visual, auditory, and tactile sensory data for all objects, enabling a number of multisensory object recognition tasks.
Second, ObjectFolder employs a uniform, object-centric, and implicit representation for each object's visual textures, acoustic simulations, and tactile readings, making the dataset flexible to use and easy to share.
arXiv Detail & Related papers (2021-09-16T14:00:59Z)
- Object Instance Mining for Weakly Supervised Object Detection [24.021995037282394]
This paper introduces an end-to-end object instance mining (OIM) framework for weakly supervised object detection.
OIM attempts to detect all possible object instances existing in each image by introducing information propagation on the spatial and appearance graphs.
During the iterative learning process, the less discriminative object instances from the same class can be gradually detected and utilized for training.
arXiv Detail & Related papers (2020-02-04T02:11:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.