Learning State-Invariant Representations of Objects from Image Collections with State, Pose, and Viewpoint Changes
- URL: http://arxiv.org/abs/2404.06470v1
- Date: Tue, 9 Apr 2024 17:17:48 GMT
- Title: Learning State-Invariant Representations of Objects from Image Collections with State, Pose, and Viewpoint Changes
- Authors: Rohan Sarkar, Avinash Kak
- Abstract summary: We present a novel dataset, ObjectsWithStateChange, that captures state and pose variations in object images recorded from arbitrary viewpoints.
The goal of such research would be to train models capable of generating object embeddings that remain invariant to state changes.
We propose a curriculum learning strategy that uses the similarity relationships in the learned embedding space after each epoch to guide the training process.
- Score: 0.6577148087211809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We add one more invariance, state invariance, to the invariances more commonly used for learning object representations for recognition and retrieval. By state invariance, we mean robustness with respect to changes in the structural form of the object, such as when an umbrella is folded or when an item of clothing is tossed on the floor. Since humans generally have no difficulty in recognizing objects despite such state changes, we are naturally faced with the question of whether it is possible to devise a neural architecture with similar abilities. To that end, we present a novel dataset, ObjectsWithStateChange, that captures state and pose variations in object images recorded from arbitrary viewpoints. We believe that this dataset will facilitate research in fine-grained recognition and retrieval of objects that are capable of state changes. The goal of such research would be to train models that generate object embeddings invariant to state changes while also remaining invariant to transformations induced by changes in viewpoint, pose, illumination, and so on. To demonstrate the usefulness of the ObjectsWithStateChange dataset, we also propose a curriculum learning strategy that uses the similarity relationships in the learned embedding space after each epoch to guide the training process. The model learns discriminative features by comparing visually similar objects within and across categories, encouraging it to differentiate between objects that may be hard to distinguish because of changes in their state. We believe that this strategy enhances the model's ability to capture discriminative features for fine-grained tasks that may involve objects with state changes, leading to performance improvements on object-level tasks not only on our new dataset, but also on two other challenging multi-view datasets, ModelNet40 and ObjectPI.
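To make the epoch-wise curriculum concrete, here is a minimal PyTorch-style sketch of one way such similarity-guided sampling could work. It assumes a `model` that maps images to embeddings; the helper names (`compute_embeddings`, `hardness_weights`) and the nearest-impostor weighting scheme are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, WeightedRandomSampler

@torch.no_grad()
def compute_embeddings(model, loader, device="cuda"):
    """Embed the whole training set once after an epoch (illustrative helper)."""
    model.eval()
    feats, labels = [], []
    for images, targets in loader:
        z = F.normalize(model(images.to(device)), dim=1)  # unit-norm embeddings
        feats.append(z.cpu())
        labels.append(targets)
    return torch.cat(feats), torch.cat(labels)

def hardness_weights(feats, labels):
    """Weight each image by its similarity to the closest *other-object* sample,
    so objects that are easy to confuse are sampled more often next epoch."""
    sim = feats @ feats.t()                     # cosine similarities
    same = labels[:, None].eq(labels[None, :])  # mask pairs from the same object
    sim = sim.masked_fill(same, -1.0)
    hardest, _ = sim.max(dim=1)                 # nearest impostor per sample
    w = (hardest - hardest.min()).clamp(min=1e-6)
    return w / w.sum()

# After each epoch: re-embed, re-weight, and rebuild the loader for the next epoch.
# feats, labels = compute_embeddings(model, eval_loader)
# sampler = WeightedRandomSampler(hardness_weights(feats, labels).tolist(),
#                                 num_samples=len(labels))
# train_loader = DataLoader(train_set, batch_size=64, sampler=sampler)
```

Because the weights are recomputed from the embedding space after every epoch, the pairs the model is pushed to separate change as training progresses, matching the paper's description of letting the learned similarity relationships guide the curriculum.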
Related papers
- ResVG: Enhancing Relation and Semantic Understanding in Multiple Instances for Visual Grounding [42.10086029931937]
Visual grounding aims to localize the object referred to in an image based on a natural language query.
Existing methods demonstrate a significant performance drop when there are multiple distractions in an image.
We propose a novel approach, the Relation and Semantic-sensitive Visual Grounding (ResVG) model, to address this issue.
arXiv Detail & Related papers (2024-08-29T07:32:01Z) - CLOVER: Context-aware Long-term Object Viewpoint- and Environment-Invariant Representation Learning [7.376512548629663]
We introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects of 8 classes under diverse lighting conditions and viewpoints.
We also propose CLOVER, a representation learning method for object observations that can distinguish between static object instances.
arXiv Detail & Related papers (2024-07-12T23:16:48Z) - Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange [50.45953583802282]
We introduce a novel self-supervised learning (SSL) strategy for point cloud scene understanding.
Our approach leverages both object patterns and contextual cues to produce robust features.
Our experiments demonstrate the superiority of our method over existing SSL techniques.
arXiv Detail & Related papers (2024-04-11T06:39:53Z) - ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes [64.57705752579207]
We evaluate the resilience of vision-based models against diverse object-to-background context variations.
We harness the generative capabilities of text-to-image, image-to-text, and image-to-segment models to automatically generate object-to-background changes.
arXiv Detail & Related papers (2024-03-07T17:48:48Z) - OSCaR: Object State Captioning and State Change Representation [52.13461424520107]
This paper introduces the Object State Captioning and State Change Representation (OSCaR) dataset and benchmark.
OSCaR consists of 14,084 annotated video segments with nearly 1,000 unique objects from various egocentric video collections.
It sets a new testbed for evaluating multimodal large language models (MLLMs).
arXiv Detail & Related papers (2024-02-27T01:48:19Z) - Variational Inference for Scalable 3D Object-centric Learning [19.445804699433353]
We tackle the task of scalable unsupervised object-centric representation learning on 3D scenes.
Existing approaches to object-centric representation learning show limitations in generalizing to larger scenes.
We propose to learn view-invariant 3D object representations in localized object coordinate systems.
arXiv Detail & Related papers (2023-09-25T10:23:40Z) - MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render&compare strategy that can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation that leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z) - Matching Multiple Perspectives for Efficient Representation Learning [0.0]
We present an approach that combines self-supervised learning with a multi-perspective matching technique.
We show that the availability of multiple views of the same object combined with a variety of self-supervised pretraining algorithms can lead to improved object classification performance.
arXiv Detail & Related papers (2022-08-16T10:33:13Z) - Siamese Contrastive Embedding Network for Compositional Zero-Shot Learning [76.13542095170911]
Compositional Zero-Shot Learning (CZSL) aims to recognize unseen compositions formed from states and objects seen during training.
We propose a novel Siamese Contrastive Embedding Network (SCEN) for unseen composition recognition.
Our method significantly outperforms the state-of-the-art approaches on three challenging benchmark datasets.
arXiv Detail & Related papers (2022-06-29T09:02:35Z) - SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [35.4163266882568]
We introduce Self-Supervised Learning Over Sets (SOS) to pre-train a generic Objects In Contact (OIC) representation model.
Our OIC significantly boosts the performance of multiple state-of-the-art video classification models.
arXiv Detail & Related papers (2022-04-10T23:27:19Z) - Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z) - Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z) - Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z) - Unsupervised Part Discovery via Feature Alignment [15.67978793872039]
We exploit the property that neural network features are largely invariant to nuisance variables.
We find a set of similar images that show instances of the same object category in the same pose, through an affine alignment of their corresponding feature maps.
During inference, part detection is simple and fast, without any extra modules or overheads other than a feed-forward neural network.
arXiv Detail & Related papers (2020-12-01T07:25:00Z) - Learning visual policies for building 3D shape categories [130.7718618259183]
Previous work in this domain often assembles particular instances of objects from known sets of primitives.
We learn a visual policy to assemble other instances of the same category.
Our visual assembly policies are trained with no real images and reach up to 95% success rate when evaluated on a real robot.
arXiv Detail & Related papers (2020-04-15T17:29:10Z) - Look-into-Object: Self-supervised Structure Modeling for Object Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z)