Are These the Same Apple? Comparing Images Based on Object Intrinsics
- URL: http://arxiv.org/abs/2311.00750v1
- Date: Wed, 1 Nov 2023 18:00:03 GMT
- Title: Are These the Same Apple? Comparing Images Based on Object Intrinsics
- Authors: Klemen Kotar, Stephen Tian, Hong-Xing Yu, Daniel L.K. Yamins, Jiajun
Wu
- Abstract summary: We measure image similarity purely based on intrinsic object properties that define object identity.
This problem has been studied in the computer vision literature as re-identification.
We propose to extend it to general object categories, exploring an image similarity metric based on object intrinsics.
- Score: 27.43687450076182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The human visual system can effortlessly recognize an object under different
extrinsic factors such as lighting, object poses, and background, yet current
computer vision systems often struggle with these variations. An important step
to understanding and improving artificial vision systems is to measure image
similarity purely based on intrinsic object properties that define object
identity. This problem has been studied in the computer vision literature as
re-identification, though mostly restricted to specific object categories such
as people and cars. We propose to extend it to general object categories,
exploring an image similarity metric based on object intrinsics. To benchmark
such measurements, we collect the Common paired objects Under differenT
Extrinsics (CUTE) dataset of 18,000 images of 180 objects under different
extrinsic factors such as lighting, poses, and imaging conditions. While
existing methods such as LPIPS and CLIP scores do not measure object intrinsics
well, we find that combining deep features learned from contrastive
self-supervised learning with foreground filtering is a simple yet effective
approach to approximating the similarity. We conduct an extensive survey of
pre-trained features and foreground extraction methods to arrive at a strong
baseline that best measures intrinsic object-centric image similarity among
current methods. Finally, we demonstrate that our approach can aid in
downstream applications such as acting as an analog for human subjects and
improving generalizable re-identification. Please see our project website at
https://s-tian.github.io/projects/cute/ for visualizations of the data and
demos of our metric.
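The abstract's recipe (contrastive self-supervised features combined with foreground filtering, compared by feature similarity) is concrete enough to sketch. The snippet below is a minimal illustration, not the authors' released code: it assumes a DINO ViT-S/16 backbone from torch.hub as the self-supervised feature extractor and a pre-computed binary foreground mask per image; the paper surveys several options for both of these choices.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms

# Contrastive self-supervised backbone; DINO ViT-S/16 via torch.hub is an
# assumed choice here, and any similar SSL feature extractor could be swapped in.
model = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
model.eval()

resize_crop = transforms.Compose([transforms.Resize(256), transforms.CenterCrop(224)])
to_tensor = transforms.ToTensor()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

@torch.no_grad()
def embed_foreground(image_path: str, mask_path: str) -> torch.Tensor:
    """Embed an image after zeroing out background pixels.

    `mask_path` points to a hypothetical pre-computed binary foreground mask;
    the paper surveys several foreground-extraction methods for this step.
    """
    image = to_tensor(resize_crop(Image.open(image_path).convert("RGB")))
    mask = to_tensor(resize_crop(Image.open(mask_path).convert("L")))
    image = image * (mask > 0.5).float()          # foreground filtering
    feat = model(normalize(image).unsqueeze(0))   # (1, 384) global feature
    return F.normalize(feat, dim=-1)

@torch.no_grad()
def intrinsic_similarity(img_a, mask_a, img_b, mask_b) -> float:
    # Cosine similarity between foreground-filtered deep features.
    return (embed_foreground(img_a, mask_a) * embed_foreground(img_b, mask_b)).sum().item()
```

On a CUTE-style image pair, a higher score should indicate the same underlying object despite changes in lighting, pose, or background; swapping the backbone or the mask source reproduces the kind of comparison the paper's survey performs.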
Related papers
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z) - Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z) - Fusing Local Similarities for Retrieval-based 3D Orientation Estimation
of Unseen Objects [70.49392581592089]
We tackle the task of estimating the 3D orientation of previously-unseen objects from monocular images.
We follow a retrieval-based strategy and prevent the network from learning object-specific features.
Our experiments on the LineMOD, LineMOD-Occluded, and T-LESS datasets show that our method yields a significantly better generalization to unseen objects than previous works.
arXiv Detail & Related papers (2022-03-16T08:53:00Z) - Hybrid Optimized Deep Convolution Neural Network based Learning Model
for Object Detection [0.0]
Object identification is one of the most fundamental and difficult issues in computer vision.
In recent years, deep learning-based object detection techniques have grabbed the public's interest.
In this study, a unique deep learning classification technique is used to create an autonomous object detection system.
The suggested framework achieves a detection accuracy of 0.9864, higher than that of existing techniques.
arXiv Detail & Related papers (2022-03-02T04:39:37Z) - Sim2Real Object-Centric Keypoint Detection and Description [40.58367357980036]
Keypoint detection and description play a central role in computer vision.
We propose the object-centric formulation, which requires further identifying which object each interest point belongs to.
We develop a sim2real contrastive learning mechanism that can generalize the model trained in simulation to real-world applications.
arXiv Detail & Related papers (2022-02-01T15:00:20Z) - Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures; a minimal sketch of the core idea appears after this list.
arXiv Detail & Related papers (2021-12-21T17:10:21Z) - DONet: Learning Category-Level 6D Object Pose and Size Estimation from
Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z) - Unknown Object Segmentation from Stereo Images [18.344801596121997]
We propose a novel object instance segmentation approach that does not require any semantic or geometric information of the objects beforehand.
Focusing on the versatility of stereo sensors, we employ a transformer-based architecture that maps directly from the pair of input images to the object instances.
In experiments in several different application domains, we show that our Instance Stereo Transformer (INSTR) algorithm outperforms current state-of-the-art methods that are based on depth maps.
arXiv Detail & Related papers (2021-03-11T17:03:44Z) - A Simple and Effective Use of Object-Centric Images for Long-Tailed
Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach can improve the object detection (and instance segmentation) accuracy of rare objects by 50% (and 33%) relatively.
arXiv Detail & Related papers (2021-02-17T17:27:21Z) - Unsupervised Part Discovery via Feature Alignment [15.67978793872039]
We exploit the property that neural network features are largely invariant to nuisance variables.
We find a set of similar images that show instances of the same object category in the same pose, through an affine alignment of their corresponding feature maps.
During inference, part detection is simple and fast, without any extra modules or overheads other than a feed-forward neural network.
arXiv Detail & Related papers (2020-12-01T07:25:00Z)
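For the knowledge-graph-embedding entry above, the core idea (replacing a learned one-hot classification head with fixed, semantically structured class embeddings scored by cosine similarity) can be sketched generically. This is an assumed illustration, not that paper's architecture; the embedding matrix below is a random placeholder standing in for word-vector or knowledge-graph embeddings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticClassHead(nn.Module):
    """Score region features against fixed semantic class embeddings."""

    def __init__(self, feature_dim: int, class_embeddings: torch.Tensor,
                 temperature: float = 0.07):
        super().__init__()
        # Frozen class embeddings, one row per class, L2-normalized.
        self.register_buffer("class_emb", F.normalize(class_embeddings, dim=-1))
        # Small learned projection from detector features into the embedding space.
        self.proj = nn.Linear(feature_dim, class_embeddings.shape[1])
        self.temperature = temperature

    def forward(self, region_features: torch.Tensor) -> torch.Tensor:
        # region_features: (num_regions, feature_dim) pooled per-region features.
        z = F.normalize(self.proj(region_features), dim=-1)
        # Cosine-similarity logits against every class embedding.
        return z @ self.class_emb.T / self.temperature

# Usage with placeholder embeddings: 80 classes, 300-dimensional vectors.
class_embeddings = torch.randn(80, 300)   # assumption: stand-in for real KG/word vectors
head = SemanticClassHead(feature_dim=256, class_embeddings=class_embeddings)
logits = head(torch.randn(5, 256))        # (5 regions, 80 classes)
loss = F.cross_entropy(logits, torch.randint(0, 80, (5,)))
```

Because the class embeddings are frozen, semantically related classes share geometry in the embedding space, which is the structural prior that entry contrasts with one-hot training.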
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.