Efficient Discovery and Effective Evaluation of Visual Perceptual
Similarity: A Benchmark and Beyond
- URL: http://arxiv.org/abs/2308.14753v1
- Date: Mon, 28 Aug 2023 17:59:47 GMT
- Title: Efficient Discovery and Effective Evaluation of Visual Perceptual
Similarity: A Benchmark and Beyond
- Authors: Oren Barkan, Tal Reiss, Jonathan Weill, Ori Katz, Roy Hirsch, Itzik
Malkiel, Noam Koenigstein
- Abstract summary: We introduce the first large-scale fashion visual similarity benchmark dataset, consisting of more than 110K expert-annotated image pairs.
We propose a novel and efficient labeling procedure that can be applied to any dataset.
- Score: 20.035369732786407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual similarities discovery (VSD) is an important task with broad
e-commerce applications. Given an image of a certain object, the goal of VSD is
to retrieve images of different objects with high perceptual visual similarity.
Although VSD is a widely studied problem, the evaluation of proposed methods is
often based on a proxy identification-retrieval task that measures a model's
ability to retrieve different images of the same object. We posit that
evaluating VSD methods based on identification tasks is
limited, and faithful evaluation must rely on expert annotations. In this
paper, we introduce the first large-scale fashion visual similarity benchmark
dataset, consisting of more than 110K expert-annotated image pairs. Besides
this major contribution, we share insights from the challenges we faced while
curating this dataset. Based on these insights, we propose a novel and
efficient labeling procedure that can be applied to any dataset. Our analysis
examines its limitations and inductive biases, and based on these findings, we
propose metrics to mitigate those limitations. Though our primary focus lies on
visual similarity, the methodologies we present have broader applications for
discovering and evaluating perceptual similarity across various domains.
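To ground the VSD setup described in the abstract, here is a minimal sketch of embedding-based similarity retrieval and of scoring a model against expert-annotated pairs rather than an identification proxy. It is an illustration only: the random arrays stand in for a pretrained image encoder's outputs, and the use of cosine similarity and ROC-AUC is our own assumption, not the paper's prescribed protocol.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy stand-ins for encoder outputs; in practice these would come from a
# pretrained image encoder applied to catalog images.
rng = np.random.default_rng(0)
catalog = rng.normal(size=(1000, 512))
catalog /= np.linalg.norm(catalog, axis=1, keepdims=True)  # L2-normalize

def retrieve_similar(query_emb, k=5):
    """Return indices of the k catalog items most similar to the query
    under cosine similarity (catalog embeddings are L2-normalized)."""
    query_emb = query_emb / np.linalg.norm(query_emb)
    return np.argsort(-(catalog @ query_emb))[:k]

print("top-5 neighbors of item 0:", retrieve_similar(catalog[0]))

# Expert-annotated evaluation: pairs of image embeddings with a binary
# "perceptually similar" verdict, mirroring the benchmark's labeled pairs.
pairs_a = rng.normal(size=(200, 512))
pairs_b = rng.normal(size=(200, 512))
labels = rng.integers(0, 2, size=200)  # 1 = similar, 0 = dissimilar

def cosine(a, b):
    return np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

# Faithful evaluation scores the model's similarity against expert
# judgments, not against a same-object (identification) proxy.
print("ROC-AUC vs. expert labels:", roc_auc_score(labels, cosine(pairs_a, pairs_b)))
```

With real data, replacing the random arrays with encoder outputs for the catalog and the annotated pairs is the only change needed.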
Related papers
- Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval [85.73149096516543]
We address the choice of viewpoint during sketch creation in Fine-Grained Sketch-Based Image Retrieval (FG-SBIR).
A pilot study highlights the system's struggle when query sketches differ in viewpoint from target instances.
To reconcile this, we advocate for a view-aware system, seamlessly accommodating both view-agnostic and view-specific tasks.
arXiv Detail & Related papers (2024-07-01T21:20:44Z)
- Advancing Image Retrieval with Few-Shot Learning and Relevance Feedback [5.770351255180495]
Image Retrieval with Relevance Feedback (IRRF) involves iterative human interaction during the retrieval process.
We propose a new scheme based on a hyper-network that is tailored to the task and facilitates swift adjustment to user feedback.
We show that our method attains SoTA results in few-shot one-class classification and reaches comparable results in the binary classification task of few-shot open-set recognition.
arXiv Detail & Related papers (2023-12-18T10:20:28Z)
- Are These the Same Apple? Comparing Images Based on Object Intrinsics [27.43687450076182]
We measure image similarity purely based on the intrinsic object properties that define object identity.
This problem has been studied in the computer vision literature as re-identification.
We propose to extend it to general object categories, exploring an image similarity metric based on object intrinsics.
arXiv Detail & Related papers (2023-11-01T18:00:03Z)
- FORB: A Flat Object Retrieval Benchmark for Universal Image Embedding [7.272083488859574]
We introduce a new dataset for benchmarking visual search methods on flat images with diverse patterns.
Our flat object retrieval benchmark (FORB) supplements the commonly adopted 3D object domain.
It serves as a testbed for assessing the image embedding quality on out-of-distribution domains.
arXiv Detail & Related papers (2023-09-28T08:41:51Z)
- Diffusion-based Visual Counterfactual Explanations -- Towards Systematic Quantitative Evaluation [64.0476282000118]
Latest methods for visual counterfactual explanations (VCE) harness the power of deep generative models to synthesize new examples of high-dimensional images of impressive quality.
It is currently difficult to compare the performance of these VCE methods, as the evaluation procedures vary widely and often boil down to visual inspection of individual examples and small-scale user studies.
We propose a framework for systematic, quantitative evaluation of the VCE methods and a minimal set of metrics to be used.
arXiv Detail & Related papers (2023-08-11T12:22:37Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We apply Contrastive Language-Image Pre-training (CLIP) models to assess both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner (see the sketch after this list).
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the action possibilities that objects in an image afford.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- Enriching ImageNet with Human Similarity Judgments and Psychological Embeddings [7.6146285961466]
We introduce a dataset that embodies the task-general capabilities of human perception and reasoning.
The Human Similarity Judgments extension to ImageNet (ImageNet-HSJ) is composed of human similarity judgments.
The new dataset supports a range of task and performance metrics, including the evaluation of unsupervised learning algorithms.
arXiv Detail & Related papers (2020-11-22T13:41:54Z)
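As a companion to the "Exploring CLIP for Assessing the Look and Feel of Images" entry above, here is a minimal sketch of zero-shot perceptual scoring with CLIP via the Hugging Face transformers API. The antonym prompt pair, the checkpoint, and the image path example.jpg are illustrative assumptions, not necessarily the authors' exact setup.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical image path; any RGB image works here.
image = Image.open("example.jpg").convert("RGB")

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Antonym prompt pair probing the "look" (quality) axis; the exact
# wording is an illustrative choice.
prompts = ["Good photo.", "Bad photo."]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Softmax over the image-text similarity logits for the two prompts
# yields a zero-shot quality score in [0, 1] (mass on "Good photo.").
probs = outputs.logits_per_image.softmax(dim=-1)
print("zero-shot quality score:", probs[0, 0].item())
```

Swapping in a different antonym pair (e.g., "Aesthetic photo." / "Not aesthetic photo.") probes the abstract "feel" axis with the same machinery.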