Compact Deep Aggregation for Set Retrieval
- URL: http://arxiv.org/abs/2003.11794v1
- Date: Thu, 26 Mar 2020 08:43:15 GMT
- Title: Compact Deep Aggregation for Set Retrieval
- Authors: Yujie Zhong, Relja Arandjelović, Andrew Zisserman
- Abstract summary: We focus on retrieving images containing multiple faces from a large scale dataset of images.
Here the set consists of the face descriptors in each image, and given a query for multiple identities, the goal is then to retrieve, in order, images which contain all the identities.
We show that this compact descriptor has minimal loss of discriminability up to two faces per image, and degrades slowly after that.
- Score: 87.52470995031997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The objective of this work is to learn a compact embedding of a set of
descriptors that is suitable for efficient retrieval and ranking, whilst
maintaining discriminability of the individual descriptors. We focus on a
specific example of this general problem -- that of retrieving images
containing multiple faces from a large scale dataset of images. Here the set
consists of the face descriptors in each image, and given a query for multiple
identities, the goal is then to retrieve, in order, images which contain all
the identities, all but one, etc.
To this end, we make the following contributions: first, we propose a CNN
architecture -- SetNet -- to achieve the objective: it learns face
descriptors and their aggregation over a set to produce a compact fixed length
descriptor designed for set retrieval, and the score of an image is a count of
the number of identities that match the query; second, we show that this
compact descriptor has minimal loss of discriminability up to two faces per
image, and degrades slowly after that -- far exceeding a number of baselines;
third, we explore the speed vs. retrieval quality trade-off for set retrieval
using this compact descriptor; and, finally, we collect and annotate a large
dataset of images containing varying numbers of celebrities, which we use for
evaluation and publicly release.
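As a concrete illustration of the counting-based scoring described above, the following is a minimal NumPy sketch, not the authors' SetNet: it stands in for the learned aggregation with sum-pooling plus L2 normalisation, and scores a database image by counting how many query identity descriptors exceed a cosine-similarity threshold against the image's compact set descriptor. The descriptor dimension, pooling choice and threshold are illustrative assumptions.

```python
import numpy as np

def aggregate_set(face_descriptors):
    """Collapse a variable-size set of per-face descriptors into one compact,
    fixed-length vector (sum-pool + L2-normalise; SetNet learns this step)."""
    pooled = face_descriptors.sum(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-12)

def count_matches(query_identities, set_descriptor, threshold=0.3):
    """Score = number of query identities whose descriptor is similar enough
    to the aggregated set descriptor (cosine similarity above a threshold)."""
    sims = query_identities @ set_descriptor      # one similarity per identity
    return int((sims > threshold).sum())

# Toy example: three database images with 1-3 unit-norm face descriptors each.
rng = np.random.default_rng(0)
db_faces = [rng.standard_normal((n, 128)) for n in (1, 2, 3)]
db_faces = [f / np.linalg.norm(f, axis=1, keepdims=True) for f in db_faces]
set_descriptors = np.stack([aggregate_set(f) for f in db_faces])

query = rng.standard_normal((2, 128))             # two query identities
query /= np.linalg.norm(query, axis=1, keepdims=True)

scores = [count_matches(query, d) for d in set_descriptors]
ranking = np.argsort(scores)[::-1]                # images matching more identities first
```

Ranking by this count realises the set-retrieval objective: images containing all query identities come first, then those containing all but one, and so on.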
Related papers
- Advancing Image Retrieval with Few-Shot Learning and Relevance Feedback [5.770351255180495]
Image Retrieval with Relevance Feedback (IRRF) involves iterative human interaction during the retrieval process.
We propose a new scheme based on a hyper-network that is tailored to the task and facilitates swift adjustment to user feedback.
We show that our method attains SoTA results in few-shot one-class classification and reaches comparable results in the binary classification task of few-shot open-set recognition.
arXiv Detail & Related papers (2023-12-18T10:20:28Z) - RAFIC: Retrieval-Augmented Few-shot Image Classification [0.0]
Few-shot image classification is the task of classifying unseen images into one of N mutually exclusive classes.
We have developed a method for augmenting the set of K labeled support images with an additional set of A retrieved images.
We demonstrate that RAFIC markedly improves performance of few-shot image classification across two challenging datasets.
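As a rough, hedged sketch of this retrieval-augmentation idea (not the RAFIC implementation), the snippet below pads each class's K support embeddings with the A pool embeddings closest to the class centroid before nearest-centroid classification; the embedding pool, dimensionality and value of A are illustrative assumptions.

```python
import numpy as np

def augment_support(support, pool, A):
    """Append the A unlabeled pool embeddings most similar (cosine)
    to the centroid of the K support embeddings."""
    centroid = support.mean(axis=0)
    centroid /= np.linalg.norm(centroid) + 1e-12
    sims = pool @ centroid
    return np.concatenate([support, pool[np.argsort(sims)[-A:]]], axis=0)

def nearest_centroid(query, class_supports):
    """Predict the class whose (augmented) support centroid is closest."""
    centroids = np.stack([s.mean(axis=0) for s in class_supports])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    return int(np.argmax(centroids @ query))

rng = np.random.default_rng(1)
pool = rng.standard_normal((1000, 64))
pool /= np.linalg.norm(pool, axis=1, keepdims=True)
supports = [rng.standard_normal((5, 64)) for _ in range(3)]   # N=3 classes, K=5
augmented = [augment_support(s, pool, A=20) for s in supports]
query = rng.standard_normal(64)
query /= np.linalg.norm(query)
prediction = nearest_centroid(query, augmented)
```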
arXiv Detail & Related papers (2023-12-11T22:28:51Z) - Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features [12.14013374452918]
We present a simple yet effective approach to object-centric open-vocabulary image retrieval.
Our approach aggregates dense embeddings extracted from CLIP into a compact representation.
We show the effectiveness of our scheme on this task by achieving significantly better results than global feature approaches on three datasets.
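One way to picture such aggregation, purely as an illustrative sketch rather than the paper's method, is to compress dense patch embeddings (e.g. from a CLIP-style encoder) into a few centroids and score a query embedding against the closest one; the number of centroids and the random placeholder features below are assumptions.

```python
import numpy as np

def compact_representation(dense_embeddings, k=8, iters=10):
    """Compress many unit-norm patch embeddings into k centroids (tiny k-means)."""
    rng = np.random.default_rng(0)
    centroids = dense_embeddings[rng.choice(len(dense_embeddings), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(dense_embeddings @ centroids.T, axis=1)
        for j in range(k):
            members = dense_embeddings[assign == j]
            if len(members):
                c = members.mean(axis=0)
                centroids[j] = c / (np.linalg.norm(c) + 1e-12)
    return centroids                              # (k, d) compact image representation

def retrieval_score(query_embedding, centroids):
    """Object-centric score: similarity of the query to its best-matching centroid."""
    return float(np.max(centroids @ query_embedding))

patches = np.random.default_rng(2).standard_normal((196, 512))   # stand-in dense features
patches /= np.linalg.norm(patches, axis=1, keepdims=True)
image_rep = compact_representation(patches)                      # 8 x 512 instead of 196 x 512
query = np.random.default_rng(3).standard_normal(512)
query /= np.linalg.norm(query)
score = retrieval_score(query, image_rep)
```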
arXiv Detail & Related papers (2023-09-26T15:13:09Z) - Collaborative Group: Composed Image Retrieval via Consensus Learning from Noisy Annotations [67.92679668612858]
We propose the Consensus Network (Css-Net), inspired by the psychological concept that groups outperform individuals.
Css-Net comprises two core components: (1) a consensus module with four diverse compositors, each generating distinct image-text embeddings; and (2) a Kullback-Leibler divergence loss that encourages learning of inter-compositor interactions.
On benchmark datasets, particularly FashionIQ, Css-Net demonstrates marked improvements. Notably, it achieves significant recall gains, with a 2.77% increase in R@10 and a 6.67% boost in R@50.
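A minimal sketch of how a KL-divergence term can be used among multiple compositors (the actual Css-Net loss and compositors differ; the scores and counts below are assumptions): each compositor's candidate scores are turned into a distribution, and the average pairwise KL penalises disagreement.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kl(p, q, eps=1e-12):
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def consensus_kl_loss(compositor_scores):
    """Average pairwise KL between the compositors' candidate distributions;
    minimising it encourages the compositors to learn from one another."""
    dists = [softmax(s) for s in compositor_scores]
    total, pairs = 0.0, 0
    for i in range(len(dists)):
        for j in range(len(dists)):
            if i != j:
                total += kl(dists[i], dists[j])
                pairs += 1
    return total / pairs

# Four compositors scoring the same 10 candidate images.
rng = np.random.default_rng(4)
scores = [rng.standard_normal(10) for _ in range(4)]
loss = consensus_kl_loss(scores)
```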
arXiv Detail & Related papers (2023-06-03T11:50:44Z) - Self-supervised Multi-view Disentanglement for Expansion of Visual Collections [6.944742823561]
We consider the setting where a query for similar images is derived from a collection of images.
For visual search, the similarity measurements may be made along multiple axes, or views, such as style and color.
Our objective is to design a retrieval algorithm that effectively combines similarities computed over representations from multiple views.
arXiv Detail & Related papers (2023-02-04T22:09:17Z) - Compositional Sketch Search [91.84489055347585]
We present an algorithm for searching image collections using free-hand sketches.
We exploit drawings as a concise and intuitive representation for specifying entire scene compositions.
arXiv Detail & Related papers (2021-06-15T09:38:09Z) - SCNet: Enhancing Few-Shot Semantic Segmentation by Self-Contrastive Background Prototypes [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a query image with only a few annotated examples.
Most advanced solutions exploit a metric learning framework that performs segmentation by matching each pixel to a learned foreground prototype.
This framework suffers from biased classification because sample pairs are constructed with the foreground prototype only.
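For reference, the baseline prototype-matching framework criticised here can be sketched as follows (this illustrates the generic baseline, not SCNet's self-contrastive background prototypes; feature shapes and the threshold are assumptions):

```python
import numpy as np

def foreground_prototype(support_feats, support_mask):
    """Masked average pooling: mean of the support features inside the object mask."""
    fg = support_feats[support_mask.astype(bool)]            # (n_fg, C)
    proto = fg.mean(axis=0)
    return proto / (np.linalg.norm(proto) + 1e-12)

def segment(query_feats, prototype, threshold=0.5):
    """Label a query pixel foreground if its cosine similarity to the prototype
    exceeds the threshold (a crude stand-in for a background prototype)."""
    q = query_feats / (np.linalg.norm(query_feats, axis=-1, keepdims=True) + 1e-12)
    return (q @ prototype > threshold).astype(np.uint8)      # (H, W) binary mask

rng = np.random.default_rng(5)
support_feats = rng.standard_normal((32, 32, 64))            # H x W x C support features
support_mask = rng.integers(0, 2, size=(32, 32))             # binary support annotation
query_feats = rng.standard_normal((32, 32, 64))
prediction = segment(query_feats, foreground_prototype(support_feats, support_mask))
```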
arXiv Detail & Related papers (2021-04-19T11:21:47Z) - Dense Relational Image Captioning via Multi-task Triple-Stream Networks [95.0476489266988]
We introduce dense relational captioning, a novel task which aims to generate captions with respect to the relational information between objects in a visual scene.
The proposed multi-task triple-stream framework is advantageous in both the diversity and the amount of information captured, leading to comprehensive image understanding.
arXiv Detail & Related papers (2020-10-08T09:17:55Z) - Tasks Integrated Networks: Joint Detection and Retrieval for Image Search [99.49021025124405]
In many real-world searching scenarios (e.g., video surveillance), the objects are seldom accurately detected or annotated.
We first introduce an end-to-end Integrated Net (I-Net), which has three merits.
We further propose an improved I-Net, called DC-I-Net, which makes two new contributions.
arXiv Detail & Related papers (2020-09-03T03:57:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.