AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
- URL: http://arxiv.org/abs/2408.03282v1
- Date: Tue, 6 Aug 2024 16:29:51 GMT
- Title: AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
- Authors: Pavel Suma, Giorgos Kordopatis-Zilos, Ahmet Iscen, Giorgos Tolias
- Abstract summary: This work investigates the problem of instance-level image retrieval re-ranking with the constraint of memory efficiency.
The proposed model uses a transformer-based architecture designed to estimate image-to-image similarity.
Results on standard benchmarks demonstrate the superiority of our approach over both hand-crafted and learned models.
- Score: 14.009257997448634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work investigates the problem of instance-level image retrieval re-ranking with the constraint of memory efficiency, ultimately aiming to limit memory usage to 1KB per image. Departing from the prevalent focus on performance enhancements, this work prioritizes the crucial trade-off between performance and memory requirements. The proposed model uses a transformer-based architecture designed to estimate image-to-image similarity by capturing interactions within and across images based on their local descriptors. A distinctive property of the model is the capability for asymmetric similarity estimation. Database images are represented with a smaller number of descriptors compared to query images, enabling performance improvements without increasing memory consumption. To ensure adaptability across different applications, a universal model is introduced that adjusts to a varying number of local descriptors during the testing phase. Results on standard benchmarks demonstrate the superiority of our approach over both hand-crafted and learned models. In particular, compared with current state-of-the-art methods that overlook their memory footprint, our approach not only attains superior performance but does so with a significantly reduced memory footprint. The code and pretrained models are publicly available at: https://github.com/pavelsuma/ames
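The asymmetric setting described above can be sketched with plain scaled dot-product cross-attention: query descriptors attend to a smaller set of database descriptors, and the attended matches are scored by cosine similarity. This is a minimal illustration of the general idea, not the AMES architecture itself; all shapes, descriptor counts, and the final pooling are illustrative assumptions.

```python
import numpy as np

def cross_attention_similarity(query_desc, db_desc):
    """Score a query/database image pair from their local descriptors.

    query_desc: (M, d) local descriptors of the query image.
    db_desc:    (K, d) local descriptors of the database image; K < M is the
                asymmetric setting, where database images are stored with
                fewer descriptors to save memory.
    """
    d = query_desc.shape[1]
    # Scaled dot-product attention from query descriptors to database descriptors.
    logits = query_desc @ db_desc.T / np.sqrt(d)           # (M, K)
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)          # softmax over the K database descriptors
    attended = weights @ db_desc                           # (M, d) soft best-match per query descriptor
    # Cosine similarity between each query descriptor and its attended match,
    # averaged into a single image-to-image score.
    q = query_desc / np.linalg.norm(query_desc, axis=1, keepdims=True)
    a = attended / np.linalg.norm(attended, axis=1, keepdims=True)
    return float((q * a).sum(axis=1).mean())

rng = np.random.default_rng(0)
query_descriptors = rng.normal(size=(600, 64))  # query kept at a larger descriptor count
db_descriptors = rng.normal(size=(100, 64))     # database image stored with fewer descriptors
score = cross_attention_similarity(query_descriptors, db_descriptors)
self_score = cross_attention_similarity(db_descriptors, db_descriptors)
```

Comparing an image against itself scores higher than comparing two random images, which is the minimal sanity check one would expect from any similarity estimator of this shape.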
Related papers
- Addressing Issues with Working Memory in Video Object Segmentation [37.755852787082254]
Video object segmentation (VOS) models compare incoming unannotated images to a history of image-mask relations.
Current state-of-the-art models perform very well on clean video data.
Their reliance on a working memory of previous frames leaves room for error.
A simple algorithmic change is proposed that can be applied to any existing working memory-based VOS model.
arXiv Detail & Related papers (2024-10-29T18:34:41Z)
- Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z)
- GeneCIS: A Benchmark for General Conditional Image Similarity [21.96493413291777]
We argue that there are many notions of 'similarity' and that models, like humans, should be able to adapt to these dynamically.
We propose the GeneCIS benchmark, which measures models' ability to adapt to a range of similarity conditions.
arXiv Detail & Related papers (2023-06-13T17:59:58Z)
- Improving Image Recognition by Retrieving from Web-Scale Image-Text Data [68.63453336523318]
We introduce an attention-based memory module, which learns the importance of each retrieved example from the memory.
Compared to existing approaches, our method removes the influence of the irrelevant retrieved examples, and retains those that are beneficial to the input query.
We show that it achieves state-of-the-art accuracies in ImageNet-LT, Places-LT and Webvision datasets.
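An attention-based memory module of the kind described above can be sketched as a softmax weighting of retrieved examples by their relevance to the input query, so that irrelevant examples contribute near-zero weight. The shapes and toy values below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def attend_to_memory(query, mem_keys, mem_values):
    """Aggregate retrieved examples, weighting each by relevance to the query.

    query:      (d,) embedding of the input image.
    mem_keys:   (N, d) embeddings of the N retrieved examples.
    mem_values: (N, c) per-example information to aggregate (e.g. class scores).
    """
    logits = mem_keys @ query / np.sqrt(query.shape[0])
    w = np.exp(logits - logits.max())
    w /= w.sum()          # softmax: irrelevant examples get near-zero weight
    return w @ mem_values  # (c,) attention-weighted aggregate

# Toy memory: three retrieved examples with 4-d keys and 2-d values.
mem_keys = np.eye(3, 4) * 10.0
mem_values = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
query = np.array([10.0, 0.0, 0.0, 0.0])  # closely matches the first example
agg = attend_to_memory(query, mem_keys, mem_values)
```

Because the query aligns with the first key only, the aggregate is dominated by the first example's value, showing how the weighting suppresses the influence of the other retrieved examples.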
arXiv Detail & Related papers (2023-04-11T12:12:05Z)
- Asymmetric Image Retrieval with Cross Model Compatible Ensembles [4.86935886318034]
Asymmetrical retrieval is a well-suited solution for resource-constrained applications such as face recognition and image retrieval.
We present an approach that does not rely on knowledge distillation, rather it utilizes embedding transformation models.
We improve the overall accuracy beyond that of any single model while maintaining a low computational budget for querying.
arXiv Detail & Related papers (2023-03-30T16:53:07Z)
- A Model or 603 Exemplars: Towards Memory-Efficient Class-Incremental Learning [56.450090618578]
Class-Incremental Learning (CIL) aims to train a model with limited memory size to meet this requirement.
We show that when counting the model size into the total budget and comparing methods with aligned memory size, saving models does not consistently work.
We propose a simple yet effective baseline, denoted as MEMO for Memory-efficient Expandable MOdel.
arXiv Detail & Related papers (2022-05-26T08:24:01Z)
- Recall@k Surrogate Loss with Large Batches and Similarity Mixup [62.67458021725227]
Direct optimization, by gradient descent, of an evaluation metric is not possible when it is non-differentiable.
In this work, a differentiable surrogate loss for the recall is proposed.
The proposed method achieves state-of-the-art results in several image retrieval benchmarks.
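One common way to build a differentiable recall surrogate is to relax the hard rank indicators with sigmoids of score differences; the sketch below follows that general recipe with an illustrative temperature and offset, and is an assumption about the form of such a loss, not the paper's exact formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def smooth_recall_at_k(pos_score, neg_scores, k, tau=0.1):
    """Differentiable surrogate for whether the positive lands in the top k.

    The hard rank  1 + sum_j [s_j > s_pos]  is relaxed by replacing each
    indicator with a sigmoid of the temperature-scaled score difference,
    and the hard cutoff [rank <= k] with another sigmoid.
    """
    soft_rank = 1.0 + sigmoid((neg_scores - pos_score) / tau).sum()
    # The 0.5 offset centres the soft cutoff between ranks k and k+1.
    return sigmoid((k + 0.5 - soft_rank) / tau)

# Positive scored above all negatives -> surrogate near 1.
high = smooth_recall_at_k(0.9, np.array([0.1, 0.2, 0.3]), k=1)
# Two negatives outscore the positive -> surrogate near 0 for k=1.
low = smooth_recall_at_k(0.4, np.array([0.9, 0.8, 0.1]), k=1)
```

Unlike the hard recall metric, every term here has a nonzero gradient with respect to the scores, so the surrogate can be optimized directly by gradient descent.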
arXiv Detail & Related papers (2021-08-25T11:09:11Z)
- Multiscale Deep Equilibrium Models [162.15362280927476]
We propose a new class of implicit networks, the multiscale deep equilibrium model (MDEQ)
An MDEQ directly solves for and backpropagates through the equilibrium points of multiple feature resolutions simultaneously.
We illustrate the effectiveness of this approach on two large-scale vision tasks: ImageNet classification and semantic segmentation on high-resolution images from the Cityscapes dataset.
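The equilibrium idea behind implicit networks can be sketched at its simplest: define the output as the fixed point of one transformation rather than the result of stacked layers. The single-resolution toy below (a `tanh` layer with a small-norm weight matrix, so the map is a contraction) is an assumption for illustration; MDEQ solves such equilibria jointly over multiple feature resolutions.

```python
import numpy as np

def equilibrium_forward(W, x, tol=1e-8, max_iter=500):
    """Solve z* = tanh(W @ z* + x) by fixed-point iteration.

    Implicit models define their output as this equilibrium rather than as
    the activations after a fixed number of layers; gradients can then be
    obtained through the equilibrium point via the implicit function theorem.
    """
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
W = 0.05 * rng.normal(size=(16, 16))  # small spectral norm -> contraction, iteration converges
x = rng.normal(size=16)
z_star = equilibrium_forward(W, x)
```

The returned point satisfies the fixed-point equation to high precision, which is what makes "the equilibrium" a well-defined network output independent of how many iterations were run.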
arXiv Detail & Related papers (2020-06-15T18:07:44Z)
- Memory-Efficient Incremental Learning Through Feature Adaptation [71.1449769528535]
We introduce an approach for incremental learning that preserves feature descriptors of training images from previously learned classes.
Keeping the much lower-dimensional feature embeddings of images reduces the memory footprint significantly.
Experimental results show that our method achieves state-of-the-art classification accuracy in incremental learning benchmarks.
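The memory argument behind storing low-dimensional embeddings instead of larger representations is simple arithmetic; the dimensionalities below are illustrative assumptions, not figures from either paper, but the 256-d case shows how a 1 KB-per-image budget like the one targeted above comes about.

```python
def footprint_bytes(n_items, dim, bytes_per_value=4):
    """Memory for n_items vectors of `dim` float32 values each."""
    return n_items * dim * bytes_per_value

per_image_full = footprint_bytes(1, 2048)   # a 2048-d float32 descriptor per image
per_image_embed = footprint_bytes(1, 256)   # a 256-d embedding per image: 1 KB
savings_factor = per_image_full // per_image_embed
```

Keeping the smaller embedding shrinks the per-image footprint by the ratio of the dimensionalities, which is why feature-level storage scales to far larger galleries under the same memory budget.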
arXiv Detail & Related papers (2020-04-01T21:16:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences of their use.