Compatibility-aware Heterogeneous Visual Search
- URL: http://arxiv.org/abs/2105.06047v1
- Date: Thu, 13 May 2021 02:30:50 GMT
- Title: Compatibility-aware Heterogeneous Visual Search
- Authors: Rahul Duggal, Hao Zhou, Shuo Yang, Yuanjun Xiong, Wei Xia, Zhuowen Tu,
Stefano Soatto
- Abstract summary: Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images.
We address two forms of compatibility: One enforced by modifying the parameters of each model that computes the embeddings, the other by modifying the architectures that compute the embeddings.
Compared to ordinary (homogeneous) visual search using the largest embedding model (paragon), CMP-NAS achieves 80-fold and 23-fold cost reduction.
- Score: 93.90831195353333
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle the problem of visual search under resource constraints. Existing
systems use the same embedding model to compute representations (embeddings)
for the query and gallery images. Such systems inherently face a hard
accuracy-efficiency trade-off: the embedding model needs to be large enough to
ensure high accuracy, yet small enough to enable query-embedding computation on
resource-constrained platforms. This trade-off could be mitigated if gallery
embeddings are generated from a large model and query embeddings are extracted
using a compact model. The key to building such a system is to ensure
representation compatibility between the query and gallery models. In this
paper, we address two forms of compatibility: one enforced by modifying the
parameters of each model that computes the embeddings, the other by modifying
the architectures that compute the embeddings, leading to compatibility-aware
neural architecture search (CMP-NAS). We test CMP-NAS on challenging retrieval
tasks for fashion images (DeepFashion2) and face images (IJB-C). Compared to
ordinary (homogeneous) visual search using the largest embedding model
(paragon), CMP-NAS achieves 80-fold and 23-fold cost reduction while
maintaining accuracy within 0.3% and 1.6% of the paragon on DeepFashion2 and
IJB-C respectively.
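To make the setting concrete, here is a minimal sketch of heterogeneous visual search, assuming two toy models whose shared projection stands in for compatibility training; the model definitions, dimensions, and noise level are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, EMBED_DIM = 512, 128  # toy input/embedding sizes (assumptions)

# Stand-ins for a large gallery model and a compact query model. The
# shared projection mimics compatibility: both map into one embedding
# space, but the compact model is noisier (its lower accuracy).
shared = rng.normal(size=(IN_DIM, EMBED_DIM))

def big_model(x):
    return x @ shared

def small_model(x):
    return x @ shared + 0.1 * rng.normal(size=EMBED_DIM)

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Offline: index the gallery once with the large, accurate model.
gallery = rng.normal(size=(1000, IN_DIM))
gallery_feats = normalize(big_model(gallery))  # shape (1000, EMBED_DIM)

# Online: embed the query with the compact on-device model and search by
# cosine similarity; this works only because the spaces are compatible.
query_feat = normalize(small_model(gallery[42]))
scores = gallery_feats @ query_feat
print("top-5 gallery indices:", np.argsort(-scores)[:5])  # 42 should rank first
```

Without the compatibility constraint (for example, if each model used its own independent projection), the same cosine search would return essentially random neighbors.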
Related papers
- Comb, Prune, Distill: Towards Unified Pruning for Vision Model Compression [24.119415458653616]
We propose a novel unified pruning framework Comb, Prune, Distill (CPD) to address both model-agnostic and task-agnostic concerns simultaneously.
Our framework employs a combing step to resolve hierarchical layer-wise dependency issues, enabling architecture independence.
In image classification we achieve a speedup of up to 4.3x with an accuracy loss of 1.8%, and in semantic segmentation a speedup of up to 1.89x with a 5.1% loss in mIoU.
arXiv Detail & Related papers (2024-08-06T09:02:31Z)
- Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval [92.13664084464514]
The task of composed image retrieval (CIR) aims to retrieve images based on the query image and the text describing the users' intent.
Existing methods have made great progress with advanced large vision-language (VL) models on the CIR task; however, they generally suffer from two main issues: a lack of labeled triplets for model training and the difficulty of deployment in resource-restricted environments.
We propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning.
arXiv Detail & Related papers (2024-03-03T07:58:03Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the need to model the entity pair distribution.
We employ a DETR-based encoder-decoder design and use conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, but also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Asymmetric Image Retrieval with Cross Model Compatible Ensembles [4.86935886318034]
Asymmetric retrieval is a well-suited solution for resource-constrained applications such as face recognition and image retrieval.
We present an approach that does not rely on knowledge distillation; instead, it utilizes embedding transformation models (a minimal sketch of the idea follows below).
We improve the overall accuracy beyond that of any single model while maintaining a low computational budget for querying.
arXiv Detail & Related papers (2023-03-30T16:53:07Z)
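The embedding-transformation idea above can be sketched in a few lines; the architecture, dimensions, and cosine objective below are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingTransform(nn.Module):
    """Maps the query model's embeddings into the gallery model's space."""
    def __init__(self, src_dim=128, dst_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(src_dim, 256), nn.ReLU(),
            nn.Linear(256, dst_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

# Training sketch: pull transformed query embeddings toward the gallery
# model's embeddings of the same images (random tensors stand in here).
transform = EmbeddingTransform()
opt = torch.optim.Adam(transform.parameters(), lr=1e-3)
q_emb = torch.randn(32, 128)                       # compact query model output
g_emb = F.normalize(torch.randn(32, 512), dim=-1)  # large gallery model output
loss = (1 - (transform(q_emb) * g_emb).sum(dim=-1)).mean()  # cosine loss
opt.zero_grad()
loss.backward()
opt.step()
```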
- FastFill: Efficient Compatible Model Update [40.27741553705222]
FastFill is a compatible model update process that uses feature alignment and policy-based partial backfilling.
We show that previous backfilling strategies suffer from decreased performance, and we demonstrate the importance of both the training objective and the ordering in online partial backfilling (a toy sketch of the setting follows below).
arXiv Detail & Related papers (2023-03-08T18:03:51Z)
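The partial-backfilling setting can be illustrated with a toy sketch; the priority policy and feature dimensions below are assumptions, not FastFill's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 64
old_feats = rng.normal(size=(N, D))                    # old-model gallery features
new_feats = old_feats + 0.1 * rng.normal(size=(N, D))  # new model (stand-in)

# Assumed priority policy: backfill the items most likely to be queried
# first (here a random stand-in score).
order = np.argsort(-rng.random(N))

index = old_feats.copy()   # serving index starts from aligned old features
for i in order[: N // 2]:  # partial backfill: only the top-priority half
    index[i] = new_feats[i]
# Queries are served throughout the process, so the index mixes old and
# new features; this is why both feature alignment and the backfilling
# order matter.
```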
- TINYCD: A (Not So) Deep Learning Model For Change Detection [68.8204255655161]
The aim of change detection (CD) is to detect changes that occurred in the same area by comparing two images of that place taken at different times.
Recent developments in the field of deep learning have enabled researchers to achieve outstanding performance in this area.
We propose a novel model, called TinyCD, which proves to be both lightweight and effective.
arXiv Detail & Related papers (2022-07-26T19:28:48Z)
- Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC).
SFSC generates a series of compatible sub-models with different capacities through one training process (a minimal sketch of the idea follows below).
SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
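A minimal sketch of the switchable idea, assuming a slimmable-style weight-slicing scheme; this is an illustrative guess at the mechanism, not SFSC's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchableEmbed(nn.Module):
    """One weight matrix serves sub-models of several capacities."""
    def __init__(self, in_dim=512, out_dim=128):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, x, width=1.0):
        k = int(self.fc.in_features * width)
        # A sub-model of relative capacity `width` uses only the first
        # k input channels of the shared weight matrix.
        out = x[:, :k] @ self.fc.weight[:, :k].t() + self.fc.bias
        return F.normalize(out, dim=-1)

model = SwitchableEmbed()
x = torch.randn(4, 512)
full, small = model(x, width=1.0), model(x, width=0.25)
# A self-compatibility objective would pull the sub-models' embeddings
# together during training so they remain mutually searchable:
compat_loss = (1 - (full * small).sum(dim=-1)).mean()
```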
- AutoRC: Improving BERT Based Relation Classification Models via Architecture Search [50.349407334562045]
BERT-based relation classification (RC) models have achieved significant improvements over traditional deep learning models.
However, no consensus has been reached on the optimal architecture.
We design a comprehensive search space for BERT-based RC models and employ a neural architecture search (NAS) method to automatically discover the design choices (a toy sketch of the search loop follows below).
arXiv Detail & Related papers (2020-09-22T16:55:49Z)
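The search idea generalizes to a short loop; the design options and scoring below are hypothetical placeholders, with random search standing in for the NAS method, not AutoRC's actual search space.

```python
import random

# Hypothetical design choices for a BERT-based RC model (placeholders).
search_space = {
    "pooling": ["cls", "mean", "max"],
    "entity_marker": [True, False],
    "extra_top_layers": [0, 1, 2],
}

def evaluate(config):
    """Stand-in for training a model under `config` and returning its
    validation score."""
    return random.random()

best, best_score = None, -1.0
for _ in range(20):
    config = {k: random.choice(v) for k, v in search_space.items()}
    score = evaluate(config)
    if score > best_score:
        best, best_score = config, score
print("best config:", best, "score:", round(best_score, 3))
```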
- Self-Supervised GAN Compression [32.21713098893454]
We show that a standard model compression technique, weight pruning, cannot be applied to GANs using existing methods.
We then develop a self-supervised compression technique which uses the trained discriminator to supervise the training of a compressed generator (a hedged sketch of the supervision signal follows below).
We show that this framework maintains compelling performance at high degrees of sparsity, can be easily applied to new tasks and models, and enables meaningful comparisons between different pruning granularities.
arXiv Detail & Related papers (2020-07-03T04:18:54Z)
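The supervision signal can be sketched as follows; the toy modules and loss form are assumptions, and a smaller dense student stands in for a pruned generator, so this is not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

Z = 64  # latent dimension (toy)
teacher_G = nn.Sequential(nn.Linear(Z, 256), nn.ReLU(), nn.Linear(256, 784))
student_G = nn.Sequential(nn.Linear(Z, 64), nn.ReLU(), nn.Linear(64, 784))
D = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 1))

# The trained teacher generator and discriminator are frozen; only the
# compressed student generator is updated.
for p in list(teacher_G.parameters()) + list(D.parameters()):
    p.requires_grad_(False)

opt = torch.optim.Adam(student_G.parameters(), lr=1e-4)
z = torch.randn(32, Z)
fake_teacher, fake_student = teacher_G(z), student_G(z)
# Self-supervised objective: match the teacher's output and keep the
# frozen discriminator convinced; no labeled data is required.
loss = F.mse_loss(fake_student, fake_teacher) - D(fake_student).mean()
opt.zero_grad()
loss.backward()
opt.step()
```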
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.