Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models
- URL: http://arxiv.org/abs/2405.14715v1
- Date: Thu, 23 May 2024 15:46:35 GMT
- Title: Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models
- Authors: Young Kyun Jang, Ser-nam Lim
- Abstract summary: Backward-compatible Training (BT) has been proposed to ensure that the new model aligns with the old model's embeddings.
This paper extends the concept of vision-only BT to the field of cross-modal retrieval.
We propose a projection module that maps the new model's embeddings to those of the old model.
- Score: 44.56258991182532
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern retrieval systems often struggle with upgrading to new and more powerful models due to the incompatibility of embeddings between the old and new models. This necessitates a costly process known as backfilling, which involves re-computing the embeddings for a large number of data samples. In vision, Backward-compatible Training (BT) has been proposed to ensure that the new model aligns with the old model's embeddings. This paper extends the concept of vision-only BT to the field of cross-modal retrieval, marking the first attempt to address Cross-modal BT (XBT). Our goal is to achieve backward-compatibility between Vision-Language Pretraining (VLP) models, such as CLIP, for the cross-modal retrieval task. To address XBT challenges, we propose an efficient solution: a projection module that maps the new model's embeddings to those of the old model. This module, pretrained solely with text data, significantly reduces the number of image-text pairs required for XBT learning, and, once it is pretrained, it avoids using the old model during training. Furthermore, we utilize parameter-efficient training strategies that improve efficiency and preserve the off-the-shelf new model's knowledge by avoiding any modifications. Experimental results on cross-modal retrieval datasets demonstrate the effectiveness of XBT and its potential to enable backfill-free upgrades when a new VLP model emerges.
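To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of such a projection module; the module design, dimensions, and cosine-alignment loss are assumptions for illustration, not the paper's exact recipe. The projector maps new-model embeddings into the old embedding space, and can be pretrained on text alone by aligning the frozen old and new text encoders' outputs on the same captions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class XBTProjector(nn.Module):
    """Maps embeddings from a new VLP model into the old model's space.

    A minimal sketch: a two-layer MLP. The paper's actual module design
    may differ.
    """

    def __init__(self, new_dim: int = 768, old_dim: int = 512, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(new_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, old_dim),
        )

    def forward(self, z_new: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(z_new), dim=-1)


def alignment_loss(z_proj: torch.Tensor, z_old: torch.Tensor) -> torch.Tensor:
    """Pull projected new embeddings toward the old model's embeddings.

    Cosine alignment is one plausible objective; the paper may combine
    several losses.
    """
    z_old = F.normalize(z_old, dim=-1)
    return (1.0 - (z_proj * z_old).sum(dim=-1)).mean()


# Assumed text-only pretraining step: encode the same captions with the
# frozen old and new text encoders, then fit the projector so projected
# new-text embeddings match old-text embeddings.
projector = XBTProjector()
z_new_text = torch.randn(8, 768)   # stand-in for new text-encoder outputs
z_old_text = torch.randn(8, 512)   # stand-in for old text-encoder outputs
loss = alignment_loss(projector(z_new_text), z_old_text)
loss.backward()
```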
Related papers
- Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer [20.96380700548786]
Visual retrieval systems face challenges when updating models with improved representations due to misalignment between the old and new representations.
Prior research has explored backward-compatible training methods that enable direct comparisons between new and old representations without backfilling.
In this paper, we address achieving a balance between backward compatibility and the performance of independently trained models.
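A brief sketch of why an orthogonal layer fits this balance, assuming a PyTorch setting (layer and sizes are illustrative): an orthogonal map is an isometry, so routing new embeddings through it for compatibility leaves the new model's own pairwise similarities untouched.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

# An orthogonal square linear layer: W @ W.T == I, so it is an isometry
# and preserves the pairwise similarity structure of the new embeddings.
dim = 512  # assumed embedding size
align = orthogonal(nn.Linear(dim, dim, bias=False))

z_new = torch.randn(4, dim)
z_aligned = align(z_new)

# Isometry check: the Gram matrix is preserved (up to float error).
assert torch.allclose(z_new @ z_new.T, z_aligned @ z_aligned.T, atol=1e-3)
```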
arXiv Detail & Related papers (2024-08-16T15:05:28Z)
- Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters [65.15700861265432]
We present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models.
Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters.
To preserve the zero-shot recognition capability of vision-language models, we introduce a Distribution Discriminative Auto-Selector.
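As a rough, assumed illustration of the adapter idea (not the paper's exact architecture), a mixture-of-experts adapter routes each feature through a few small bottleneck experts mixed by a learned gate, added residually to the frozen CLIP feature:

```python
import torch
import torch.nn as nn

class MoEAdapter(nn.Module):
    """Minimal mixture-of-experts adapter sketch (illustrative sizes).

    Each expert is a bottleneck MLP; a softmax gate mixes their outputs,
    and the result is added residually to the frozen backbone feature.
    """

    def __init__(self, dim: int = 512, bottleneck: int = 64, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = self.gate(x).softmax(dim=-1)                     # (batch, E)
        outs = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, dim, E)
        return x + (outs * weights.unsqueeze(1)).sum(dim=-1)      # residual mix
```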
arXiv Detail & Related papers (2024-03-18T08:00:23Z)
- Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z)
- MixBCT: Towards Self-Adapting Backward-Compatible Training [66.52766344751635]
We propose MixBCT, a simple yet highly effective backward-compatible training method.
We conduct experiments on the large-scale face recognition datasets MS1Mv3 and IJB-C.
arXiv Detail & Related papers (2023-08-14T05:55:38Z)
- Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval [17.995993499100017]
Backward-compatible training (BCT) improves the compatibility of two models while limiting the negative impact on retrieval performance.
We introduce AdvBCT, an Adversarial Backward-Training method with an elastic boundary constraint.
Our method outperforms other BCT methods on both compatibility and discrimination.
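The summary leaves the mechanism implicit; one plausible, hypothetical reading of the adversarial part is a discriminator that tries to separate old from new embeddings while the new encoder learns to fool it (all names and losses below are assumptions, not AdvBCT's actual formulation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A small discriminator tries to tell old embeddings from new ones; the
# new encoder is trained to fool it so the two distributions overlap.
disc = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 1))

z_old = torch.randn(16, 512)                      # frozen old-model embeddings
z_new = torch.randn(16, 512, requires_grad=True)  # stand-in for new-model outputs

# Discriminator loss: old -> label 1, new -> label 0.
d_loss = F.binary_cross_entropy_with_logits(
    disc(z_old), torch.ones(16, 1)
) + F.binary_cross_entropy_with_logits(
    disc(z_new.detach()), torch.zeros(16, 1)
)

# Generator (new encoder) loss: make new embeddings look "old".
# In practice one would alternate optimizer steps for the two losses.
g_loss = F.binary_cross_entropy_with_logits(disc(z_new), torch.ones(16, 1))
d_loss.backward()
g_loss.backward()
```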
arXiv Detail & Related papers (2023-05-04T07:37:07Z)
- Towards Universal Backward-Compatible Representation Learning [29.77801805854168]
Backward-compatible representation learning is introduced to support backfill-free model upgrades.
We first introduce a new problem of universal backward-compatible representation learning, covering all possible data splits in model upgrades.
We propose a simple yet effective method, dubbed Universal Backward-Compatible Training (UniBCT), with a novel structural prototype refinement algorithm.
arXiv Detail & Related papers (2022-03-03T09:23:51Z)
- Forward Compatible Training for Representation Learning [53.300192863727226]
Backward compatible training (BCT) modifies training of the new model to make its representations compatible with those of the old model.
However, BCT can significantly hinder the performance of the new model.
In this work, we propose a new learning paradigm for representation learning: forward compatible training (FCT).
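As a hedged sketch of the forward direction (dimensions and the MSE objective are assumed, not taken from the paper), one can learn a map that lifts stored old-model embeddings into the new embedding space instead of constraining the new model:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Forward-compatible sketch: rather than constraining the new model,
# learn a transformation that lifts stored old gallery embeddings into
# the new embedding space (assumed dimensions and loss).
old_dim, new_dim = 256, 512
lift = nn.Sequential(nn.Linear(old_dim, 512), nn.ReLU(), nn.Linear(512, new_dim))

z_old = torch.randn(32, old_dim)   # stored gallery embeddings (old model)
z_new = torch.randn(32, new_dim)   # same images re-encoded by the new model
loss = F.mse_loss(lift(z_old), z_new)
loss.backward()
```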
arXiv Detail & Related papers (2021-12-06T06:18:54Z)
- Towards Backward-Compatible Representation Learning [86.39292571306395]
We propose a way to learn visual features that are compatible with previously computed ones even when they have different dimensions.
This enables visual search systems to bypass computing new features for all previously seen images when updating the embedding models.
We propose a framework to train embedding models, called backward-compatible training (BCT), as a first step towards backward compatible representation learning.
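A condensed, assumption-laden sketch of a BCT-style objective: keep the old model's frozen classifier as an auxiliary criterion on the new embeddings, so the new features remain usable where the old ones were; a learned map bridges differing dimensions (all sizes and the extra projection are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, old_dim, new_dim = 1000, 256, 512

new_feat = torch.randn(8, new_dim, requires_grad=True)  # stand-in new features
labels = torch.randint(0, num_classes, (8,))

old_classifier = nn.Linear(old_dim, num_classes)  # frozen head from the old model
for p in old_classifier.parameters():
    p.requires_grad_(False)

# When dimensions differ, a learned map brings new features into the old
# feature space before applying the old classifier.
to_old_space = nn.Linear(new_dim, old_dim)
new_classifier = nn.Linear(new_dim, num_classes)

# Standard task loss plus an auxiliary term through the old head, which
# pushes the new features to stay classifiable by the old criterion.
task_loss = F.cross_entropy(new_classifier(new_feat), labels)
compat_loss = F.cross_entropy(old_classifier(to_old_space(new_feat)), labels)
loss = task_loss + compat_loss
loss.backward()
```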
arXiv Detail & Related papers (2020-03-26T14:34:09Z)