Continual learning in cross-modal retrieval
- URL: http://arxiv.org/abs/2104.06806v1
- Date: Wed, 14 Apr 2021 12:13:39 GMT
- Title: Continual learning in cross-modal retrieval
- Authors: Kai Wang, Luis Herranz, Joost van de Weijer
- Abstract summary: We study how the interference caused by new tasks impacts the embedding spaces and their cross-modal alignment required for effective retrieval.
We propose a general framework that decouples the training, indexing and querying stages.
We also identify and study different factors that may lead to forgetting, and propose tools to alleviate it.
- Score: 47.73014647702813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal representations and continual learning are two areas closely
related to human intelligence. The former considers the learning of shared
representation spaces where information from different modalities can be
compared and integrated (we focus on cross-modal retrieval between language and
visual representations). The latter studies how to prevent forgetting a
previously learned task when learning a new one. While humans excel in these
two aspects, deep neural networks are still quite limited. In this paper, we
propose a combination of both problems into a continual cross-modal retrieval
setting, where we study how the catastrophic interference caused by new tasks
impacts the embedding spaces and their cross-modal alignment required for
effective retrieval. We propose a general framework that decouples the
training, indexing and querying stages. We also identify and study different
factors that may lead to forgetting, and propose tools to alleviate it. We
found that the indexing stage plays an important role and that simply avoiding
reindexing the database with updated embedding networks can lead to significant
gains. We evaluated our methods on two image-text retrieval datasets, obtaining
significant gains with respect to the fine-tuning baseline.
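To make the decoupled training/indexing/querying idea concrete, here is a minimal sketch (not the authors' code; encoder names and shapes are assumptions): the database is embedded once, and later queries from a continually updated encoder search that stale index instead of a re-embedded one.

```python
# Minimal sketch of the decoupled training / indexing / querying idea:
# the database is embedded once and NOT re-indexed after the embedding
# network is updated on a new task, which the paper reports can reduce
# forgetting in retrieval. All names here are illustrative assumptions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def build_index(image_encoder, images):
    """Indexing stage: embed the database once and freeze the result."""
    feats = image_encoder(images)            # (N, d) image embeddings
    return F.normalize(feats, dim=-1)        # unit-norm for cosine search

@torch.no_grad()
def query(text_encoder, captions, index, k=5):
    """Querying stage: the (possibly updated) text encoder searches
    the stale index instead of a freshly re-embedded database."""
    q = F.normalize(text_encoder(captions), dim=-1)   # (B, d)
    scores = q @ index.T                              # cosine similarities
    return scores.topk(k, dim=-1).indices             # top-k database ids

# index = build_index(image_encoder_task1, db_images)   # built after task 1
# ... continually train the encoders on task 2 ...
# ids = query(text_encoder_task2, new_captions, index)  # no re-indexing
```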
Related papers
- Beyond Unimodal Learning: The Importance of Integrating Multiple Modalities for Lifelong Learning [23.035725779568587]
We study the role and interactions of multiple modalities in mitigating forgetting in deep neural networks (DNNs).
Our findings demonstrate that leveraging multiple views and complementary information from multiple modalities enables the model to learn more accurate and robust representations.
We propose a method for integrating and aligning the information from different modalities by utilizing the relational structural similarities between the data points in each modality.
arXiv Detail & Related papers (2024-05-04T22:02:58Z)
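A minimal sketch of one way to align modalities through relational structural similarities, as the summary above describes; the loss and variable names are illustrative, not the paper's implementation.

```python
# Hypothetical relational-alignment loss: match the pairwise similarity
# matrix of a batch in one modality to that of the same batch in another.
import torch
import torch.nn.functional as F

def relational_alignment_loss(za: torch.Tensor, zb: torch.Tensor) -> torch.Tensor:
    """za, zb: (B, d) embeddings of the same B samples in two modalities."""
    za = F.normalize(za, dim=-1)
    zb = F.normalize(zb, dim=-1)
    sim_a = za @ za.T                  # (B, B) intra-modality similarities
    sim_b = zb @ zb.T
    return F.mse_loss(sim_a, sim_b)    # align the two relational structures
```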
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
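Below is an illustrative sketch, under assumed feature sizes, of a module that projects per-modality features of different dimensionality into one common space, as the entry above describes; the class and modality names are made up.

```python
# Illustrative common-space projection: one linear head per modality,
# all mapping into a shared dimensionality.
import torch
import torch.nn as nn

class CommonSpaceProjector(nn.Module):
    def __init__(self, in_dims: dict, common_dim: int = 512):
        super().__init__()
        # one projection head per modality, all mapping into common_dim
        self.heads = nn.ModuleDict(
            {name: nn.Linear(d, common_dim) for name, d in in_dims.items()}
        )

    def forward(self, feats: dict) -> dict:
        return {name: self.heads[name](x) for name, x in feats.items()}

# proj = CommonSpaceProjector({"audio": 128, "video": 2048, "text": 768})
# z = proj({"audio": a, "video": v, "text": t})  # all now (B, 512)
```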
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
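A rough sketch of noise-contrastive state-representation learning in the spirit of the summary above; treating a state and its temporal successor as the positive pair is an assumption, and this is standard InfoNCE rather than the paper's exact objective.

```python
# Standard InfoNCE over (state, next-state) pairs; other batch entries
# act as negatives. Details are assumptions for illustration.
import torch
import torch.nn.functional as F

def info_nce(z_t: torch.Tensor, z_next: torch.Tensor, tau: float = 0.1):
    """z_t, z_next: (B, d) encodings of states and their successors."""
    z_t = F.normalize(z_t, dim=-1)
    z_next = F.normalize(z_next, dim=-1)
    logits = z_t @ z_next.T / tau                            # (B, B) logits
    labels = torch.arange(z_t.size(0), device=z_t.device)    # diagonal positives
    return F.cross_entropy(logits, labels)
```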
- Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval [12.30468719055037]
A Contrastive Cross-Modal Knowledge Sharing Pre-training method (COOKIE) is developed to learn joint text-image representations.
The first module is a weight-sharing transformer built on top of the visual and textual encoders.
The second is a set of three specially designed contrastive losses that share knowledge between different models.
arXiv Detail & Related papers (2022-07-02T04:08:44Z)
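The weight-sharing head described above might look roughly like the following sketch, where a single transformer layer is applied to both encoders' outputs so the two modalities pass through identical weights; the configuration is assumed, not taken from COOKIE.

```python
# One transformer layer shared by both modalities (sizes are assumed).
import torch
import torch.nn as nn

shared_head = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)

def encode(image_feats: torch.Tensor, text_feats: torch.Tensor):
    """image_feats, text_feats: (B, L, 512) encoder output sequences."""
    img = shared_head(image_feats)   # same weights ...
    txt = shared_head(text_feats)    # ... for both modalities
    return img.mean(dim=1), txt.mean(dim=1)  # pooled joint representations
```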
- Gap Minimization for Knowledge Sharing and Transfer [24.954256258648982]
In this paper, we introduce the notion of the performance gap, an intuitive and novel measure of the distance between learning tasks.
We show that the performance gap can be viewed as a data- and algorithm-dependent regularizer, which controls the model complexity and leads to finer guarantees.
We instantiate this principle with two algorithms: (1) gapBoost, a novel and principled boosting algorithm that explicitly minimizes the performance gap between source and target domains for transfer learning; and (2) gapMTNN, a representation learning algorithm that reformulates gap minimization as semantic conditional matching.
arXiv Detail & Related papers (2022-01-26T23:06:20Z)
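As a loose illustration (not the paper's algorithm), a performance gap can act as a regularizer by penalizing the discrepancy between source and target losses on top of the target objective; the weighting below is made up.

```python
# Hand-wavy gap-as-regularizer illustration: a small gap suggests the
# source hypothesis transfers well to the target task.
import torch

def gap_regularized_loss(loss_src: torch.Tensor,
                         loss_tgt: torch.Tensor,
                         lam: float = 0.1) -> torch.Tensor:
    return loss_tgt + lam * (loss_src - loss_tgt).abs()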
- On the relationship between disentanglement and multi-task learning [62.997667081978825]
We take a closer look at the relationship between disentanglement and multi-task learning based on hard parameter sharing.
We show that disentanglement appears naturally during the process of multi-task neural network training.
arXiv Detail & Related papers (2021-10-07T14:35:34Z)
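For reference, hard parameter sharing as mentioned above is the textbook setup of a shared trunk with task-specific heads; the sizes below are placeholders.

```python
# Textbook hard-parameter-sharing network: shared trunk, one head per task.
import torch
import torch.nn as nn

class HardSharingNet(nn.Module):
    def __init__(self, in_dim=64, hidden=256, task_out_dims=(10, 10)):
        super().__init__()
        self.trunk = nn.Sequential(              # parameters shared by all tasks
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(              # task-specific parameters
            [nn.Linear(hidden, d) for d in task_out_dims]
        )

    def forward(self, x):
        h = self.trunk(x)
        return [head(h) for head in self.heads]  # one output per task
```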
- Decoupled and Memory-Reinforced Networks: Towards Effective Feature Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z)
- Deep Learning Techniques for Future Intelligent Cross-Media Retrieval [58.20547387332133]
Cross-media retrieval plays a significant role in big data applications.
We provide a novel taxonomy according to the challenges faced by multi-modal deep learning approaches.
We present some well-known cross-media datasets used for retrieval.
arXiv Detail & Related papers (2020-07-21T09:49:33Z)
- Unsupervised and Interpretable Domain Adaptation to Rapidly Filter Tweets for Emergency Services [18.57009530004948]
We present a novel method to classify relevant tweets during an ongoing crisis using the publicly available dataset of TREC incident streams.
We use dedicated attention layers for each task to provide model interpretability, which is critical for real-world applications.
We show a practical implication of our work by providing a use-case for the COVID-19 pandemic.
arXiv Detail & Related papers (2020-03-04T06:40:14Z)
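A speculative sketch of the dedicated per-task attention layers mentioned above: each task attends over token features with its own weights, which can then be inspected for interpretability; all names and sizes are illustrative.

```python
# Hypothetical per-task attention head: attention-weighted pooling over
# token features, returning both logits and the inspectable weights.
import torch
import torch.nn as nn

class TaskAttentionHead(nn.Module):
    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)            # per-token attention scores
        self.classify = nn.Linear(dim, n_classes)

    def forward(self, tokens: torch.Tensor):      # tokens: (B, L, dim)
        attn = self.score(tokens).softmax(dim=1)  # (B, L, 1) weights
        pooled = (attn * tokens).sum(dim=1)       # attention-weighted pooling
        return self.classify(pooled), attn.squeeze(-1)  # logits + weights
```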
This list is automatically generated from the titles and abstracts of the papers in this site.