Reducing Predictive Feature Suppression in Resource-Constrained
Contrastive Image-Caption Retrieval
- URL: http://arxiv.org/abs/2204.13382v3
- Date: Wed, 7 Jun 2023 09:46:10 GMT
- Title: Reducing Predictive Feature Suppression in Resource-Constrained
Contrastive Image-Caption Retrieval
- Authors: Maurits Bleeker, Andrew Yates, Maarten de Rijke
- Abstract summary: We introduce an approach to reduce predictive feature suppression for resource-constrained ICR methods: latent target decoding (LTD).
LTD reconstructs the input caption in a latent space of a general-purpose sentence encoder, which prevents the image and caption encoder from suppressing predictive features.
Our experiments show that, unlike reconstructing the input caption in the input space, LTD reduces predictive feature suppression, measured by obtaining higher recall@k, r-precision, and nDCG scores.
- Score: 65.33981533521207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To train image-caption retrieval (ICR) methods, contrastive loss functions
are a common choice of optimization objective. Unfortunately, contrastive ICR
methods are vulnerable to predictive feature suppression. Predictive features
are features that correctly indicate the similarity between a query and a
candidate item. However, in the presence of multiple predictive features during
training, encoder models tend to suppress redundant predictive features, since
these features are not needed to learn to discriminate between positive and
negative pairs. While some predictive features are redundant during training,
these features might be relevant during evaluation. We introduce an approach to
reduce predictive feature suppression for resource-constrained ICR methods:
latent target decoding (LTD). We add an additional decoder to the contrastive
ICR framework, to reconstruct the input caption in a latent space of a
general-purpose sentence encoder, which prevents the image and caption encoder
from suppressing predictive features. We implement the LTD objective as an
optimization constraint, to ensure that the reconstruction loss is below a
bound value while primarily optimizing for the contrastive loss. Importantly,
LTD does not depend on additional training data or expensive (hard) negative
mining strategies. Our experiments show that, unlike reconstructing the input
caption in the input space, LTD reduces predictive feature suppression,
measured by obtaining higher recall@k, r-precision, and nDCG scores than a
contrastive ICR baseline. Moreover, we show that LTD should be implemented as
an optimization constraint instead of a dual optimization objective. Finally,
we show that LTD can be used with different contrastive learning losses and a
wide variety of resource-constrained ICR methods.
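The abstract describes LTD as an optimization constraint: the contrastive loss is the primary objective, and the latent reconstruction loss is only required to stay below a bound, typically enforced with a Lagrange-multiplier-style update. The sketch below illustrates that constraint formulation in plain Python; the function names, the log-parameterized multiplier, and the learning rate are illustrative assumptions, not the paper's exact implementation.

```python
import math

def ltd_objective(contrastive_loss, recon_loss, bound, log_lambda):
    """Combine the two losses in Lagrangian form:
    minimize contrastive + lambda * (recon - bound), with lambda > 0.
    Parameterizing lambda as exp(log_lambda) keeps it positive."""
    lam = math.exp(log_lambda)
    return contrastive_loss + lam * (recon_loss - bound)

def update_multiplier(log_lambda, recon_loss, bound, lr=0.1):
    """Gradient ascent on log_lambda: the multiplier grows while the
    reconstruction constraint is violated (recon > bound) and shrinks
    once it is satisfied, so the contrastive loss dominates again."""
    lam = math.exp(log_lambda)
    # derivative of lam * (recon - bound) w.r.t. log_lambda
    return log_lambda + lr * lam * (recon_loss - bound)
```

In this formulation the model parameters descend on `ltd_objective` while the multiplier ascends via `update_multiplier`, which is what makes the reconstruction term act as a constraint rather than a second, equally weighted objective.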
Related papers
- EnsLoss: Stochastic Calibrated Loss Ensembles for Preventing Overfitting in Classification [1.3778851745408134]
We propose a novel ensemble method, namely EnsLoss, to combine loss functions within the empirical risk minimization (ERM) framework.
We first transform the CC conditions of losses into loss-derivatives, thereby bypassing the need for explicit loss functions.
We theoretically establish the statistical consistency of our approach and provide insights into its benefits.
arXiv Detail & Related papers (2024-09-02T02:40:42Z)
- Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
DML aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-03T13:44:20Z)
- Distortion-Disentangled Contrastive Learning [13.27998440853596]
We propose a novel POCL framework named Distortion-Disentangled Contrastive Learning (DDCL) and a Distortion-Disentangled Loss (DDL).
Our approach is the first to explicitly disentangle and exploit the DVR inside the model and feature stream to improve the overall representation utilization efficiency, robustness and representation ability.
arXiv Detail & Related papers (2023-03-09T06:33:31Z)
- Positive-Negative Equal Contrastive Loss for Semantic Segmentation [8.664491798389662]
Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.
We propose Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of positive embedding on the anchor and treats the positive as well as negative sample pairs equally.
We conduct comprehensive experiments and achieve state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2022-07-04T13:51:29Z)
- Efficient and Differentiable Conformal Prediction with General Function Classes [96.74055810115456]
We propose a generalization of conformal prediction to multiple learnable parameters.
We show that it achieves approximate valid population coverage and near-optimal efficiency within class.
Experiments show that our algorithm is able to learn valid prediction sets and improve the efficiency significantly.
arXiv Detail & Related papers (2022-02-22T18:37:23Z)
- Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
- Efficient Deep Feature Calibration for Cross-Modal Joint Embedding Learning [14.070841236184439]
This paper introduces a two-phase deep feature calibration framework for efficient learning of semantics enhanced text-image cross-modal joint embedding.
In preprocessing, we perform deep feature calibration by combining deep feature engineering with semantic context features derived from raw text-image input data.
In joint embedding learning, we perform deep feature calibration by optimizing the batch-hard triplet loss function with soft-margin and double negative sampling.
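The batch-hard triplet objective mentioned above selects, for each anchor, the hardest (farthest) positive and hardest (closest) negative in the batch, and the soft-margin variant replaces the fixed margin with a softplus. A minimal sketch in plain Python, assuming precomputed pairwise distances; the function name and inputs are illustrative, not this paper's API:

```python
import math

def batch_hard_soft_margin(dist, labels):
    """Batch-hard triplet loss with soft margin.
    dist: square matrix of pairwise embedding distances (list of lists).
    labels: group label per row; rows with the same label are positives.
    For each anchor, take the hardest positive (max distance) and hardest
    negative (min distance), then apply softplus(d_ap - d_an)."""
    losses = []
    for i, li in enumerate(labels):
        pos = [dist[i][j] for j, lj in enumerate(labels) if lj == li and j != i]
        neg = [dist[i][j] for j, lj in enumerate(labels) if lj != li]
        if not pos or not neg:
            continue  # anchor has no valid triplet in this batch
        d_ap, d_an = max(pos), min(neg)
        # log1p(exp(x)) is the soft-margin (softplus) substitute for
        # the usual hinge max(0, d_ap - d_an + margin)
        losses.append(math.log1p(math.exp(d_ap - d_an)))
    return sum(losses) / len(losses)
```

The softplus keeps a small gradient even for triplets that a hinge loss with a fixed margin would already consider solved.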
arXiv Detail & Related papers (2021-08-02T08:16:58Z)
- Visual Alignment Constraint for Continuous Sign Language Recognition [74.26707067455837]
Vision-based continuous sign language recognition (CSLR) aims to recognize unsegmented gestures from image sequences.
In this work, we revisit the overfitting problem in recent CTC-based CSLR works and attribute it to the insufficient training of the feature extractor.
We propose a Visual Alignment Constraint (VAC) to enhance the feature extractor with more alignment supervision.
arXiv Detail & Related papers (2021-04-06T07:24:58Z)
- Sparse Perturbations for Improved Convergence in Stochastic Zeroth-Order Optimization [10.907491258280608]
Interest in stochastic zeroth-order (SZO) methods has recently been revived in black-box optimization scenarios such as adversarial black-box attacks on deep neural networks.
SZO methods only require the ability to evaluate the objective function at random input points.
We present a SZO optimization method that reduces the dependency on the dimensionality of the random perturbation to be evaluated.
arXiv Detail & Related papers (2020-06-02T16:39:37Z)
- Pairwise Supervised Hashing with Bernoulli Variational Auto-Encoder and Self-Control Gradient Estimator [62.26981903551382]
Variational auto-encoders (VAEs) with binary latent variables provide state-of-the-art performance in terms of precision for document retrieval.
We propose a pairwise loss function with discrete latent VAE to reward within-class similarity and between-class dissimilarity for supervised hashing.
This new semantic hashing framework achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-05-21T06:11:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.