FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Pretraining
- URL: http://arxiv.org/abs/2505.11192v3
- Date: Tue, 20 May 2025 03:33:43 GMT
- Title: FALCON: False-Negative Aware Learning of Contrastive Negatives in Vision-Language Pretraining
- Authors: Myunsoo Kim, Seong-Woong Shim, Byung-Jun Lee
- Abstract summary: We propose FALCON, a learning-based mini-batch construction strategy that balances the trade-off between hard and false negatives. FALCON employs a negative mining scheduler that dynamically selects negative samples of appropriate hardness for each anchor instance during mini-batch construction.
- Score: 5.200545764106177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: False negatives pose a critical challenge in vision-language pretraining (VLP) due to the many-to-many correspondence between images and texts in large-scale datasets. These false negatives introduce conflicting supervision signals that degrade the learned embedding space and diminish the effectiveness of hard negative sampling. In this paper, we propose FALCON (False-negative Aware Learning of COntrastive Negatives), a learning-based mini-batch construction strategy that adaptively balances the trade-off between hard and false negatives during VLP. Rather than relying on fixed heuristics, FALCON employs a negative mining scheduler that dynamically selects negative samples of appropriate hardness for each anchor instance during mini-batch construction, guided by a proxy for cross-modal alignment improvement. Experimental results demonstrate that FALCON significantly improves performance across two widely adopted VLP frameworks (ALBEF, BLIP-2) and a broad range of downstream tasks and evaluation settings, underscoring its effectiveness and robustness in mitigating the impact of false negatives.
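The abstract describes the negative mining scheduler only at a high level, so the Python sketch below is just a minimal illustration of hardness-banded negative selection during mini-batch construction. The function name `select_negatives`, the similarity-ranking heuristic, and the hand-set band fractions are assumptions for illustration; they are not FALCON's learned scheduler, which adapts the selection per anchor using a proxy for cross-modal alignment improvement.

```python
# Minimal, illustrative sketch (NOT the paper's method): pick negatives of
# "appropriate" hardness for each anchor by ranking candidates by cross-modal
# similarity and sampling from a band of that ranking. Skipping the very top
# of the ranking is a crude way to avoid likely false negatives.
import torch
import torch.nn.functional as F


def select_negatives(anchor_emb, candidate_emb, num_neg, hardness_band):
    """anchor_emb: (B, D) anchors; candidate_emb: (N, D) other-modality pool.

    hardness_band: (lo, hi) fractions of the descending similarity ranking to
    sample from, e.g. (0.10, 0.50) excludes the top 10% (possible false
    negatives) and the easy bottom half.
    """
    sims = F.normalize(anchor_emb, dim=-1) @ F.normalize(candidate_emb, dim=-1).T  # (B, N)
    order = sims.argsort(dim=-1, descending=True)  # hardest candidates first
    n = candidate_emb.size(0)
    lo, hi = int(hardness_band[0] * n), int(hardness_band[1] * n)
    band = order[:, lo:hi]  # candidates of intermediate hardness
    idx = torch.randint(0, band.size(1), (anchor_emb.size(0), num_neg))
    return band.gather(1, idx)  # (B, num_neg) indices into candidate_emb


if __name__ == "__main__":
    anchors, pool = torch.randn(8, 256), torch.randn(1024, 256)
    early = select_negatives(anchors, pool, num_neg=16, hardness_band=(0.10, 0.50))
    late = select_negatives(anchors, pool, num_neg=16, hardness_band=(0.02, 0.20))
    print(early.shape, late.shape)
```

In FALCON itself, the hardness targeted for each anchor would be set adaptively by the learned scheduler rather than by fixed band fractions as in this toy example.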
Related papers
- Visual Perturbation and Adaptive Hard Negative Contrastive Learning for Compositional Reasoning in Vision-Language Models [9.682523487279976]
Vision-Language Models (VLMs) are essential for multimodal tasks, especially compositional reasoning (CR) tasks. Existing methods primarily fine-tune the model by generating text-based hard negative samples. AHNPL translates text-based hard negatives into the visual domain to generate semantically disturbed image-based negatives for training the model.
arXiv Detail & Related papers (2025-05-21T14:28:43Z)
- One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models [46.64419395105025]
We present a Contrastive-training Perturbation Generator with Cross-modal conditions (C-PGC) to achieve the attack. C-PGC incorporates both unimodal and cross-modal information as effective guidance. Experiments show that C-PGC successfully forces adversarial samples to move away from their original area.
arXiv Detail & Related papers (2024-06-08T15:01:54Z)
- MAFA: Managing False Negatives for Vision-Language Pre-training [17.836155361629718]
We consider the critical issue of false negatives in Vision-Language Pre-training.
The presence of false negatives can impede achieving optimal performance and even lead to a significant performance drop.
We propose MAFA (MAnaging FAlse negatives), which consists of two pivotal components building upon the recently developed GRouped mIni-baTch sampling (GRIT) strategy.
arXiv Detail & Related papers (2023-12-11T04:33:35Z)
- Improving Contrastive Learning of Sentence Embeddings with Focal-InfoNCE [13.494159547236425]
This study introduces an unsupervised contrastive learning framework that combines SimCSE with hard negative mining.
The proposed focal-InfoNCE function introduces self-paced modulation terms in the contrastive objective, downweighting the loss associated with easy negatives and encouraging the model to focus on hard negatives (a toy sketch of this idea follows the Related papers list below).
arXiv Detail & Related papers (2023-10-10T18:15:24Z)
- Language Model Pre-training on True Negatives [109.73819321246062]
Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones.
Existing PLMs simply treat all corrupted texts as equal negatives without any examination.
We design enhanced pre-training methods to counteract false negative predictions and encourage pre-training language models on true negatives.
arXiv Detail & Related papers (2022-12-01T12:24:19Z)
- A Practical Contrastive Learning Framework for Single-Image Super-Resolution [51.422185656787285]
We investigate contrastive learning-based single image super-resolution from two perspectives.
We propose a practical contrastive learning framework for SISR, named PCL-SR.
We re-train existing benchmark methods with our proposed PCL-SR framework and achieve superior performance.
arXiv Detail & Related papers (2021-11-27T15:42:12Z)
- Incremental False Negative Detection for Contrastive Learning [95.68120675114878]
We introduce a novel incremental false negative detection for self-supervised contrastive learning.
We discuss two strategies to explicitly remove the detected false negatives during contrastive learning.
Our proposed method outperforms other self-supervised contrastive learning frameworks on multiple benchmarks under a limited compute budget.
arXiv Detail & Related papers (2021-06-07T15:29:14Z)
- Contrastive Attraction and Contrastive Repulsion for Representation Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervised manner, where the encoder contrasts each positive sample against multiple negative samples.
Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet.
We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
- Towards Overcoming False Positives in Visual Relationship Detection [95.15011997876606]
We investigate the cause of the high false positive rate in Visual Relationship Detection (VRD).
This paper presents Spatially-Aware Balanced negative pRoposal sAmpling (SABRA) as a robust VRD framework that alleviates the influence of false positives.
arXiv Detail & Related papers (2020-12-23T06:28:00Z)
- NPCFace: Negative-Positive Collaborative Training for Large-scale Face Recognition [78.21084529159577]
We study how to make better use of hard samples for improving training.
The correlation between hard positives and hard negatives is overlooked, as is the relation between the margins in positive and negative logits.
We propose a novel Negative-Positive Collaboration loss, named NPCFace, which emphasizes the training on both negative and positive hard cases.
arXiv Detail & Related papers (2020-07-20T14:52:29Z)
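As noted in the Focal-InfoNCE entry above, a focal-style modulation down-weights easy negatives in the contrastive objective. The sketch below only illustrates the general shape of such a loss: the helper name `focal_weighted_infonce` and the `sigmoid(similarity)^gamma` weighting are assumptions for illustration and may differ from that paper's exact formulation.

```python
# Illustrative focal-style re-weighting of negatives in an InfoNCE-like loss
# (an assumption-laden sketch, not the published Focal-InfoNCE objective).
import torch
import torch.nn.functional as F


def focal_weighted_infonce(z1, z2, temperature=0.05, gamma=2.0):
    """z1, z2: (B, D) paired embeddings; matching rows are positives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sims = z1 @ z2.T / temperature          # (B, B) anchor-candidate logits
    pos = sims.diag()                       # positive logits on the diagonal
    neg_mask = ~torch.eye(sims.size(0), dtype=torch.bool)
    # Harder negatives (higher similarity) get larger weights, so easy
    # negatives contribute less to the denominator.
    weights = torch.sigmoid(sims).pow(gamma) * neg_mask.float()
    neg_sum = (weights * sims.exp()).sum(dim=-1)
    return (-pos + torch.log(pos.exp() + neg_sum)).mean()


if __name__ == "__main__":
    loss = focal_weighted_infonce(torch.randn(32, 128), torch.randn(32, 128))
    print(float(loss))
```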