Related papers: Conan-embedding: General Text Embedding with More and Better Negative Samples

URL: http://arxiv.org/abs/2408.15710v2
Date: Thu, 29 Aug 2024 14:47:37 GMT
Title: Conan-embedding: General Text Embedding with More and Better Negative Samples
Authors: Shiyu Li, Yang Tang, Shizhe Chen, Xi Chen
Abstract summary: We propose the conan-embedding model, which maximizes the utilization of more and higher-quality negative examples. Our approach effectively enhances the capabilities of embedding models and currently ranks first on the Chinese leaderboard of the Massive Text Embedding Benchmark (MTEB).
Score: 30.571206231457932
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: With the growing popularity of retrieval-augmented generation (RAG), the capabilities of embedding models are gaining increasing attention. Embedding models are primarily trained with a contrastive loss, in which negative examples are a key component. Previous work has proposed various hard negative mining strategies, but these are typically applied as preprocessing steps. In this paper, we propose the conan-embedding model, which maximizes the utilization of more and higher-quality negative examples. First, since the model's ability to handle preprocessed negative examples evolves during training, we propose a dynamic hard negative mining method that exposes the model to more challenging negative examples throughout the training process. Second, contrastive learning requires as many negative examples as possible but is limited by GPU memory constraints. We therefore use a Cross-GPU Balancing Loss to provide more negative examples for embedding training and to balance the batch size across multiple tasks. We also found that prompt-response pairs from LLMs can be used for embedding training. Our approach effectively enhances the capabilities of embedding models and currently ranks first on the Chinese leaderboard of the Massive Text Embedding Benchmark (MTEB).
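
To make the idea of refreshing hard negatives during training concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the `min_sim` threshold, and the rule "replace a negative once its similarity to the query falls below a threshold" are all assumptions chosen for illustration, and the abstract does not specify the actual criterion.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE loss for one query: -log softmax of the positive over all candidates."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denom - logits[0]

def refresh_hard_negatives(query, negatives, pool, min_sim=0.3):
    """Hypothetical refresh rule (an assumption, not the paper's criterion).

    A negative counts as "easy" when its cosine similarity to the query is
    below min_sim; it is replaced by the most similar unused candidate from
    pool. The pool is assumed to contain no true positives.
    """
    ranked = sorted(pool, key=lambda c: cosine(query, c), reverse=True)
    unused = [c for c in ranked if c not in negatives]
    refreshed = []
    for neg in negatives:
        if cosine(query, neg) < min_sim and unused:
            refreshed.append(unused.pop(0))
        else:
            refreshed.append(neg)
    return refreshed

# Toy 2-D embeddings standing in for real model outputs.
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[-1.0, 0.0], [0.0, 1.0]]
pool = [[0.7, 0.7], [0.5, -0.8], [-1.0, 0.0], [0.0, 1.0]]

refreshed = refresh_hard_negatives(query, negatives, pool)
loss_easy = info_nce(query, positive, negatives)
loss_hard = info_nce(query, positive, refreshed)
```

In this toy setup both original negatives are far from the query, so both are swapped for closer candidates and the loss rises, which is the intended effect: the model keeps receiving a useful training signal. In a real training loop the similarities would be recomputed with the current model at some interval.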

Related papers

Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining [57.352097333505476]
'Breaking the Batch Barrier' (B3) is a novel batch construction strategy designed to curate high-quality batches for Contrastive Learning (CL). Our approach begins by using a pretrained teacher embedding model to rank all examples in the dataset. A community detection algorithm is then applied to this graph to identify clusters of examples that serve as strong negatives for one another. The clusters are then used to construct batches that are rich in in-batch negatives.
arXiv Detail & Related papers (2025-05-16T14:25:43Z)
ReNeg: Learning Negative Embedding with Reward Guidance [69.81219455975477]
In text-to-image (T2I) generation applications, negative embeddings have proven to be a simple yet effective approach for enhancing generation quality. We introduce ReNeg, an end-to-end method designed to learn improved Negative embeddings guided by a Reward model.
arXiv Detail & Related papers (2024-12-27T13:31:55Z)
Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining [58.379339799777064]
Large-scale visual language models (VLMs) exhibit strong representation capacities, making them ubiquitous for enhancing image and text understanding tasks. We propose a framework that not only mines in both directions but also generates challenging negative samples in both modalities. Our code and dataset are released at https://ugorsahin.github.io/enhancing-multimodal-compositional-reasoning-of-vlm.html.
arXiv Detail & Related papers (2023-11-07T13:05:47Z)
Fast Propagation is Better: Accelerating Single-Step Adversarial Training via Sampling Subnetworks [69.54774045493227]
A drawback of adversarial training is the computational overhead introduced by the generation of adversarial examples. We propose to exploit the interior building blocks of the model to improve efficiency. Compared with previous methods, our method not only reduces the training cost but also achieves better model robustness.
arXiv Detail & Related papers (2023-10-24T01:36:20Z)
On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training [70.82725772926949]
Adversarial training is a popular method to robustify models against adversarial attacks. In this work, we investigate this phenomenon from the perspective of training instances. We show that the decay in generalization performance of adversarial training is a result of fitting hard adversarial instances.
arXiv Detail & Related papers (2021-12-14T12:19:24Z)
When in Doubt, Summon the Titans: Efficient Inference with Large Models [80.2673230098021]
We propose a two-stage framework based on distillation that realizes the modelling benefits of large models. We use the large teacher models to guide the lightweight student models to only make correct predictions on a subset of "easy" examples. Our proposed use of distillation to only handle easy instances allows for a more aggressive trade-off in the student size, thereby reducing the amortized cost of inference.
arXiv Detail & Related papers (2021-10-19T22:56:49Z)
When does loss-based prioritization fail? [18.982933391138268]
We show that loss-based acceleration methods degrade in scenarios with noisy and corrupted data. Measures of example difficulty need to correctly separate out noise from other types of challenging examples.
arXiv Detail & Related papers (2021-07-16T07:23:15Z)
Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework. We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
Contrastive Learning with Hard Negative Samples [80.12117639845678]
We develop a new family of unsupervised sampling methods for selecting hard negative samples. A limiting case of this sampling results in a representation that tightly clusters each class, and pushes different classes as far apart as possible. The proposed method improves downstream performance across multiple modalities, requires only a few additional lines of code to implement, and introduces no computational overhead.
arXiv Detail & Related papers (2020-10-09T14:18:53Z)
SCE: Scalable Network Embedding from Sparsest Cut [20.08464038805681]
Large-scale network embedding aims to learn a latent representation for each node in an unsupervised manner. A key to the success of such contrastive learning methods is how positive and negative samples are drawn. In this paper, we propose SCE for unsupervised network embedding, using only negative samples for training.
arXiv Detail & Related papers (2020-06-30T03:18:15Z)
Adaptive Offline Quintuplet Loss for Image-Text Matching [102.50814151323965]
Existing image-text matching approaches typically leverage triplet loss with online hard negatives to train the model. We propose solutions by sampling negatives offline from the whole training set. We evaluate the proposed training approach on three state-of-the-art image-text models on the MS-COCO and Flickr30K datasets.
arXiv Detail & Related papers (2020-03-07T22:09:11Z)
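
Several entries above concern how many negatives an InfoNCE-style loss sees and how hard they are. The sketch below illustrates both effects on precomputed similarity scores. It is a simplified illustration, not the estimator from any listed paper: the function name, the `beta` hardness parameter, and the exponential weighting scheme are assumptions.

```python
import math

def weighted_info_nce(pos_sim, neg_sims, temperature=0.1, beta=0.0):
    """InfoNCE over precomputed similarities with hardness-weighted negatives.

    beta = 0 gives uniform weights (plain InfoNCE); larger beta shifts weight
    toward negatives that are more similar to the anchor.
    """
    k = len(neg_sims)
    raw = [math.exp(beta * s) for s in neg_sims]
    total = sum(raw)
    weights = [k * r / total for r in raw]  # weights average to 1
    pos_term = math.exp(pos_sim / temperature)
    neg_term = sum(w * math.exp(s / temperature)
                   for w, s in zip(weights, neg_sims))
    return -math.log(pos_term / (pos_term + neg_term))

# Effect of hardness weighting on a fixed set of negatives.
plain = weighted_info_nce(0.8, [0.7, 0.1, -0.3], beta=0.0)
hard = weighted_info_nce(0.8, [0.7, 0.1, -0.3], beta=5.0)

# Effect of the number of negatives at equal similarity.
few = weighted_info_nce(0.8, [0.1] * 8)
many = weighted_info_nce(0.8, [0.1] * 64)
```

With these inputs the weighted loss exceeds the plain one, and the loss grows with the number of negatives, which is why both the count and the difficulty of negatives influence the training signal.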

This list is automatically generated from the titles and abstracts of the papers on this site.