Sparse Contrastive Learning of Sentence Embeddings
- URL: http://arxiv.org/abs/2311.03881v1
- Date: Tue, 7 Nov 2023 10:54:45 GMT
- Title: Sparse Contrastive Learning of Sentence Embeddings
- Authors: Ruize An, Chen Zhang, Dawei Song
- Abstract summary: SimCSE has shown the feasibility of contrastive learning in training sentence embeddings.
Prior studies have shown that dense models could contain harmful parameters that affect the model performance.
We propose parameter sparsification, where alignment and uniformity scores are used to measure the contribution of each parameter to the overall quality of sentence embeddings.
- Score: 10.251604958122506
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, SimCSE has shown the feasibility of contrastive learning in
training sentence embeddings and illustrates its expressiveness in spanning an
aligned and uniform embedding space. However, prior studies have shown that
dense models could contain harmful parameters that affect the model
performance, and it is plausible that SimCSE is likewise affected by such
parameters. Driven by this, parameter sparsification is applied, where
alignment and uniformity scores are used to measure the contribution of each
parameter to the overall quality of sentence embeddings. Drawing from a
preliminary study, we consider parameters with minimal contributions to be
detrimental, as their sparsification results in improved model performance. To
discuss the ubiquity of detrimental parameters and remove them, more
experiments on the standard semantic textual similarity (STS) tasks and
transfer learning tasks are conducted, and the results show that the proposed
sparsified SimCSE (SparseCSE) has excellent performance in comparison with
SimCSE. Furthermore, through in-depth analysis, we establish the validity and
stability of our sparsification method, showcasing that the embedding space
generated by SparseCSE exhibits improved alignment compared to that produced by
SimCSE. Importantly, uniformity remains uncompromised.
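The alignment and uniformity scores referenced in the abstract follow the standard definitions used in the contrastive sentence-embedding literature; how SparseCSE attributes them to individual parameters is specific to the paper and not reproduced here. A minimal sketch of the two metrics themselves, assuming L2-normalized embeddings and the common settings alpha = 2 and t = 2:

```python
import numpy as np

def alignment(x, y, alpha=2):
    # Mean distance between positive-pair embeddings (lower is better).
    # x, y: (n, d) arrays of L2-normalized embeddings of paired sentences.
    return float(np.mean(np.linalg.norm(x - y, axis=1) ** alpha))

def uniformity(x, t=2):
    # Log of the mean Gaussian potential over all distinct embedding pairs
    # (lower is better): a measure of how evenly points cover the hypersphere.
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(x.shape[0], k=1)  # distinct pairs only
    return float(np.log(np.mean(np.exp(-t * sq_dists[iu]))))
```

Identical positive pairs give an alignment of exactly zero, while uniformity rewards embeddings that are spread apart.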
Related papers
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE is a self-supervised learning framework that enhances global feature representation of point cloud mask autoencoders.
We show that PseudoNeg-MAE achieves state-of-the-art performance on the ModelNet40 and ScanObjectNN datasets.
arXiv Detail & Related papers (2024-09-24T07:57:21Z) - Unveiling the Flaws: Exploring Imperfections in Synthetic Data and Mitigation Strategies for Large Language Models [89.88010750772413]
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
Our work delves into these specific flaws associated with question-answer (Q-A) pairs, a prevalent type of synthetic data, and presents a method based on unlearning techniques to mitigate these flaws.
Our work has yielded key insights into the effective use of synthetic data, aiming to promote more robust and efficient LLM training.
arXiv Detail & Related papers (2024-06-18T08:38:59Z) - Advancing Semantic Textual Similarity Modeling: A Regression Framework with Translated ReLU and Smooth K2 Loss [3.435381469869212]
This paper presents an innovative regression framework for Sentence-BERT STS tasks.
It proposes two simple yet effective loss functions: Translated ReLU and Smooth K2 Loss.
Experimental results demonstrate that our method achieves convincing performance across seven established STS benchmarks.
arXiv Detail & Related papers (2024-06-08T02:52:43Z) - The Common Stability Mechanism behind most Self-Supervised Learning Approaches [64.40701218561921]
We provide a framework to explain the stability mechanism of different self-supervised learning techniques.
We discuss the working mechanism of contrastive techniques like SimCLR, non-contrastive techniques like BYOL, SWAV, SimSiam, Barlow Twins, and DINO.
We formulate different hypotheses and test them using the Imagenet100 dataset.
arXiv Detail & Related papers (2024-02-22T20:36:24Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Improving Contrastive Learning of Sentence Embeddings with Focal-InfoNCE [13.494159547236425]
This study introduces an unsupervised contrastive learning framework that combines SimCSE with hard negative mining.
The proposed focal-InfoNCE function introduces self-paced modulation terms into the contrastive objective, downweighting the loss associated with easy negatives and encouraging the model to focus on hard negatives.
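The summary above gives only the idea of the modulation; the paper's exact functional form is not stated here. A hedged sketch of one plausible focal-style weighting inside an InfoNCE loss, where the modulation term `w` is a hypothetical choice for illustration:

```python
import numpy as np

def focal_infonce(sim_pos, sim_negs, tau=0.05, gamma=2.0):
    # sim_pos: cosine similarity of the positive pair (scalar).
    # sim_negs: (k,) cosine similarities to the negatives.
    # Focal-style modulation (hypothetical form): hard negatives (high
    # similarity) keep weight near 1, easy negatives are down-weighted.
    w = ((1.0 + sim_negs) / 2.0) ** gamma
    logit_pos = np.exp(sim_pos / tau)
    logits_neg = np.exp(sim_negs / tau) * w
    return float(-np.log(logit_pos / (logit_pos + logits_neg.sum())))
```

With this weighting, a batch of easy negatives contributes almost no loss, while hard negatives dominate the gradient, which matches the stated motivation.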
arXiv Detail & Related papers (2023-10-10T18:15:24Z) - Dynamically Scaled Temperature in Self-Supervised Contrastive Learning [11.133502139934437]
We focus on improving the performance of InfoNCE loss in self-supervised learning by proposing a novel cosine similarity dependent temperature scaling function.
Experimental evidence shows that the proposed framework outperforms the contrastive loss-based SSL algorithms.
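The summary does not give the paper's scaling function, but the idea of a cosine-similarity-dependent temperature can be sketched with a hypothetical linear form (the constants and the functional form here are assumptions for illustration only):

```python
import numpy as np

def dynamic_tau(cos_sim, tau0=0.1, k=0.5):
    # Hypothetical scaling: pairs that are already similar get a smaller
    # (sharper) temperature, dissimilar pairs a larger (softer) one.
    return tau0 * (1.0 - k * cos_sim)
```

The scaled temperature would then divide the similarity logits before the InfoNCE softmax, so the loss sharpens where the model is confident.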
arXiv Detail & Related papers (2023-08-02T13:31:41Z) - Identical and Fraternal Twins: Fine-Grained Semantic Contrastive Learning of Sentence Representations [6.265789210037749]
We introduce a novel Identical and Fraternal Twins of Contrastive Learning framework, capable of simultaneously adapting to various positive pairs generated by different augmentation techniques.
We also present proof-of-concept experiments combined with the contrastive objective to prove the validity of the proposed Twins Loss.
arXiv Detail & Related papers (2023-07-20T15:02:42Z) - Understanding Collapse in Non-Contrastive Learning [122.2499276246997]
We show that SimSiam representations undergo partial dimensional collapse if the model is too small relative to the dataset size.
We propose a metric to measure the degree of this collapse and show that it can be used to forecast the downstream task performance without any fine-tuning or labels.
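The paper proposes its own collapse metric, which the summary does not specify. A common proxy for dimensional collapse is the effective rank of the embedding matrix, computed from the entropy of the normalized singular-value distribution; the sketch below uses that proxy, not the paper's metric:

```python
import numpy as np

def effective_rank(emb):
    # emb: (n, d) embedding matrix with n >= d.
    # exp of the entropy of the normalized singular-value spectrum:
    # near d for well-spread embeddings, near 1 under dimensional collapse.
    s = np.linalg.svd(emb - emb.mean(axis=0), compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop zero singular values before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))
```

A full-rank Gaussian embedding matrix scores close to its dimensionality, while a rank-one (fully collapsed) matrix scores close to 1.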
arXiv Detail & Related papers (2022-09-29T17:59:55Z) - CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.