JTCSE: Joint Tensor-Modulus Constraints and Cross-Attention for Unsupervised Contrastive Learning of Sentence Embeddings
- URL: http://arxiv.org/abs/2505.02366v2
- Date: Wed, 07 May 2025 01:11:50 GMT
- Title: JTCSE: Joint Tensor-Modulus Constraints and Cross-Attention for Unsupervised Contrastive Learning of Sentence Embeddings
- Authors: Tianyu Zong, Hongzhu Yi, Bingkang Shi, Yuanxiang Wang, Jungang Xu
- Abstract summary: We propose JTCSE, a new Joint Tensor representation modulus constraint and Cross-attention unsupervised contrastive learning Sentence Embedding representation framework.
- Score: 5.152575977825381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised contrastive learning has become a hot research topic in natural language processing. Existing works usually aim to constrain the orientation distribution of the representations of positive and negative samples in the high-dimensional semantic space, but a semantic representation tensor has both a modulus and an orientation, and ignoring the modulus feature leaves contrastive learning insufficient. We therefore first propose a training objective that imposes modulus constraints on the semantic representation tensor, to strengthen the alignment between positive samples in contrastive learning. Second, BERT-like models suffer from attention sinking, which deprives the CLS token that aggregates semantic information of adequate attention. In response, we propose a cross-attention structure between the twin-tower ensemble models to enhance the models' attention to the CLS token and improve the quality of CLS pooling. Combining these two motivations, we propose JTCSE, a new Joint Tensor representation modulus constraint and Cross-attention unsupervised contrastive learning Sentence Embedding representation framework. We evaluate it on seven semantic textual similarity tasks, and the experimental results show that JTCSE's twin-tower ensemble model and single-tower distillation model outperform the other baselines and become the current SOTA. In addition, we have conducted an extensive zero-shot downstream task evaluation, which shows that JTCSE outperforms the other baselines overall on more than 130 tasks.
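To make the two ideas concrete, here is a minimal PyTorch sketch of a modulus-constrained contrastive objective with cross-attention between the two towers. The SimCSE-style InfoNCE base loss, the absolute-gap form of the modulus penalty, the unparameterized single-head cross-attention, and the `lambda_mod` weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch of JTCSE's two training signals: (1) a modulus (L2-norm)
# constraint pulling the norms of positive-pair embeddings together, on top
# of the usual orientation-based InfoNCE loss; (2) cross-attention between
# two encoder towers so each tower's CLS query attends to the other's tokens.
import torch
import torch.nn.functional as F

def infonce_loss(z1, z2, temperature=0.05):
    """SimCSE-style InfoNCE over cosine similarity (orientation only)."""
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(sim, labels)

def modulus_constraint_loss(z1, z2):
    """Penalize the gap between the L2 norms of positive-pair embeddings,
    so alignment covers modulus as well as direction (assumed form)."""
    return (z1.norm(dim=-1) - z2.norm(dim=-1)).abs().mean()

def cross_attended_cls(hidden_a, hidden_b):
    """Let tower A's CLS vector attend over tower B's token states
    (single-head, unparameterized sketch of the cross-attention idea)."""
    cls_a = hidden_a[:, :1, :]                       # (B, 1, D) CLS query
    attn = torch.softmax(
        cls_a @ hidden_b.transpose(1, 2) / hidden_b.size(-1) ** 0.5, dim=-1)
    return (attn @ hidden_b).squeeze(1)              # (B, D) refined CLS

def jtcse_style_loss(h1, h2, lambda_mod=0.1):
    """h1, h2: (B, T, D) hidden states of the two towers on positive pairs.
    lambda_mod is a hypothetical weighting, not taken from the paper."""
    z1, z2 = cross_attended_cls(h1, h2), cross_attended_cls(h2, h1)
    return infonce_loss(z1, z2) + lambda_mod * modulus_constraint_loss(z1, z2)
```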
Related papers
- TNCSE: Tensor's Norm Constraints for Unsupervised Contrastive Learning of Sentence Embeddings [4.62170384991303]
We propose a new Sentence Embedding representation framework, TNCSE. We evaluate it on seven semantic text similarity tasks, and the results show that TNCSE and its derived models are the current state-of-the-art approach.
arXiv Detail & Related papers (2025-03-17T02:14:42Z)
- Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation [56.87049651707208]
Few-shot Semantic Segmentation has evolved into in-context tasks, morphing into a crucial element in assessing generalist segmentation models.
Our initial focus lies in understanding how to facilitate interaction between the query image and the support image, resulting in the proposal of a KV fusion method within the self-attention framework.
Based on our analysis, we establish a simple and effective framework named DiffewS, maximally retaining the original Latent Diffusion Model's generative framework.
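One common way to realize the KV fusion described above is to let the query image's tokens attend over keys and values concatenated from both images. The single-head, unprojected-shape sketch below assumes that reading and is not necessarily DiffewS's exact design.

```python
# Hedged sketch of "KV fusion" inside self-attention: query-image tokens
# attend over keys/values fused from the query image and the support image.
import torch

def kv_fused_attention(q_tokens, s_tokens, wq, wk, wv):
    """q_tokens: (B, Tq, D) query-image tokens; s_tokens: (B, Ts, D) support
    tokens; wq/wk/wv: (D, D) projection matrices (assumed shapes)."""
    q = q_tokens @ wq                                  # queries from the query image only
    kv_in = torch.cat([q_tokens, s_tokens], dim=1)     # fuse tokens of both images
    k, v = kv_in @ wk, kv_in @ wv
    attn = torch.softmax(q @ k.transpose(1, 2) / q.size(-1) ** 0.5, dim=-1)
    return attn @ v                                    # (B, Tq, D) fused output
```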
arXiv Detail & Related papers (2024-10-03T10:33:49Z)
- ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation [24.743048965822297]
This paper introduces a novel semi-supervised LiDAR semantic segmentation framework called ItTakesTwo (IT2). IT2 is designed to ensure consistent predictions from peer LiDAR representations, thereby improving the perturbation effectiveness in consistency learning. Results on public benchmarks show that our approach achieves remarkable improvements over the previous state-of-the-art (SOTA) methods in the field.
arXiv Detail & Related papers (2024-07-09T18:26:53Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Co-guiding for Multi-intent Spoken Language Understanding [53.30511968323911]
We propose a novel model termed Co-guiding Net, which implements a two-stage framework that achieves mutual guidance between the two tasks.
For the first stage, we propose single-task supervised contrastive learning, and for the second stage, we propose co-guiding supervised contrastive learning.
Experiment results on multi-intent SLU show that our model outperforms existing models by a large margin.
arXiv Detail & Related papers (2023-11-22T08:06:22Z)
- Identical and Fraternal Twins: Fine-Grained Semantic Contrastive Learning of Sentence Representations [6.265789210037749]
We introduce a novel Identical and Fraternal Twins of Contrastive Learning framework, capable of simultaneously adapting to various positive pairs generated by different augmentation techniques.
We also present proof-of-concept experiments combined with the contrastive objective to prove the validity of the proposed Twins Loss.
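The summary does not spell out the Twins Loss, but a generic reading, with separate contrastive terms for "identical" positives (same augmentation technique) and "fraternal" positives (a different augmentation technique), can be sketched as follows; the per-term temperatures are assumptions.

```python
# Hedged sketch of a multi-positive contrastive objective in the spirit of
# the Twins idea: identical and fraternal positives each get their own
# InfoNCE term instead of sharing one similarity scale. Not the paper's loss.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature):
    sim = F.cosine_similarity(anchor.unsqueeze(1), positive.unsqueeze(0), dim=-1)
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(sim / temperature, labels)

def twins_style_loss(z, z_identical, z_fraternal, t_id=0.05, t_fr=0.1):
    # A looser temperature for fraternal pairs reflects that views from a
    # different augmentation family should match less strictly (assumption).
    return info_nce(z, z_identical, t_id) + info_nce(z, z_fraternal, t_fr)
```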
arXiv Detail & Related papers (2023-07-20T15:02:42Z)
- Language as a Latent Sequence: deep latent variable models for semi-supervised paraphrase generation [47.33223015862104]
We present a novel unsupervised model named variational sequence auto-encoding reconstruction (VSAR), which performs latent sequence inference given an observed text.
To leverage information from text pairs, we additionally introduce a novel supervised model we call dual directional learning (DDL), which is designed to integrate with our proposed VSAR model.
Our empirical evaluations suggest that the combined model yields competitive performance against the state-of-the-art supervised baselines on complete data.
arXiv Detail & Related papers (2023-01-05T19:35:30Z)
- DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [37.48292304239107]
We present a transformer-based end-to-end ZSL method named DUET.
We develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images.
We find that DUET often achieves state-of-the-art performance; its components are effective and its predictions are interpretable.
arXiv Detail & Related papers (2022-07-04T11:12:12Z)
- Semantics-Depth-Symbiosis: Deeply Coupled Semi-Supervised Learning of Semantics and Depth [83.94528876742096]
We tackle the MTL problem of two dense tasks, i.e., semantic segmentation and depth estimation, and present a novel attention module called the Cross-Channel Attention Module (CCAM).
In a true symbiotic spirit, we then formulate a novel data augmentation for the semantic segmentation task using predicted depth called AffineMix, and a simple depth augmentation using predicted semantics called ColorAug.
Finally, we validate the performance gain of the proposed method on the Cityscapes dataset, which helps us achieve state-of-the-art results for a semi-supervised joint model based on depth and semantic segmentation.
arXiv Detail & Related papers (2022-06-21T17:40:55Z)
- SDCUP: Schema Dependency-Enhanced Curriculum Pre-Training for Table Semantic Parsing [19.779493883522072]
This paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.
We propose a schema-aware curriculum learning approach to mitigate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner.
arXiv Detail & Related papers (2021-11-18T02:51:04Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
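As a rough illustration of replacing region regression and classification with cross-modality contrastive learning, the sketch below contrasts pooled region features against paired text embeddings with a symmetric InfoNCE; the mean-pooling over regions and the symmetric form are assumptions, not the paper's exact loss.

```python
# Hedged sketch of cross-modality region contrastive learning: pool each
# image's region features and contrast them with the paired text embedding.
import torch
import torch.nn.functional as F

def region_text_contrastive(region_feats, text_feats, temperature=0.07):
    """region_feats: (B, R, D) region embeddings; text_feats: (B, D)."""
    img = F.normalize(region_feats.mean(dim=1), dim=-1)  # aggregate regions per image
    txt = F.normalize(text_feats, dim=-1)
    logits = img @ txt.t() / temperature                 # (B, B) similarity matrix
    labels = torch.arange(img.size(0), device=img.device)
    # Symmetric image-to-text and text-to-image InfoNCE terms.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```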
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Orthogonal Ensemble Networks for Biomedical Image Segmentation [10.011414604407681]
We introduce Orthogonal Ensemble Networks (OEN), a novel framework to explicitly enforce model diversity.
We benchmark the proposed framework in two challenging brain lesion segmentation tasks.
The experimental results show that our approach produces more robust and well-calibrated ensemble models.
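One standard way to explicitly enforce model diversity is an inter-model orthogonality penalty on corresponding weight filters, sketched below under the assumption that the ensemble members share an architecture; the exact OEN formulation may differ.

```python
# Hedged sketch of an inter-model orthogonality penalty in the spirit of
# OEN: push corresponding filters of two ensemble members toward
# orthogonality by penalizing their squared cosine similarity.
import torch
import torch.nn.functional as F

def orthogonality_penalty(model_a, model_b):
    """Assumes the two ensemble members have identical architectures."""
    penalty = 0.0
    for pa, pb in zip(model_a.parameters(), model_b.parameters()):
        if pa.dim() < 2:                             # skip biases / norm scales
            continue
        wa, wb = pa.flatten(1), pb.flatten(1)        # (out_channels, -1)
        cos = F.cosine_similarity(wa, wb, dim=1)     # per-filter similarity
        penalty = penalty + (cos ** 2).mean()        # orthogonal filters => cos ~ 0
    return penalty
```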
arXiv Detail & Related papers (2021-05-22T23:44:55Z)
- Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning [94.35586521144117]
We investigate whether applying contrastive learning to fine-tuning would bring further benefits.
We propose Contrast-regularized tuning (Core-tuning), a novel approach for fine-tuning contrastive self-supervised visual models.
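A plausible shape for contrast-regularized fine-tuning is cross-entropy plus a supervised-contrastive regularizer over the same batch, as sketched below; the SupCon-style term and the `alpha` weighting are assumptions and omit Core-tuning's specific design choices.

```python
# Hedged sketch of contrast-regularized fine-tuning: the usual cross-entropy
# objective plus a supervised-contrastive term so class-discriminative
# structure from pretraining is preserved during fine-tuning.
import torch
import torch.nn.functional as F

def supcon_term(features, labels, temperature=0.1):
    """Supervised-contrastive term over features (B, D) with class labels (B,)."""
    f = F.normalize(features, dim=-1)
    sim = f @ f.t() / temperature
    eye = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(eye, float('-inf'))               # drop self-pairs
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    denom = pos.sum(dim=1).clamp(min=1)                     # guard empty-positive rows
    return -(log_prob.masked_fill(~pos, 0.0).sum(dim=1) / denom).mean()

def core_tuning_style_loss(logits, features, labels, alpha=0.1):
    # alpha is a hypothetical weighting between the two terms.
    return F.cross_entropy(logits, labels) + alpha * supcon_term(features, labels)
```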
arXiv Detail & Related papers (2021-02-12T16:31:24Z)