2-Tier SimCSE: Elevating BERT for Robust Sentence Embeddings
- URL: http://arxiv.org/abs/2501.13758v1
- Date: Thu, 23 Jan 2025 15:36:35 GMT
- Title: 2-Tier SimCSE: Elevating BERT for Robust Sentence Embeddings
- Authors: Yumeng Wang, Ziran Zhou, Junjin Wang
- Abstract summary: We apply SimCSE (Simple Contrastive Learning of Sentence Embeddings) to fine-tune the minBERT model for sentiment analysis, semantic textual similarity (STS), and paraphrase detection.
Our contributions include experimenting with three different dropout techniques, namely standard dropout, curriculum dropout, and adaptive dropout, to tackle overfitting.
Our findings demonstrate the effectiveness of SimCSE, with the 2-Tier model achieving superior performance on the STS task and an average test score of 0.742.
- Abstract: Effective sentence embeddings that capture semantic nuances and generalize well across diverse contexts are crucial for natural language processing tasks. We address this challenge by applying SimCSE (Simple Contrastive Learning of Sentence Embeddings) to fine-tune the minBERT model for sentiment analysis, semantic textual similarity (STS), and paraphrase detection. Our contributions include experimenting with three different dropout techniques, namely standard dropout, curriculum dropout, and adaptive dropout, to tackle overfitting; proposing a novel 2-Tier SimCSE Fine-tuning Model that combines both unsupervised and supervised SimCSE on the STS task; and exploring the transfer learning potential for the Paraphrase and SST tasks. Our findings demonstrate the effectiveness of SimCSE, with the 2-Tier model achieving superior performance on the STS task and an average test score of 0.742 across all three downstream tasks. Error analysis reveals challenges in handling complex sentiments and a reliance on lexical overlap for paraphrase detection, highlighting areas for future research. The ablation study revealed that removing Adaptive Dropout in the Single-Task Unsupervised SimCSE Model led to improved performance on the STS task, indicating overfitting due to the added parameters. Transfer learning from SimCSE models on the Paraphrase and SST tasks did not enhance performance, suggesting limited transferability of knowledge from the STS task.
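To make the recipe concrete, here is a minimal sketch of an unsupervised SimCSE training step, assuming a PyTorch-style sentence encoder (such as the minBERT model mentioned above) that returns one embedding per sentence. The in-batch InfoNCE objective and the 0.05 temperature follow the standard SimCSE setup; the curriculum-dropout schedule at the end is purely illustrative, since the paper's exact schedule and hyperparameters are not given here.

```python
# Minimal sketch of unsupervised SimCSE (not the authors' code).
# Assumption: `encoder(input_ids, attention_mask)` is any dropout-regularized
# sentence encoder (e.g. minBERT with a pooling head) returning (batch, dim) embeddings.
import torch
import torch.nn.functional as F

def simcse_unsup_loss(encoder, input_ids, attention_mask, temperature=0.05):
    """InfoNCE over two dropout-perturbed views of the same batch of sentences."""
    # Two forward passes in training mode: dropout gives each pass a different "view".
    z1 = encoder(input_ids, attention_mask)   # (batch, dim)
    z2 = encoder(input_ids, attention_mask)   # (batch, dim)
    # Cosine similarity between every view pair across the batch, scaled by temperature.
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
    # The positive for sentence i is its own second view; other sentences are in-batch negatives.
    labels = torch.arange(sim.size(0), device=sim.device)
    return F.cross_entropy(sim, labels)

def curriculum_dropout(step, total_steps, p_max=0.1):
    """Illustrative curriculum-dropout schedule (assumed form): ramp the dropout
    rate from 0 to p_max over training. Adaptive dropout would instead learn
    per-unit rates as extra parameters."""
    return p_max * min(1.0, step / total_steps)
```

The supervised tier of the 2-Tier setup would replace the second dropout view with an annotated positive pair while keeping the same contrastive objective.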
Related papers
- A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts.
With these synthetic prompts, we use two preference dataset curation methods: rejection sampling (RS) and Monte Carlo Tree Search (MCTS).
Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements.
High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z) - The Surprising Effectiveness of Test-Time Training for Abstract Reasoning [64.36534512742736]
We investigate the effectiveness of test-time training (TTT) as a mechanism for improving models' reasoning capabilities.
TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models.
Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models.
arXiv Detail & Related papers (2024-11-11T18:59:45Z) - Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives [54.14429346914995]
Chain-of-Thought (CoT) has become a pivotal method for solving complex problems.
Large language models (LLMs) often struggle to accurately decompose domain-specific tasks.
This paper introduces the Re-TASK framework, a novel theoretical model that revisits LLM tasks from the perspectives of capability, skill, and knowledge.
arXiv Detail & Related papers (2024-08-13T13:58:23Z) - Advancing Semantic Textual Similarity Modeling: A Regression Framework with Translated ReLU and Smooth K2 Loss [3.435381469869212]
This paper presents an innovative regression framework for Sentence-BERT STS tasks.
It proposes two simple yet effective loss functions: Translated ReLU and Smooth K2 Loss.
Experimental results demonstrate that our method achieves convincing performance across seven established STS benchmarks.
arXiv Detail & Related papers (2024-06-08T02:52:43Z) - MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset [50.36095192314595]
The paper targets the ability of Large Language Models (LLMs) to function as conscious agents with generalizable reasoning capabilities.
This ability remains underexplored due to the complexity of modeling infinite possible changes in an event.
We introduce the first-ever benchmark, MARS, comprising three tasks corresponding to each step.
arXiv Detail & Related papers (2024-06-04T08:35:04Z) - Towards Better Understanding of Contrastive Sentence Representation Learning: A Unified Paradigm for Gradient [20.37803751979975]
Sentence Representation Learning (SRL) is a crucial task in Natural Language Processing (NLP).
Many studies have investigated the similarities between contrastive and non-contrastive Self-Supervised Learning (SSL).
But in ranking tasks (i.e., Semantic Textual Similarity (STS) in SRL), contrastive SSL significantly outperforms non-contrastive SSL.
arXiv Detail & Related papers (2024-02-28T12:17:40Z) - Sparse Contrastive Learning of Sentence Embeddings [10.251604958122506]
SimCSE has shown the feasibility of contrastive learning in training sentence embeddings.
Prior studies have shown that dense models could contain harmful parameters that affect the model performance.
We propose parameter sparsification, where alignment and uniformity scores are used to measure the contribution of each parameter to the overall quality of sentence embeddings.
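As a point of reference, below is a minimal sketch of the alignment and uniformity metrics (Wang and Isola, 2020) that this scoring reportedly builds on; the parameter-scoring and pruning procedure itself is not reproduced, and the `alpha`/`t` defaults are the commonly used values rather than anything taken from that paper.

```python
# Hedged sketch of alignment and uniformity for L2-normalized sentence embeddings.
import torch
import torch.nn.functional as F

def alignment(z1, z2, alpha=2):
    """Mean distance between embeddings of positive pairs; lower is better."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    return (z1 - z2).norm(p=2, dim=1).pow(alpha).mean()

def uniformity(z, t=2):
    """Log of the mean Gaussian potential over all embedding pairs;
    lower means embeddings spread more uniformly on the unit sphere."""
    z = F.normalize(z, dim=-1)
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()
```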
arXiv Detail & Related papers (2023-11-07T10:54:45Z) - Rethinking and Improving Multi-task Learning for End-to-end Speech Translation [51.713683037303035]
We investigate the consistency between different tasks, considering different times and modules.
We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations.
We propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the difference in length and representation.
arXiv Detail & Related papers (2023-11-07T08:48:46Z) - Improving Contrastive Learning of Sentence Embeddings with Focal-InfoNCE [13.494159547236425]
This study introduces an unsupervised contrastive learning framework that combines SimCSE with hard negative mining.
The proposed focal-InfoNCE function introduces self-paced modulation terms into the contrastive objective, downweighting the loss associated with easy negatives and encouraging the model to focus on hard negatives.
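The paper's exact modulation terms are not reproduced here, but the hedged sketch below illustrates the general focal-style idea under an assumed weighting: negatives with higher similarity (i.e. harder negatives) contribute more to the contrastive denominator, so easy negatives are effectively downweighted.

```python
# Illustrative focal-style InfoNCE (assumed weighting, not the paper's exact objective).
import torch
import torch.nn.functional as F

def focal_style_infonce(z1, z2, temperature=0.05, gamma=1.0):
    """z1, z2: (batch, dim) embeddings of two views; diagonal pairs are positives."""
    sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1)  # (batch, batch)
    pos = torch.diagonal(sim)
    # Assumed modulation: weight each pair by (1 + similarity)^gamma so that
    # hard (high-similarity) negatives dominate the denominator.
    weights = (1.0 + sim).pow(gamma)
    denom = (weights * torch.exp(sim / temperature)).sum(dim=1)
    loss = -torch.log(torch.exp(pos / temperature) / denom)
    return loss.mean()
```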
arXiv Detail & Related papers (2023-10-10T18:15:24Z) - TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z) - Breakpoint Transformers for Modeling and Tracking Intermediate Beliefs [37.754787051387034]
We propose a representation learning framework called breakpoint modeling.
Our approach trains models in an efficient and end-to-end fashion to build intermediate representations.
We show the benefit of our main breakpoint transformer, based on T5, over conventional representation learning approaches.
arXiv Detail & Related papers (2022-11-15T07:28:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.