Narrowing the Gap between Supervised and Unsupervised Sentence
Representation Learning with Large Language Model
- URL: http://arxiv.org/abs/2309.06453v2
- Date: Tue, 19 Dec 2023 12:13:25 GMT
- Title: Narrowing the Gap between Supervised and Unsupervised Sentence
Representation Learning with Large Language Model
- Authors: Mingxin Li, Richong Zhang, Zhijie Nie, Yongyi Mao
- Abstract summary: Sentence Representation Learning (SRL) is a fundamental task in Natural Language Processing (NLP).
Contrastive Learning of Sentence Embeddings (CSE) is the mainstream technique due to its superior performance.
Previous works attribute the performance gap between supervised and unsupervised CSE to differences in two representation properties (alignment and uniformity).
- Score: 44.77515147970206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentence Representation Learning (SRL) is a fundamental task in Natural
Language Processing (NLP), with the Contrastive Learning of Sentence Embeddings
(CSE) being the mainstream technique due to its superior performance. An
intriguing phenomenon in CSE is the significant performance gap between
supervised and unsupervised methods, with their only difference lying in the
training data. Previous works attribute this performance gap to differences in
two representation properties (alignment and uniformity). However, since
alignment and uniformity only measure the results, they fail to answer "What
aspects of the training data contribute to the performance gap?" and "How can
the performance gap be narrowed?" In this paper, we conduct empirical
experiments to answer these "What" and "How" questions. We first answer the
"What" question by thoroughly comparing the behavior of supervised and
unsupervised CSE during their respective training processes. From the
comparison, we identify the similarity pattern as a key factor in the
performance gap, and introduce a metric, called Relative Fitting Difficulty
(RFD), to measure the complexity of the similarity pattern. Then, based on the
insights gained from the "What" question, we tackle the "How" question by
increasing the pattern complexity of the training data. We achieve this by
leveraging the In-Context Learning (ICL) capability of the Large Language Model
(LLM) to generate data that simulates complex patterns. By utilizing the
hierarchical patterns in the LLM-generated data, we effectively narrow the gap
between supervised and unsupervised CSE. We release our code and appendix at
https://github.com/BDBC-KG-NLP/NGCSE.
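The abstract refers to a contrastive objective and to the alignment and uniformity properties without defining them. The sketch below is a minimal illustration of how these quantities are commonly computed, assuming the widely used formulation: alignment as the mean squared distance between normalized positive pairs, uniformity as the log of the mean Gaussian potential over all pairs, and an in-batch InfoNCE loss. It is not the paper's code. All function names, the temperature value, and the toy vectors are assumptions made for illustration, and RFD is not reproduced because the abstract does not define it.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment(pairs):
    """Mean squared distance between normalized positive pairs (lower is better)."""
    return sum(sq_dist(normalize(a), normalize(b)) for a, b in pairs) / len(pairs)


def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs (lower is more uniform)."""
    z = [normalize(e) for e in embeddings]
    vals = [
        math.exp(-t * sq_dist(z[i], z[j]))
        for i in range(len(z))
        for j in range(i + 1, len(z))
    ]
    return math.log(sum(vals) / len(vals))


def info_nce(anchors, positives, temperature=0.05):
    """In-batch contrastive loss: the positive for anchor i is positives[i];
    every other row of positives serves as a negative."""
    a = [normalize(x) for x in anchors]
    p = [normalize(x) for x in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i with every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


if __name__ == "__main__":
    # Hypothetical 2-D "sentence embeddings", purely for demonstration.
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    positives = [[0.9, 0.1], [0.1, 0.9]]
    print("alignment:", alignment(list(zip(anchors, positives))))
    print("uniformity:", uniformity(anchors + positives))
    print("InfoNCE loss:", info_nce(anchors, positives))
```

In this formulation, supervised and unsupervised CSE would call the same loss and differ only in how the positive (and negative) sentences are obtained, which matches the abstract's remark that the two settings differ only in training data.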
Related papers
- On the Paradoxical Interference between Instruction-Following and Task Solving [50.75960598434753]
Instruction following aims to align Large Language Models (LLMs) with human intent by specifying explicit constraints on how tasks should be performed. We reveal a counterintuitive phenomenon: instruction following can paradoxically interfere with LLMs' task-solving capability. We propose a metric, SUSTAINSCORE, to quantify the interference of instruction following with task solving.
arXiv Detail & Related papers (2026-01-29T17:48:56Z)
- Exploring Structural Degradation in Dense Representations for Self-supervised Learning [84.52554180480037]
We observe a counterintuitive phenomenon in self-supervised learning (SSL): longer training may impair the performance of dense prediction tasks. We refer to this phenomenon as Self-supervised Dense Degradation (SDD) and demonstrate its consistent presence across sixteen state-of-the-art SSL methods. We introduce a Dense representation Structure Estimator (DSE) composed of a class-relevance measure and an effective dimensionality measure.
arXiv Detail & Related papers (2025-10-20T08:40:16Z)
- A Psychology-based Unified Dynamic Framework for Curriculum Learning [5.410910735259908]
This paper presents a Psychology-based Unified Dynamic Framework for Curriculum Learning (PUDF).
We quantify the difficulty of training data by applying Item Response Theory (IRT) to responses from Artificial Crowds (AC).
We propose a Dynamic Data Selection via Model Ability Estimation (DDS-MAE) strategy to schedule the appropriate amount of data during model training.
arXiv Detail & Related papers (2024-08-09T20:30:37Z)
- Semi-Supervised One-Shot Imitation Learning [83.94646047695412]
One-shot Imitation Learning aims to imbue AI agents with the ability to learn a new task from a single demonstration.
We introduce the semi-supervised OSIL problem setting, where the learning agent is presented with a large dataset of trajectories.
We develop an algorithm specifically applicable to this semi-supervised OSIL setting.
arXiv Detail & Related papers (2024-08-09T18:11:26Z)
- Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning [99.05401042153214]
In-context learning (ICL) is potentially attributed to two major abilities: task recognition (TR) and task learning (TL).
We take the first step by examining the pre-training dynamics of the emergence of ICL.
We propose a simple yet effective method to better integrate these two abilities for ICL at inference time.
arXiv Detail & Related papers (2024-06-20T06:37:47Z)
- Vocabulary-Defined Semantics: Latent Space Clustering for Improving In-Context Learning [32.178931149612644]
In-context learning enables language models to adapt to downstream data or incorporate tasks using a few samples as demonstrations within the prompts.
However, the performance of in-context learning can be unstable depending on the quality, format, or order of demonstrations.
We propose a novel approach, "vocabulary-defined semantics".
arXiv Detail & Related papers (2024-01-29T14:29:48Z)
- Compositional Exemplars for In-context Learning [21.961094715261133]
Large pretrained language models (LMs) have shown impressive In-Context Learning (ICL) ability.
We propose CEIL (Compositional Exemplars for In-context Learning) to model the interaction between the given input and in-context examples.
We validate CEIL on 12 classification and generation datasets from 7 distinct NLP tasks, including sentiment analysis, paraphrase detection, natural language inference, commonsense reasoning, open-domain question answering, code generation, and semantic parsing.
arXiv Detail & Related papers (2023-02-11T14:02:08Z)
- Deep Stable Learning for Out-Of-Distribution Generalization [27.437046504902938]
Approaches based on deep neural networks have achieved striking performance when testing data and training data share similar distribution.
Eliminating the impact of distribution shifts between training and testing data is crucial for building performance-promising deep models.
We propose to address this problem by removing the dependencies between features via learning weights for training samples.
arXiv Detail & Related papers (2021-04-16T03:54:21Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based ALs are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Unsupervised Feature Learning by Cross-Level Instance-Group Discrimination [68.83098015578874]
We integrate between-instance similarity into contrastive learning, not directly by instance grouping, but by cross-level discrimination.
CLD effectively brings unsupervised learning closer to natural data and real-world applications.
CLD sets a new state of the art on self-supervision, semi-supervision, and transfer learning benchmarks, and beats MoCo v2 and SimCLR on every reported performance.
arXiv Detail & Related papers (2020-08-09T21:13:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.