In-context Learning and Gradient Descent Revisited
- URL: http://arxiv.org/abs/2311.07772v4
- Date: Sun, 31 Mar 2024 19:33:50 GMT
- Title: In-context Learning and Gradient Descent Revisited
- Authors: Gilad Deutch, Nadav Magar, Tomer Bar Natan, Guy Dar
- Abstract summary: We show that even untrained models achieve comparable ICL-GD similarity scores despite not exhibiting ICL.
Next, we explore a major discrepancy in the flow of information throughout the model between ICL and GD, which we term Layer Causality.
We propose a simple GD-based optimization procedure that respects layer causality, and show it improves similarity scores significantly.
- Score: 3.085927389171139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) has shown impressive results in few-shot learning tasks, yet its underlying mechanism is still not fully understood. A recent line of work suggests that ICL performs gradient descent (GD)-based optimization implicitly. While appealing, much of the research focuses on simplified settings, where the parameters of a shallow model are optimized. In this work, we revisit evidence for ICL-GD correspondence on realistic NLP tasks and models. We find gaps in evaluation, both in terms of problematic metrics and insufficient baselines. We show that surprisingly, even untrained models achieve comparable ICL-GD similarity scores despite not exhibiting ICL. Next, we explore a major discrepancy in the flow of information throughout the model between ICL and GD, which we term Layer Causality. We propose a simple GD-based optimization procedure that respects layer causality, and show it improves similarity scores significantly.
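As a rough illustration of what such a layer-causal procedure could look like (the paper's exact algorithm may differ), the following minimal PyTorch sketch updates transformer blocks one at a time from the bottom up, so that each block's gradient is computed from hidden states that already reflect the updates applied to earlier blocks. The `model.blocks`, `batch`, and `loss_fn` interfaces here are hypothetical.

```python
import torch

def layer_causal_gd_step(model, batch, loss_fn, lr=1e-4):
    """Update transformer blocks one at a time, bottom-up, so each block's
    gradient is computed from hidden states that already reflect the updates
    applied to earlier blocks (hypothetical `model.blocks` interface)."""
    for block in model.blocks:  # assumed iterable of blocks, bottom to top
        # Only the current block receives gradients.
        for p in model.parameters():
            p.requires_grad_(False)
        for p in block.parameters():
            p.requires_grad_(True)

        logits = model(batch["input_ids"])       # fresh forward pass per block
        loss = loss_fn(logits, batch["labels"])
        grads = torch.autograd.grad(loss, list(block.parameters()))

        with torch.no_grad():                    # plain SGD step on this block only
            for p, g in zip(block.parameters(), grads):
                p -= lr * g
```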
Related papers
- Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning [22.341935761925892]
Fine-tuning and in-context learning (ICL) are two prevalent methods in imbuing large language models with task-specific knowledge.
This paper presents a counterintuitive finding: For tasks with implicit patterns, ICL captures these patterns significantly better than fine-tuning.
arXiv Detail & Related papers (2024-10-07T02:12:22Z)
- L^2CL: Embarrassingly Simple Layer-to-Layer Contrastive Learning for Graph Collaborative Filtering [33.165094795515785]
Graph neural networks (GNNs) have recently emerged as an effective approach to model neighborhood signals in collaborative filtering.
We propose L2CL, a principled Layer-to-Layer Contrastive Learning framework that contrasts representations from different layers.
We find that L2CL, using only one-hop contrastive learning paradigm, is able to capture intrinsic semantic structures and improve the quality of node representation.
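For illustration only (not necessarily the paper's exact objective), a layer-to-layer contrastive term can be written as an InfoNCE loss that treats each node's embeddings at two adjacent GNN layers as a positive pair; `z0` and `z1` below stand for hypothetical layer-0 and layer-1 node embedding matrices.

```python
import torch
import torch.nn.functional as F

def layer_to_layer_infonce(z0, z1, temperature=0.2):
    """Treat each node's layer-0 / layer-1 embeddings as a positive pair and
    all other nodes' layer-1 embeddings as negatives (illustrative sketch)."""
    z0 = F.normalize(z0, dim=-1)
    z1 = F.normalize(z1, dim=-1)
    logits = z0 @ z1.t() / temperature                    # [N, N] cosine similarities
    targets = torch.arange(z0.size(0), device=z0.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)
```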
arXiv Detail & Related papers (2024-07-19T12:45:21Z)
- Surgical Feature-Space Decomposition of LLMs: Why, When and How? [8.826164604720738]
We empirically study the efficacy of weight and feature space decomposition in transformer-based language models.
We show that surgical decomposition provides critical insights into the trade-off between compression and language modelling performance.
We extend our investigation to the implications of low-rank approximations on model bias.
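For context, the kind of weight-space decomposition studied here can be illustrated with a generic truncated-SVD low-rank approximation of a single weight matrix; this is a sketch of the underlying linear algebra, not the paper's specific surgical procedure.

```python
import torch

def low_rank_approx(weight: torch.Tensor, rank: int) -> torch.Tensor:
    """Best rank-`rank` approximation (in Frobenius norm) via truncated SVD."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank, :]
```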
arXiv Detail & Related papers (2024-05-17T07:34:03Z)
- On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning [71.44986275228747]
In-context learning (ICL) has become an efficient approach propelled by recent advancements in large language models (LLMs).
However, both paradigms are prone to the critical problem of overconfidence (i.e., miscalibration).
arXiv Detail & Related papers (2023-12-21T11:55:10Z)
- Gradient constrained sharpness-aware prompt learning for vision-language models [99.74832984957025]
This paper targets a novel trade-off problem in generalizable prompt learning for vision-language models (VLMs).
By analyzing the loss landscapes of the state-of-the-art method and a vanilla Sharpness-aware Minimization (SAM) based method, we conclude that trade-off performance correlates with both loss value and loss sharpness.
We propose a novel SAM-based method for prompt learning, denoted Gradient Constrained Sharpness-aware Context Optimization (GCSCoOp).
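For reference, one step of vanilla SAM (the baseline analyzed here, not the proposed GCSCoOp variant) can be sketched as follows; `loss_fn` is assumed to be a closure that recomputes the loss on the current mini-batch, and `params` are the trainable parameters registered with `optimizer`.

```python
import torch

def sam_step(params, loss_fn, optimizer, rho=0.05):
    """One vanilla SAM update: perturb weights toward the locally sharpest
    point, take the gradient there, then apply it at the original weights."""
    params = [p for p in params if p.requires_grad]
    loss_fn().backward()                                    # gradients at the current weights
    grad_norm = torch.norm(torch.stack(
        [p.grad.norm(2) for p in params if p.grad is not None]), 2)
    eps = []
    with torch.no_grad():
        for p in params:
            e = (rho * p.grad / (grad_norm + 1e-12)
                 if p.grad is not None else torch.zeros_like(p))
            p.add_(e)                                       # climb to w + e
            eps.append(e)
    optimizer.zero_grad()
    loss_fn().backward()                                    # sharpness-aware gradient at w + e
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)                                       # restore the original weights
    optimizer.step()                                        # descend with the gradient from w + e
```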
arXiv Detail & Related papers (2023-09-14T17:13:54Z)
- Learning Deep Representations via Contrastive Learning for Instance Retrieval [11.736450745549792]
This paper makes the first attempt to tackle the problem using instance-discrimination-based contrastive learning (CL).
In this work, we approach this problem by exploring the capability of deriving discriminative representations from pre-trained and fine-tuned CL models.
arXiv Detail & Related papers (2022-09-28T04:36:34Z)
- Zero-Shot Temporal Action Detection via Vision-Language Prompting [134.26292288193298]
We propose a novel zero-Shot Temporal Action detection model via Vision-LanguagE prompting (STALE).
Our model significantly outperforms state-of-the-art alternatives.
Our model also yields superior results on supervised TAD over recent strong competitors.
arXiv Detail & Related papers (2022-07-17T13:59:46Z)
- Interventional Contrastive Learning with Meta Semantic Regularizer [28.708395209321846]
Contrastive learning (CL)-based self-supervised learning models learn visual representations in a pairwise manner.
When the CL model is trained on full images, performance tested on full images is better than on foreground areas.
When the CL model is trained on foreground areas, performance tested on full images is worse than on foreground areas.
arXiv Detail & Related papers (2022-06-29T15:02:38Z)
- Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
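Concretely, that metric can be sketched as follows; the clustering granularity, train/test split, and KNN settings below are placeholder choices, not necessarily those used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def cluster_learnability(reps: np.ndarray, n_clusters: int = 10,
                         n_neighbors: int = 5, seed: int = 0) -> float:
    """Pseudo-label the representations with K-means, then score a held-out
    KNN on recovering those labels; higher accuracy = more learnable representation."""
    pseudo_labels = KMeans(n_clusters=n_clusters, n_init=10,
                           random_state=seed).fit_predict(reps)
    x_tr, x_te, y_tr, y_te = train_test_split(reps, pseudo_labels,
                                              test_size=0.2, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=n_neighbors).fit(x_tr, y_tr)
    return knn.score(x_te, y_te)
```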
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
- Toward Fast, Flexible, and Robust Low-Light Image Enhancement [87.27326390675155]
We develop a new Self-Calibrated Illumination (SCI) learning framework for fast, flexible, and robust brightening of images in real-world low-light scenarios.
Considering the computational burden of the cascaded pattern, we construct a self-calibrated module that realizes convergence between the results of each stage.
We make comprehensive explorations into SCI's inherent properties, including operation-insensitive adaptability and model-irrelevant generality.
arXiv Detail & Related papers (2022-04-21T14:40:32Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant imitation learning from observation (ILO) algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.