No Free Lunch in Active Learning: LLM Embedding Quality Dictates Query Strategy Success
- URL: http://arxiv.org/abs/2506.01992v1
- Date: Sun, 18 May 2025 10:38:26 GMT
- Title: No Free Lunch in Active Learning: LLM Embedding Quality Dictates Query Strategy Success
- Authors: Lukas Rauch, Moritz Wirth, Denis Huseljic, Marek Herde, Bernhard Sick, Matthias Aßenmacher
- Abstract summary: Large language models (LLMs) capable of producing general-purpose representations let us revisit the practicality of deep active learning (AL). This study establishes a benchmark and systematically investigates the influence of LLM embedding quality on query strategies in deep AL.
- Score: 1.950171084881346
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The advent of large language models (LLMs) capable of producing general-purpose representations lets us revisit the practicality of deep active learning (AL): By leveraging frozen LLM embeddings, we can mitigate the computational costs of iteratively fine-tuning large backbones. This study establishes a benchmark and systematically investigates the influence of LLM embedding quality on query strategies in deep AL. We employ five top-performing models from the massive text embedding benchmark (MTEB) leaderboard and two baselines for ten diverse text classification tasks. Our findings reveal key insights: First, initializing the labeled pool using diversity-based sampling synergizes with high-quality embeddings, boosting performance in early AL iterations. Second, the choice of the optimal query strategy is sensitive to embedding quality. While the computationally inexpensive Margin sampling can achieve performance spikes on specific datasets, we find that strategies like Badge exhibit greater robustness across tasks. Importantly, their effectiveness is often enhanced when paired with higher-quality embeddings. Our results emphasize the need for context-specific evaluation of AL strategies, as performance heavily depends on embedding quality and the target task.
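To make the setup concrete, below is a minimal sketch of the pipeline the abstract describes: pool-based AL on frozen LLM embeddings, a diversity-based initial labeled pool chosen with k-means, and Margin sampling as the query strategy. This is an illustration under assumptions, not the authors' released code: `diverse_init` and `margin_query` are hypothetical helper names, the embedding matrix `X` is assumed to come from any MTEB-style encoder computed offline, and a scikit-learn logistic regression stands in for whatever lightweight classifier head is trained on top of the frozen embeddings.

```python
# Sketch: active learning on frozen LLM embeddings (assumed setup, not the paper's code).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import pairwise_distances_argmin_min


def diverse_init(X, k, seed=0):
    """Diversity-based initial pool: pick the examples closest to k-means centroids."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
    return np.unique(idx)


def margin_query(clf, X_pool, batch_size):
    """Margin sampling: query the points with the smallest top-1 vs. top-2 probability gap."""
    proba = np.sort(clf.predict_proba(X_pool), axis=1)
    margin = proba[:, -1] - proba[:, -2]
    return np.argsort(margin)[:batch_size]


def active_learning_loop(X, y, rounds=10, init_size=50, batch_size=50, seed=0):
    labeled = diverse_init(X, init_size, seed)
    unlabeled = np.setdiff1d(np.arange(len(X)), labeled)
    clf = None
    for _ in range(rounds):
        # Cheap classifier head on frozen embeddings -- no backbone fine-tuning per iteration.
        clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
        picked = margin_query(clf, X[unlabeled], batch_size)
        labeled = np.concatenate([labeled, unlabeled[picked]])
        unlabeled = np.delete(unlabeled, picked)
    return labeled, clf
```

For comparison, Badge selects each batch by running k-means++ seeding over gradient embeddings of the classifier's last layer, mixing uncertainty and diversity in one step; that added robustness across tasks comes at a higher computational cost than plain Margin sampling.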
Related papers
- To Label or Not to Label: PALM -- A Predictive Model for Evaluating Sample Efficiency in Active Learning Models [2.2667044928324747]
Active learning (AL) seeks to reduce annotation costs by selecting the most informative samples for labeling. Traditional evaluation methods, which focus solely on final accuracy, fail to capture the full dynamics of the learning process. We propose PALM, a unified and interpretable mathematical model that characterizes AL trajectories through four key parameters.
arXiv Detail & Related papers (2025-07-21T08:37:44Z)
- Layer-Aware Embedding Fusion for LLMs in Text Classifications [1.4250487522292254]
We propose a layer-aware embedding selection method and investigate how to quantitatively evaluate different layers to identify the most important ones for downstream NLP tasks. Experiments on four English text classification datasets demonstrate that different layers in LLMs exhibit varying degrees of representational strength for classification. We also explore how combining embeddings from multiple LLMs, without requiring model fine-tuning, can improve performance.
arXiv Detail & Related papers (2025-04-08T07:45:50Z)
- Don't Just Demo, Teach Me the Principles: A Principle-Based Multi-Agent Prompting Strategy for Text Classification [4.811763060654019]
We present PRINCIPLE-BASED PROMPTING, a simple but effective multi-agent prompting strategy for text classification. Our approach achieves substantial performance gains (1.55% - 19.37%) over zero-shot prompting on macro-F1 score. Our multi-agent PRINCIPLE-BASED PROMPTING approach also shows on-par or better performance compared to demonstration-based few-shot prompting approaches.
arXiv Detail & Related papers (2025-02-11T01:10:13Z)
- Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning [61.99353167168545]
We show that fine-tuning with LLM-generated data improves target task performance and reduces non-target task degradation. This is the first work to provide an empirical explanation based on token perplexity reduction to mitigate catastrophic forgetting in LLMs after fine-tuning.
arXiv Detail & Related papers (2025-01-24T08:18:56Z)
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities. LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands. We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
- Active Prompt Learning with Vision-Language Model Priors [9.173468790066956]
We introduce a class-guided clustering that leverages the pre-trained image and text encoders of vision-language models.
We propose a budget-saving selective querying based on adaptive class-wise thresholds.
arXiv Detail & Related papers (2024-11-23T02:34:33Z)
- EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a stateless reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve the model alignment of different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label smoothing value of training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z)
- Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables these intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z)