Related papers: Logistic Regression makes small LLMs strong and explainable "tens-of-shot" classifiers

Related papers

Large Multimodal Models as General In-Context Classifiers [73.11242790834383]
In this work, we argue that this answer overlooks an important capability of LMMs: in-context learning.<n>We benchmark state-of-the-art LMMs on diverse datasets for closed-world classification and find that, although their zero-shot performance is lower than CLIP's, LMMs with a few in-context examples can match or even surpass contrastive VLMs with cache-based adapters.<n>We extend this analysis to the open-world setting, where the generative nature of LMMs makes them more suitable for the task.
arXiv Detail & Related papers (2026-02-26T17:08:18Z)
Are LLMs Ready to Replace Bangla Annotators? [0.5468559068505657]
Large Language Models (LLMs) are increasingly used as automated annotators to scale dataset creation.<n>We study the behavior of LLMs as zero-shot annotators for Bangla hate speech.<n>Our analysis uncovers annotator bias and substantial instability in model judgments.
arXiv Detail & Related papers (2026-02-18T07:36:41Z)
Does Model Size Matter? A Comparison of Small and Large Language Models for Requirements Classification [4.681300232651754]
Large language models (LLMs) show notable results in natural language processing (NLP) tasks for requirements engineering (RE)<n>In contrast, small language models (SLMs) offer a lightweight, locally deployable alternative.
arXiv Detail & Related papers (2025-10-24T13:20:30Z)
Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers [59.168391398830515]
We evaluate 12 pre-trained LLMs and one specialized fact-verifier, using a collection of examples from 14 fact-checking benchmarks.<n>We highlight the importance of addressing annotation errors and ambiguity in datasets.<n> frontier LLMs with few-shot in-context examples, often overlooked in previous works, achieve top-tier performance.
arXiv Detail & Related papers (2025-06-16T10:32:10Z)
In a Few Words: Comparing Weak Supervision and LLMs for Short Query Intent Classification [4.037445459586932]
We empirically compare user intent classification into informational, navigational, and transactional categories. Our results indicate that while LLMs outperform weak supervision in recall, they continue to struggle with precision.
arXiv Detail & Related papers (2025-04-30T07:54:04Z)
GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets [7.547445287035568]
Generative classification addresses this by prompting the model to directly output labels. We bridge this gap with Gen++, a framework that unifies SFT, RL, and inference-time prompting. Across seven datasets, Gen++ achieves an average accuracy improvement of 3.46% relative to the naive SFT baseline.
arXiv Detail & Related papers (2025-04-28T15:30:58Z)
Lightweight Latent Verifiers for Efficient Meta-Generation Strategies [0.5892638927736115]
Verifiers are auxiliary models that assess the correctness of outputs generated by base large language models (LLMs) In this work, we introduce a novel lightweight verification approach, LiLaVe, which reliably extracts correctness signals from the hidden states of the base LLM. A key advantage of LiLaVe is its ability to operate with only a small fraction of the computational budget required by traditional LLM-based verifiers.
arXiv Detail & Related papers (2025-04-23T14:33:20Z)
Towards Automated Fact-Checking of Real-World Claims: Exploring Task Formulation and Assessment with LLMs [32.45604456988931]
This study establishes baseline comparisons for Automated Fact-Checking (AFC) using Large Language Models (LLMs) We evaluate Llama-3 models of varying sizes on 17,856 claims collected from PolitiFact (2007-2024) using evidence retrieved via restricted web searches. Our results show that larger LLMs consistently outperform smaller LLMs in classification accuracy and justification quality without fine-tuning.
arXiv Detail & Related papers (2025-02-13T02:51:17Z)
Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets. LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student. Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z)
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs [74.35290684163718]
A primary challenge in large language model (LLM) development is their onerous pre-training cost. This paper explores a promising paradigm to improve LLM pre-training efficiency and quality by leveraging a small language model (SLM)
arXiv Detail & Related papers (2024-10-24T14:31:52Z)
Zero-to-Strong Generalization: Eliciting Strong Capabilities of Large Language Models Iteratively without Gold Labels [75.77877889764073]
Large Language Models (LLMs) have demonstrated remarkable performance through supervised fine-tuning or in-context learning using gold labels. This study explores whether solely utilizing unlabeled data can elicit strong model capabilities. We propose a new paradigm termed zero-to-strong generalization.
arXiv Detail & Related papers (2024-09-19T02:59:44Z)
Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation [50.837277466987345]
We focus on the field of large language models (LLMs) for recommendation. We propose RecLoRA, which incorporates a Personalized LoRA module that maintains independent LoRAs for different users. We also design a Few2Many Learning Strategy, using a conventional recommendation model as a lens to magnify small training spaces to full spaces.
arXiv Detail & Related papers (2024-08-07T04:20:28Z)
Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data [3.9459077974367833]
Large language models (LLMs) have demonstrated remarkable success in NLP tasks. We benchmarked one supervised classic machine learning model based on Support Vector Machines (SVMs), three supervised pretrained language models (PLMs) based on RoBERTa, BERTweet, and SocBERT, and two LLM based classifiers (GPT3.5 and GPT4), across 6 text classification tasks. Our comprehensive experiments demonstrate that employ-ing data augmentation using LLMs (GPT-4) with relatively small human-annotated data to train lightweight supervised classification models achieves superior results compared to training with human-annotated data
arXiv Detail & Related papers (2024-03-27T22:05:10Z)
Learning to Reduce: Optimal Representations of Structured Data in Prompting Large Language Models [42.16047343029512]
Large Language Models (LLMs) have been widely used as general-purpose AI agents. We propose a framework, Learning to Reduce, that fine-tunes a language model to generate a reduced version of an input context. We show that our model achieves comparable accuracies in selecting the relevant evidence from an input context.
arXiv Detail & Related papers (2024-02-22T00:41:23Z)
Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers [2.1165011830664673]
Large Language Models (LLMs) possess outstanding capabilities in addressing various natural language processing (NLP) tasks. The sheer size of these models poses challenges in terms of storage, training and inference due to the inclusion of billions of parameters through layer stacking. We show that even with fewer layers, LLMs maintain similar or better performance levels, particularly in prompt-based fine-tuning for text classification tasks.
arXiv Detail & Related papers (2024-02-18T20:47:10Z)
Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z)
Take One Step at a Time to Know Incremental Utility of Demonstration: An Analysis on Reranking for Few-Shot In-Context Learning [23.932500424117244]
In-Context Learning (ICL) is an emergent capability of Large Language Models (LLMs) Previous studies have shown that using LLMs' outputs as labels is effective in training models to select demonstrations. This paper presents an analysis on different utility functions by focusing on LLMs' output probability given ground-truth output.
arXiv Detail & Related papers (2023-11-16T07:03:54Z)
LLMaAA: Making Large Language Models as Active Annotators [32.57011151031332]
We propose LLMaAA, which takes large language models as annotators and puts them into an active learning loop to determine what to annotate efficiently. We conduct experiments and analysis on two classic NLP tasks, named entity recognition and relation extraction. With LLMaAA, task-specific models trained from LLM-generated labels can outperform the teacher within only hundreds of annotated examples.
arXiv Detail & Related papers (2023-10-30T14:54:15Z)
Language models are weak learners [71.33837923104808]
We show that prompt-based large language models can operate effectively as weak learners. We incorporate these models into a boosting approach, which can leverage the knowledge within the model to outperform traditional tree-based boosting. Results illustrate the potential for prompt-based LLMs to function not just as few-shot learners themselves, but as components of larger machine learning pipelines.
arXiv Detail & Related papers (2023-06-25T02:39:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.