Related papers: Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data

Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data

URL: http://arxiv.org/abs/2410.07737v1
Date: Thu, 10 Oct 2024 09:15:14 GMT
Title: Plug-and-Play Performance Estimation for LLM Services without Relying on Labeled Data
Authors: Can Wang, Dianbo Sui, Hongliang Sun, Hao Ding, Bolin Zhang, Zhiying Tu,
Abstract summary: Large Language Model (LLM) services exhibit impressive capability on unlearned tasks leveraging only a few examples by in-context learning (ICL) This paper introduces a novel method to estimate the performance of LLM services across different tasks and contexts.
Score: 8.360964737763657
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Model (LLM) services exhibit impressive capability on unlearned tasks leveraging only a few examples by in-context learning (ICL). However, the success of ICL varies depending on the task and context, leading to heterogeneous service quality. Directly estimating the performance of LLM services at each invocation can be laborious, especially requiring abundant labeled data or internal information within the LLM. This paper introduces a novel method to estimate the performance of LLM services across different tasks and contexts, which can be "plug-and-play" utilizing only a few unlabeled samples like ICL. Our findings suggest that the negative log-likelihood and perplexity derived from LLM service invocation can function as effective and significant features. Based on these features, we utilize four distinct meta-models to estimate the performance of LLM services. Our proposed method is compared against unlabeled estimation baselines across multiple LLM services and tasks. And it is experimentally applied to two scenarios, demonstrating its effectiveness in the selection and further optimization of LLM services.

Related papers

From Cross-Task Examples to In-Task Prompts: A Graph-Based Pseudo-Labeling Framework for In-context Learning [55.90498988440303]
In-context learning (ICL) enables large language models to perform novel tasks without parameter updates.<n>We propose a cost-efficient two-stage pipeline that reduces reliance on language models for data labeling.<n> Experiments across five tasks demonstrate that our method achieves strong performance while lowering labeling costs.
arXiv Detail & Related papers (2025-10-28T15:37:51Z)
LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation [110.610512800947]
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external knowledge.<n>Existing studies often treat utility as a generic attribute, ignoring the fact that different LLMs may benefit differently from the same passage.
arXiv Detail & Related papers (2025-10-13T12:57:45Z)
Just Put a Human in the Loop? Investigating LLM-Assisted Annotation for Subjective Tasks [18.695435335031355]
In subjective annotation tasks with multiple plausible answers, reviewing LLM outputs can change the label distribution.<n>We conducted a pre-registered experiment with 410 unique annotators and over 7,000 annotations.<n>We find that presenting crowdworkers with LLM-generated annotation suggestions did not make them faster, but did improve their self-reported confidence in the task.
arXiv Detail & Related papers (2025-07-21T17:29:21Z)
LLMs as Data Annotators: How Close Are We to Human Performance [47.61698665650761]
Manual annotation of data is labor-intensive, time-consuming, and costly. In-context learning (ICL) in which some examples related to the task are given in the prompt can lead to inefficiencies and suboptimal model performance. This paper presents experiments comparing several LLMs, considering different embedding models, across various datasets for the Named Entity Recognition (NER) task.
arXiv Detail & Related papers (2025-04-21T11:11:07Z)
Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision [50.45597801390757]
Instruct-LF is a goal-oriented latent factor discovery system. It integrates instruction-following ability with statistical models to handle noisy datasets.
arXiv Detail & Related papers (2025-02-21T02:03:08Z)
A Comprehensive Evaluation of Large Language Models on Aspect-Based Sentiment Analysis [26.505386645322506]
Large Language Models (LLMs) have garnered increasing attention in the field of natural language processing. In this paper, we shed light on a comprehensive evaluation of LLMs in the ABSA field, involving 13 datasets, 8 ABSA subtasks, and 6 LLMs. Our experiments demonstrate that LLMs achieve a new state-of-the-art performance compared to fine-tuned Small Language Models (SLMs) in the fine-tuning-dependent paradigm.
arXiv Detail & Related papers (2024-12-03T08:54:17Z)
Revisiting SLO and Goodput Metrics in LLM Serving [17.777554083636716]
Service level objectives (SLOs) and goodput-the number of requests that meet SLOs per second-are introduced to evaluate the performance of LLM serving. Existing metrics fail to capture the nature of user experience. We propose a unified metric framework smooth goodput including SLOs and goodput to reflect the nature of user experience.
arXiv Detail & Related papers (2024-10-18T08:05:37Z)
In-Context Learning with Reinforcement Learning for Incomplete Utterance Rewriting [33.89176174108559]
In-context learning of large language models (LLMs) makes predictions only based on instructions augmented with a few examples. Existing example selection methods for ICL utilize sparse or dense retrievers and derive effective performance. We propose our policy-based reinforcement learning framework for example selection (RLS), which consists of a language model (LM) selector and an LLM generator.
arXiv Detail & Related papers (2024-08-23T12:32:12Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
Efficient Prompting for LLM-based Generative Internet of Things [88.84327500311464]
Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently. Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting. We propose a LLM-based Generative IoT (GIoT) system deployed in the local network setting in this study.
arXiv Detail & Related papers (2024-06-14T19:24:00Z)
RepEval: Effective Text Evaluation with LLM Representation [55.26340302485898]
RepEval is a metric that leverages the projection of Large Language Models (LLMs) representations for evaluation. Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.
arXiv Detail & Related papers (2024-04-30T13:50:55Z)
Automated Commit Message Generation with Large Language Models: An Empirical Study and Beyond [24.151927600694066]
Commit Message Generation (CMG) approaches aim to automatically generate commit messages based on given code diffs. This paper conducts the first comprehensive experiment to investigate how far we have been in applying Large Language Models (LLMs) to generate high-quality commit messages.
arXiv Detail & Related papers (2024-04-23T08:24:43Z)
A Survey on Effective Invocation Methods of Massive LLM Services [9.21599372326452]
Language models as a service (LM) enable users to accomplish tasks without requiring specialized knowledge, simply by paying a service provider. Various providers offer massive large language model (LLM) services with variations in latency, performance, and pricing. This paper provides a comprehensive overview of the LLM services invocation methods.
arXiv Detail & Related papers (2024-02-05T15:10:42Z)
Small LLMs Are Weak Tool Learners: A Multi-LLM Agent [73.54562551341454]
Large Language Model (LLM) agents significantly extend the capabilities of standalone LLMs. We propose a novel approach that decomposes the aforementioned capabilities into a planner, caller, and summarizer. This modular framework facilitates individual updates and the potential use of smaller LLMs for building each capability.
arXiv Detail & Related papers (2024-01-14T16:17:07Z)
Take One Step at a Time to Know Incremental Utility of Demonstration: An Analysis on Reranking for Few-Shot In-Context Learning [23.932500424117244]
In-Context Learning (ICL) is an emergent capability of Large Language Models (LLMs) Previous studies have shown that using LLMs' outputs as labels is effective in training models to select demonstrations. This paper presents an analysis on different utility functions by focusing on LLMs' output probability given ground-truth output.
arXiv Detail & Related papers (2023-11-16T07:03:54Z)
FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution. We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios. We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning. We propose a simple but effective in-context learning framework called ICL-D3IE. Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.