Cache & Distil: Optimising API Calls to Large Language Models
- URL: http://arxiv.org/abs/2310.13561v1
- Date: Fri, 20 Oct 2023 15:01:55 GMT
- Title: Cache & Distil: Optimising API Calls to Large Language Models
- Authors: Guillem Ramírez, Matthias Lindemann, Alexandra Birch and Ivan Titov
- Abstract summary: Large-scale deployment of generative AI tools often depends on costly API calls to a Large Language Model (LLM) to fulfil user queries.
To curtail the frequency of these calls, one can employ a smaller language model -- a student.
This student gradually gains proficiency in independently handling an increasing number of user requests.
- Score: 82.32065572907125
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-scale deployment of generative AI tools often depends on costly API
calls to a Large Language Model (LLM) to fulfil user queries. To curtail the
frequency of these calls, one can employ a smaller language model -- a student
-- which is continuously trained on the responses of the LLM. This student
gradually gains proficiency in independently handling an increasing number of
user requests, a process we term neural caching. The crucial element in neural
caching is a policy that decides which requests should be processed by the
student alone and which should be redirected to the LLM, subsequently aiding
the student's learning. In this study, we focus on classification tasks, and we
consider a range of classic active learning-based selection criteria as the
policy. Our experiments suggest that Margin Sampling and Query by Committee
bring consistent benefits across tasks and budgets.
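As a rough illustration of the neural caching loop described in the abstract, the sketch below routes requests with a Margin Sampling criterion; the `student`/`llm` interfaces and the threshold are hypothetical stand-ins, not the paper's implementation. Query by Committee would replace the margin score with disagreement among several student models.

```python
import numpy as np

def margin(probs: np.ndarray) -> float:
    """Margin Sampling score: gap between the two most likely classes.
    A small margin means the student is uncertain."""
    top2 = np.sort(probs)[-2:]
    return float(top2[1] - top2[0])

def route(x, student, llm, cache, threshold=0.2):
    """One neural-caching step: answer locally when the student's margin
    is large; otherwise pay for the LLM call and keep its label so the
    student can be retrained on the accumulated cache."""
    probs = student.predict_proba(x)    # student's class distribution (assumed API)
    if margin(probs) >= threshold:
        return int(np.argmax(probs))    # served by the student alone
    label = llm.classify(x)             # costly API call (assumed API)
    cache.append((x, label))            # future training data for the student
    return label
```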
Related papers
- LLMs can learn self-restraint through iterative self-reflection [57.26854891567574]
Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics.
This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach.
We devise a utility function that can encourage the model to produce responses only when it is confident in them.
arXiv Detail & Related papers (2024-05-15T13:35:43Z)
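A minimal sketch of the confidence-gating idea behind self-restraint, using agreement across samples as a crude confidence proxy (the `generate` interface and thresholds are assumptions, not the paper's learned utility function):

```python
def answer_or_abstain(model, prompt, n_samples=5, min_agreement=0.8):
    """Sample several responses and answer only when they mostly agree;
    otherwise abstain. Agreement stands in here for the paper's learned
    confidence signal."""
    samples = [model.generate(prompt) for _ in range(n_samples)]  # assumed API
    best = max(set(samples), key=samples.count)
    if samples.count(best) / n_samples >= min_agreement:
        return best
    return "I am not confident enough to answer."
```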
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- Dissecting Language Models: Machine Unlearning via Selective Pruning [0.7373617024876725]
This paper introduces a machine unlearning method specifically designed for Large Language Models (LLMs).
We introduce a selective pruning method for LLMs that removes neurons based on their relative importance on a targeted capability compared to overall network performance.
Our findings reveal that both feed-forward and attention neurons in LLMs are specialized; that is, for specific tasks, certain neurons are more crucial than others.
arXiv Detail & Related papers (2024-03-02T17:10:44Z)
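The relative-importance criterion lends itself to a short sketch: score each neuron on the targeted capability and on general performance, then prune where the ratio is highest (the score computation and pruning fraction below are assumptions, not the paper's exact procedure):

```python
import torch

def selective_prune_mask(task_scores: torch.Tensor,
                         general_scores: torch.Tensor,
                         prune_fraction: float = 0.05) -> torch.Tensor:
    """Return a 0/1 mask that zeroes out the neurons most specialised to
    the targeted capability relative to overall network performance.
    Score tensors are assumed precomputed, e.g. from activation
    statistics on task data vs. general data."""
    ratio = task_scores / (general_scores + 1e-8)
    k = max(1, int(prune_fraction * ratio.numel()))
    prune_idx = torch.topk(ratio, k).indices   # most task-specific neurons
    mask = torch.ones_like(ratio)
    mask[prune_idx] = 0.0                      # apply to the layer's neurons
    return mask
```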
- Batch Active Learning of Reward Functions from Human Preferences [33.39413552270375]
Preference-based learning enables reliable labeling by querying users with preference questions.
Active querying methods are commonly employed in preference-based learning to generate more informative data.
We develop a set of novel algorithms that enable efficient learning of reward functions using as few data samples as possible.
arXiv Detail & Related papers (2024-02-24T08:07:48Z)
- Learning to Learn in Interactive Constraint Acquisition [7.741303298648302]
In Constraint Acquisition (CA), the goal is to assist the user by automatically learning the model.
In (inter)active CA, this is done by interactively posing queries to the user.
We propose to use probabilistic classification models to guide interactive CA to generate more promising queries.
arXiv Detail & Related papers (2023-12-17T19:12:33Z)
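One way to read the classifier-guided query selection is as an uncertainty-sampling rule over candidate constraints; the `predict_proba` interface below is a hypothetical stand-in, not the paper's model:

```python
def most_informative_query(candidates, classifier):
    """Pick the candidate constraint whose membership in the user's model
    is most uncertain (predicted probability closest to 0.5), so the
    user's answer to the query carries the most information."""
    return min(candidates,
               key=lambda c: abs(classifier.predict_proba(c) - 0.5))
```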
- LaGR-SEQ: Language-Guided Reinforcement Learning with Sample-Efficient Querying [71.86163159193327]
Large language models (LLMs) have recently demonstrated their impressive ability to provide context-aware responses via text.
This ability could potentially be used to predict plausible solutions in sequential decision making tasks pertaining to pattern completion.
We introduce LaGR, which uses this predictive ability of LLMs to propose solutions to tasks that have been partially completed by a primary reinforcement learning (RL) agent.
arXiv Detail & Related papers (2023-08-21T02:07:35Z)
- OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, leveraging the in-context learning capability of LLMs to handle multiple task inputs.
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without causing significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
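The batching idea can be sketched as a single prompt that carries several inputs at once; the template below is an illustrative assumption, not OverPrompt's exact format:

```python
def batched_prompt(texts, labels=("positive", "negative")):
    """Pack several classification inputs into one prompt so a single
    API call labels them all, amortising per-call cost."""
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(texts))
    return (f"Classify each numbered text as {' or '.join(labels)}. "
            f"Answer with one label per line.\n\n{numbered}")
```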
- Active metric learning and classification using similarity queries [21.589707834542338]
We show that a novel unified query framework can be applied to any problem in which a key component is learning a representation of the data that reflects similarity.
We demonstrate the effectiveness of the proposed strategy on two tasks -- active metric learning and active classification.
arXiv Detail & Related papers (2022-02-04T03:34:29Z)
- Sequential Search with Off-Policy Reinforcement Learning [48.88165680363482]
We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model.
As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly.
We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
arXiv Detail & Related papers (2022-02-01T06:52:40Z)
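The on-the-fly packing step can be approximated with a first-fit-decreasing heuristic: sort sequences by length and place each into the first batch slot with enough room. The slot size and greedy rule below are assumptions about the paper's knapsack step, not its exact algorithm:

```python
def pack_sequences(lengths, slot_len=128):
    """Greedily pack short user sequences into fixed-length slots so that
    several of them share one RNN pass. Returns a list of slots, each a
    list of sequence indices."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    slots, room = [], []                 # room[s] = space left in slot s
    for i in order:                      # longest-first (first-fit decreasing)
        for s in range(len(slots)):
            if lengths[i] <= room[s]:
                slots[s].append(i)
                room[s] -= lengths[i]
                break
        else:
            slots.append([i])            # no slot fits: open a new one
            room.append(slot_len - lengths[i])
    return slots
```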