Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in
Self-Refined Open-Source Models
- URL: http://arxiv.org/abs/2310.07611v2
- Date: Sun, 22 Oct 2023 00:37:06 GMT
- Authors: Sumuk Shashidhar, Abhinav Chinta, Vaibhav Sahai, Zhenhailong Wang,
Heng Ji
- Abstract summary: SoTA open-source models of varying sizes from 7B to 65B improve by 8.2% on average over their baseline performance.
Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dominance of proprietary LLMs has led to restricted access and raised
information privacy concerns. High-performing open-source alternatives are
crucial for information-sensitive and high-volume applications but often lag
behind in performance. To address this gap, we propose (1) an untargeted variant
of iterative self-critique and self-refinement that requires no external feedback,
and (2) a novel ranking metric, the Performance, Refinement, and Inference Cost
Score (PeRFICS), to find the optimal model for a given task considering refined
performance and cost. Our experiments show that SoTA open-source models of
varying sizes, from 7B to 65B parameters, improve by 8.2% on average over their
baseline performance. Strikingly, even models with extremely small memory
footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a
25.39% improvement in high-creativity, open-ended tasks on the Vicuna benchmark.
Vicuna-13B goes a step further and outperforms ChatGPT after refinement.
This work has profound implications for resource-constrained and
information-sensitive environments seeking to leverage LLMs without incurring
prohibitive costs or compromising performance and privacy. The domain-agnostic
self-refinement process coupled with our novel ranking metric facilitates
informed decision-making in model selection, thereby reducing costs and
democratizing access to high-performing language models, as evidenced by case
studies.
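The two proposals above can be sketched together. The critique/refine prompts, the `generate` callable, and the scoring formula below are illustrative assumptions, not the paper's exact prompts or the actual PeRFICS definition; this is a minimal sketch of the idea, with any open-source LLM standing in for `generate`.

```python
def self_refine(generate, prompt, rounds=2):
    """Untargeted iterative self-critique and refinement (sketch).
    `generate` is any callable str -> str backed by an LLM; the loop
    uses no external feedback, only the model's own critique."""
    answer = generate(prompt)
    for _ in range(rounds):
        critique = generate(f"Critique this answer:\n{answer}")
        answer = generate(
            f"Rewrite the answer to address the critique.\n"
            f"Answer: {answer}\nCritique: {critique}"
        )
    return answer


def cost_performance_score(refined_perf, baseline_perf, cost_per_query):
    """Hypothetical PeRFICS-style ranking: reward refined performance and
    the refinement gain, penalize per-query inference cost."""
    gain = refined_perf - baseline_perf
    return (refined_perf + gain) / cost_per_query
```

A higher score favors cheap models that gain a lot from refinement, which is the trade-off the metric is meant to surface.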
Related papers
- CERET: Cost-Effective Extrinsic Refinement for Text Generation
We propose CERET, a method for refining text generations by considering semantic stability, entailment and inter-sample uncertainty measures.
Experimental results show that CERET outperforms Self-consistency and Self-rerank baselines consistently under various task setups.
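One of CERET's signals, inter-sample stability, can be approximated by reranking candidates by mutual agreement. The token-overlap proxy below is an illustrative assumption; the paper also combines entailment and uncertainty measures, which are omitted here.

```python
def rerank_by_agreement(candidates):
    """Score each candidate generation by its mean token overlap with the
    other candidates and return them most-consistent first.
    A toy proxy for semantic stability."""
    def overlap(a, b):
        ta, tb = set(a.split()), set(b.split())
        return len(ta & tb) / max(len(ta | tb), 1)

    scored = [
        (sum(overlap(c, o) for o in candidates if o is not c)
         / max(len(candidates) - 1, 1), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, reverse=True)]
```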
arXiv Detail & Related papers (2024-06-08T22:17:52Z)
- Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings
We see for-profit closed-weight models as incompatible with requirements for transparency, privacy, adaptability, and standards of evidence.
We assess the feasibility of using smaller, open-weight models to replace GPT-4-Turbo in zero-shot, few-shot, and fine-tuned regimes.
We find that with relatively low effort, very low absolute monetary cost, and relatively little data for fine-tuning, small open-weight models can achieve competitive performance.
arXiv Detail & Related papers (2024-05-27T04:38:10Z)
- OptLLM: Optimal Assignment of Queries to Large Language Models
We propose a framework for addressing the cost-effective query allocation problem for large language models (LLMs).
Our framework, named OptLLM, provides users with a range of optimal solutions to choose from, aligning with their budget constraints and performance preferences.
To evaluate the effectiveness of OptLLM, we conduct extensive experiments on various types of tasks, including text classification, question answering, sentiment analysis, reasoning, and log parsing.
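Budget-constrained query assignment of this kind can be sketched with a greedy heuristic: start every query on the cheapest model, then spend the remaining budget on the upgrades that buy the most predicted accuracy per extra unit of cost. The function, the candidate table format, and the greedy strategy are assumptions for illustration, not OptLLM's actual multi-objective algorithm.

```python
def assign_queries(queries, candidates, budget):
    """Assign each query to a model under a total budget.
    `candidates` maps model name -> (cost_per_query, predicted_accuracy)."""
    cheapest = min(candidates, key=lambda m: candidates[m][0])
    plan = {q: cheapest for q in queries}
    spent = candidates[cheapest][0] * len(queries)

    # Rank possible upgrades by accuracy gained per extra unit of cost.
    upgrades = []
    for q in queries:
        for m, (cost, acc) in candidates.items():
            extra = cost - candidates[cheapest][0]
            gain = acc - candidates[cheapest][1]
            if extra > 0 and gain > 0:
                upgrades.append((gain / extra, extra, q, m))

    for _, extra, q, m in sorted(upgrades, reverse=True):
        if plan[q] == cheapest and spent + extra <= budget:
            plan[q] = m
            spent += extra
    return plan, spent
```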
arXiv Detail & Related papers (2024-05-24T01:05:37Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
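The conversion from step-level value signals to DPO training data can be sketched as follows: given value estimates for candidate reasoning steps (as MCTS look-ahead rollouts would produce), emit (preferred, rejected) pairs wherever the value gap is decisive. The function name and margin threshold are illustrative assumptions, not the paper's procedure.

```python
def step_preference_pairs(step_values, margin=0.1):
    """`step_values` maps a candidate reasoning step (str) to its estimated
    value. Return (preferred, rejected) pairs whose value gap exceeds
    `margin`, suitable as step-level preference data for DPO-style updates."""
    ranked = sorted(step_values.items(), key=lambda kv: kv[1], reverse=True)
    pairs = []
    for i, (win, wv) in enumerate(ranked):
        for lose, lv in ranked[i + 1:]:
            if wv - lv > margin:
                pairs.append((win, lose))
    return pairs
```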
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- ROPO: Robust Preference Optimization for Large Language Models
We propose an iterative alignment approach that integrates noise tolerance and filtering of noisy samples without the aid of external models.
Experiments on three widely-used datasets with Mistral-7B and Llama-2-7B demonstrate that ROPO significantly outperforms existing preference alignment methods.
arXiv Detail & Related papers (2024-04-05T13:58:51Z)
- InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling
Reward hacking, also termed reward overoptimization, remains a critical challenge.
We propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective.
We show that InfoRM's overoptimization detection mechanism is not only effective but also robust across a broad range of datasets.
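InfoRM's exact instantiation is not given in this summary; for orientation, the generic variational information bottleneck objective that such a reward model builds on (with encoder $q(z \mid x)$, prior $r(z)$, and trade-off weight $\beta$) is:

```latex
% Generic variational information bottleneck objective; InfoRM's actual
% reward-modeling instantiation may differ from this standard form.
\min_{q(z \mid x)} \;
\mathbb{E}_{p(x,y)}\,\mathbb{E}_{q(z \mid x)}\!\left[-\log q(y \mid z)\right]
\;+\; \beta \, \mathbb{E}_{p(x)}\!\left[\mathrm{KL}\!\left(q(z \mid x)\,\middle\|\,r(z)\right)\right]
```

The KL term limits how much reward-irrelevant information the latent $z$ retains about the input, which is the lever used against overoptimization.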
arXiv Detail & Related papers (2024-02-14T17:49:07Z)
- Augmenting Unsupervised Reinforcement Learning with Self-Reference
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z)
- Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models
We introduce a new paradigm for structurally pruning Large Language Models, called Compresso.
Our approach, through the collaboration of the proposed resource-efficient pruning algorithm and the LLM itself, learns optimal pruning decisions during the training process.
In experiments, Compresso significantly outperforms one-shot pruning baselines across various sparsity ratios, achieving up to 2.21%, 11.43%, 7.04%, and 4.81% higher scores on the commonsense reasoning, reading comprehension, MMLU, and BBH benchmarks, respectively.
arXiv Detail & Related papers (2023-10-08T05:16:28Z)
- Tool-Augmented Reward Modeling
We propose a tool-augmented preference modeling approach, named Themis, to address limitations by empowering RMs with access to external environments.
Our study delves into the integration of external tools into RMs, enabling them to interact with diverse external sources.
In human evaluations, RLHF trained with Themis attains an average win rate of 32% when compared to baselines.
arXiv Detail & Related papers (2023-10-02T09:47:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.