Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in
Self-Refined Open-Source Models
- URL: http://arxiv.org/abs/2310.07611v2
- Date: Sun, 22 Oct 2023 00:37:06 GMT
- Title: Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in
Self-Refined Open-Source Models
- Authors: Sumuk Shashidhar, Abhinav Chinta, Vaibhav Sahai, Zhenhailong Wang,
Heng Ji
- Abstract summary: SoTA open-source models of varying sizes from 7B to 65B improve, on average, by 8.2% over their baseline performance.
Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The dominance of proprietary LLMs has led to restricted access and raised
information privacy concerns. High-performing open-source alternatives are
crucial for information-sensitive and high-volume applications but often lag
behind in performance. To address this gap, we propose (1) an untargeted variant
of iterative self-critique and self-refinement devoid of external influence, and
(2) a novel ranking metric - the Performance, Refinement, and Inference Cost Score
(PeRFICS) - to find the optimal model for a given task, considering refined
performance and cost. Our experiments show that SoTA open-source models of
varying sizes from 7B to 65B improve, on average, by 8.2% over their baseline
performance. Strikingly, even models with extremely small memory footprints,
such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39%
improvement in high-creativity, open-ended tasks on the Vicuna benchmark.
Vicuna-13B takes it a step further and outperforms ChatGPT post-refinement.
This work has profound implications for resource-constrained and
information-sensitive environments seeking to leverage LLMs without incurring
prohibitive costs or compromising on performance and privacy. The domain-agnostic
self-refinement process coupled with our novel ranking metric facilitates
informed decision-making in model selection, thereby reducing costs and
democratizing access to high-performing language models, as evidenced by case
studies.
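The abstract names two components without giving details here: an untargeted self-refinement loop and the PeRFICS ranking score. The Python sketch below illustrates both under explicit assumptions: the critique and refinement prompts, the `generate` stub, and the weighted form of `perfics_score` (with weights `alpha`, `beta`, `gamma`) are hypothetical placeholders, not the paper's actual prompts or formula.

```python
# Minimal sketch (not the paper's implementation) of (1) an untargeted
# self-critique/self-refinement loop and (2) a PeRFICS-style score.
# The prompts, the generate() stub, and the alpha/beta/gamma weighting
# are illustrative assumptions.

def generate(model, prompt: str) -> str:
    """Placeholder for a call to a local open-source LLM runtime."""
    raise NotImplementedError

def self_refine(model, task: str, iterations: int = 2) -> str:
    """Untargeted refinement: the model critiques and rewrites its own answer
    using no external feedback, rubric, or reference output."""
    answer = generate(model, task)
    for _ in range(iterations):
        critique = generate(
            model,
            f"Task:\n{task}\n\nDraft answer:\n{answer}\n\n"
            "Critique this answer: list concrete weaknesses and omissions.",
        )
        answer = generate(
            model,
            f"Task:\n{task}\n\nDraft answer:\n{answer}\n\n"
            f"Critique:\n{critique}\n\n"
            "Rewrite the answer so that every point in the critique is addressed.",
        )
    return answer

def perfics_score(baseline_perf: float, refined_perf: float,
                  inference_cost: float,
                  alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Assumed weighted trade-off: reward refined performance and the gain from
    refinement, penalise inference cost. The actual PeRFICS definition is given
    in the paper; this only illustrates the trade-off it is meant to capture."""
    refinement_gain = refined_perf - baseline_perf
    return alpha * refined_perf + beta * refinement_gain - gamma * inference_cost
```

Ranking candidate models by such a score and selecting the maximum mirrors the model-selection use case the abstract describes for resource-constrained deployments.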
Related papers
- Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data [51.62162460809116]
We introduce Dynamic Noise Preference Optimization (DNPO) to ensure consistent improvements across iterations.
In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6%.
DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations.
arXiv Detail & Related papers (2025-02-08T01:20:09Z)
- Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices.
One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the SLM produces low-confidence responses (a minimal routing sketch follows this list).
We conduct a comprehensive investigation into benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs over 1500+ settings.
arXiv Detail & Related papers (2025-02-06T18:59:11Z)
- Adaptive Client Selection in Federated Learning: A Network Anomaly Detection Use Case [0.30723404270319693]
This paper introduces a client selection framework for Federated Learning (FL) that incorporates differential privacy and fault tolerance.
Results demonstrate up to a 7% improvement in accuracy and a 25% reduction in training time compared to the FedL2P approach.
arXiv Detail & Related papers (2025-01-25T02:50:46Z)
- Adaptive Pruning for Large Language Models with Structural Importance Awareness [66.2690963378878]
Large language models (LLMs) have significantly improved language understanding and generation capabilities.
LLMs are difficult to deploy on resource-constrained edge devices due to their high computational and storage resource demands.
We propose structurally-aware adaptive pruning (SAAP) to significantly reduce the computational and memory costs while maintaining model performance.
arXiv Detail & Related papers (2024-12-19T18:08:04Z)
- EACO: Enhancing Alignment in Multimodal LLMs via Critical Observation [58.546205554954454]
We propose Enhancing Alignment in MLLMs via Critical Observation (EACO)
EACO aligns MLLMs by self-generated preference data using only 5k images economically.
EACO reduces the overall hallucinations by 65.6% on HallusionBench and improves the reasoning ability by 21.8% on MME-Cognition.
arXiv Detail & Related papers (2024-12-06T09:59:47Z)
- CERET: Cost-Effective Extrinsic Refinement for Text Generation [14.43795791836198]
We propose CERET, a method for refining text generations by considering semantic stability, entailment and inter-sample uncertainty measures.
Experimental results show that CERET outperforms Self-consistency and Self-rerank baselines consistently under various task setups.
arXiv Detail & Related papers (2024-06-08T22:17:52Z)
- InfoRM: Mitigating Reward Hacking in RLHF via Information-Theoretic Reward Modeling [66.3072381478251]
Reward hacking, also termed reward overoptimization, remains a critical challenge.
We propose a framework for reward modeling, namely InfoRM, by introducing a variational information bottleneck objective.
We show that InfoRM's overoptimization detection mechanism is not only effective but also robust across a broad range of datasets.
arXiv Detail & Related papers (2024-02-14T17:49:07Z)
- On Leveraging Large Language Models for Enhancing Entity Resolution: A Cost-efficient Approach [7.996010840316654]
We propose an uncertainty reduction framework using Large Language Models (LLMs) to improve entity resolution results.
LLMs capitalize on their advanced linguistic capabilities and a "pay-as-you-go" model that provides significant advantages to those without extensive data science expertise.
We show that our method is efficient and effective, offering promising applications in real-world tasks.
arXiv Detail & Related papers (2024-01-07T09:06:58Z)
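As a companion to the uncertainty-based routing entry above, here is a minimal sketch of SLM-to-LLM routing. The confidence proxy (mean per-token probability), the threshold value, and the two generator stubs are assumptions for illustration, not the configurations benchmarked in that paper.

```python
# Minimal sketch of uncertainty-based routing from a small on-device model (SLM)
# to a stronger LLM. The confidence proxy, threshold, and model stubs are
# illustrative assumptions, not the benchmarked configurations.
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Generation:
    text: str
    token_logprobs: List[float]  # log-probabilities of the sampled tokens

def route(query: str,
          slm_generate: Callable[[str], Generation],
          llm_generate: Callable[[str], str],
          confidence_threshold: float = 0.8) -> Tuple[str, str]:
    """Answer locally when the SLM is confident; otherwise offload to the LLM."""
    draft = slm_generate(query)
    # Confidence proxy: average per-token probability of the SLM's output.
    n = max(len(draft.token_logprobs), 1)
    avg_prob = sum(math.exp(lp) for lp in draft.token_logprobs) / n
    if avg_prob >= confidence_threshold:
        return draft.text, "slm"
    return llm_generate(query), "llm"
```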