Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding
- URL: http://arxiv.org/abs/2406.12295v2
- Date: Wed, 23 Oct 2024 15:23:00 GMT
- Title: Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding
- Authors: Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou
- Abstract summary: Collaborative decoding between large language models (LLMs) and small language models (SLMs) presents a promising strategy to mitigate issues such as high inference latency, training cost, and hallucination.
Inspired by dual-process cognitive theory, we propose a unified framework, termed Fast and Slow Generating (FS-GEN).
Within this framework, LLMs are categorized as System 2 (slow and deliberate), while independent SLMs are designated as System 1 (fast and intuitive).
- Score: 27.004817441034795
- Abstract: Large Language Models (LLMs) exhibit impressive capabilities across various applications but encounter substantial challenges such as high inference latency, considerable training costs, and the generation of hallucinations. Collaborative decoding between LLMs and small language models (SLMs) presents a promising strategy to mitigate these issues through methods including speculative decoding, contrastive decoding, and emulator or proxy fine-tuning. However, the specifics of such collaborations, particularly from a unified perspective, remain largely unexplored. Inspired by dual-process cognitive theory, we propose a unified framework in this paper, termed Fast and Slow Generating (FS-GEN). Within this framework, LLMs (sometimes along with SLMs) are categorized as System 2 (slow and deliberate), while independent SLMs are designated as System 1 (fast and intuitive). We provide a comprehensive analysis of these collaborative methodologies, elucidating their common properties and shedding light on the differential knowledge capabilities of System 2 versus System 1 through the FS-GEN framework. Our findings indicate that only a small proportion of collaborative interactions (less than 20% in most instances) is necessary across various methods. These interactions between System 1 and System 2 conform to a scaling law related to the parameter ratios, enabling predictable collaboration. Furthermore, we explore the specific conditions under which collaboration proves most effective, particularly from an uncertainty perspective, offering novel insights that may guide future optimization efforts. Our research underscores that the fundamental distinction between System 1 and System 2 lies in the uncertainty of next-token predictions, where interventions by System 2 are crucial to support System 1. Code for Reproduction: https://github.com/TsinghuaC3I/FS-GEN
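The framework can be pictured as uncertainty-gated decoding: System 1 (the SLM) proposes every token, and System 2 (the LLM) steps in only when System 1's next-token distribution is too uncertain. Below is a minimal Python sketch of that idea; the Qwen model pair, the entropy gate, and the threshold `tau` are our illustrative assumptions, not the released FS-GEN code.

```python
# Illustrative sketch, not the authors' implementation
# (see https://github.com/TsinghuaC3I/FS-GEN for the real code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SMALL, LARGE = "Qwen/Qwen1.5-0.5B", "Qwen/Qwen1.5-7B"  # assumed model pair
tok = AutoTokenizer.from_pretrained(SMALL)  # the pair must share a tokenizer
system1 = AutoModelForCausalLM.from_pretrained(SMALL)  # fast, intuitive
system2 = AutoModelForCausalLM.from_pretrained(LARGE)  # slow, deliberate

@torch.no_grad()
def fs_gen(prompt: str, max_new_tokens: int = 64, tau: float = 2.0) -> str:
    """Greedy decoding where System 2 overrides System 1 on uncertain tokens."""
    ids = tok(prompt, return_tensors="pt").input_ids
    interventions = 0
    for _ in range(max_new_tokens):
        logits = system1(ids).logits[0, -1]          # System 1 proposes
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
        if entropy > tau:                            # System 1 too uncertain:
            logits = system2(ids).logits[0, -1]      # System 2 intervenes
            interventions += 1
        nxt = logits.argmax().view(1, 1)
        ids = torch.cat([ids, nxt], dim=-1)          # no KV cache, for clarity
        if nxt.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)
```

Counting `interventions` against the number of generated tokens gives the collaboration frequency, which the paper finds stays below roughly 20% in most settings.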
Related papers
- Boost, Disentangle, and Customize: A Robust System2-to-System1 Pipeline for Code Generation [58.799397354312596]
Large language models (LLMs) have demonstrated remarkable capabilities in various domains, particularly in System 1 tasks.
Research on System2-to-System1 methods has recently surged, exploring System 2 reasoning knowledge via inference-time computation.
In this paper, we focus on code generation, which is a representative System 2 task, and identify two primary challenges.
arXiv Detail & Related papers (2025-02-18T03:20:50Z)
- Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models [0.8879149917735942]
This paper introduces and explains the concepts of linear contexts (a single, continuous sequence of interactions) and non-linear contexts (branching or multi-path) in Large Language Models (LLMs).
These concepts enable the development of an agent-centric projection of prompting techniques, a framework that can reveal deep connections between prompting strategies and multi-agent systems.
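Read as a data structure, the distinction could be sketched as a tree of interaction turns, where a linear context never branches; the `Turn` class and helper below are hypothetical illustrations of ours, not constructs from the paper.

```python
# Hypothetical illustration of linear vs. non-linear contexts;
# class and field names are ours, not the paper's.
from dataclasses import dataclass, field

@dataclass
class Turn:
    content: str
    children: list["Turn"] = field(default_factory=list)

    def branch(self, content: str) -> "Turn":
        """Start a new path from this turn (non-linear if siblings exist)."""
        child = Turn(content)
        self.children.append(child)
        return child

def is_linear(node: Turn) -> bool:
    """A context is linear iff every turn has at most one continuation."""
    return len(node.children) <= 1 and all(is_linear(c) for c in node.children)

root = Turn("user: solve the puzzle")
root.branch("assistant: try approach A")
root.branch("assistant: try approach B")   # a sibling path: context branches
print(is_linear(root))                     # False: the context is non-linear
```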
arXiv Detail & Related papers (2025-01-14T03:26:43Z)
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both one-to-one comparison and one-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
- Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design [63.24275274981911]
Compound AI Systems consisting of many language model inference calls are increasingly employed.
In this work, we construct systems, which we call Networks of Networks (NoNs), organized around the distinction between generating a proposed answer and verifying its correctness.
We introduce a verifier-based judge NoN with K generators, an instantiation of "best-of-K" or "judge-based" compound AI systems.
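One way to picture such a verifier-based judge NoN is the sketch below; `call_llm` and `judge` are placeholders standing in for real model calls, not an interface from the paper.

```python
# Sketch of a "best-of-K" judge compound system; the generator and
# verifier here are random placeholders, not real model calls.
import random
from typing import Callable

def call_llm(prompt: str) -> str:                # placeholder generator
    return f"candidate answer {random.randint(0, 9)}"

def judge(question: str, answer: str) -> float:  # placeholder verifier score
    return random.random()

def best_of_k(question: str, k: int = 5,
              gen: Callable[[str], str] = call_llm) -> str:
    """K generators propose; the judge keeps the highest-scoring answer."""
    candidates = [gen(question) for _ in range(k)]
    return max(candidates, key=lambda a: judge(question, a))

print(best_of_k("What is 17 * 24?", k=3))
```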
arXiv Detail & Related papers (2024-07-23T20:40:37Z)
- Interactive Continual Learning: Fast and Slow Thinking [19.253164551254734]
This paper presents a novel Interactive Continual Learning framework, enabled by collaborative interactions among models of various sizes.
To improve memory retrieval in System 1, we introduce the CL-vMF mechanism, based on the von Mises-Fisher (vMF) distribution.
Comprehensive evaluation of our proposed ICL demonstrates significant resistance to forgetting and superior performance relative to existing methods.
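The vMF connection can be illustrated generically: on the unit sphere, the vMF log-likelihood reduces to a concentration-scaled cosine similarity, so retrieval becomes a nearest-mean lookup. The sketch below is generic vMF arithmetic, not the paper's CL-vMF implementation.

```python
# Generic vMF-based retrieval sketch (not the paper's CL-vMF code):
# score(x; mu_i) = kappa * cos(x, mu_i), up to a normalizing constant.
import numpy as np

def vmf_scores(query: np.ndarray, memory: np.ndarray, kappa: float = 10.0):
    """Unnormalized vMF log-likelihoods of `query` under each stored mean."""
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    return kappa * (m @ q)

memory = np.random.randn(100, 64)   # 100 stored class means, dimension 64
query = np.random.randn(64)
print("retrieved memory slot:", int(np.argmax(vmf_scores(query, memory))))
```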
arXiv Detail & Related papers (2024-03-05T03:37:28Z)
- Continual Learning, Fast and Slow [75.53144246169346]
According to the Complementary Learning Systems theory, humans achieve effective continual learning through two complementary systems.
We propose DualNets (for Dual Networks), a general continual learning framework comprising a fast learning system for supervised learning of specific tasks and a slow learning system for learning task-agnostic, general representations via Self-Supervised Learning (SSL).
We demonstrate the promising results of DualNets on a wide range of continual learning protocols, ranging from the standard offline, task-aware setting to the challenging online, task-free scenario.
arXiv Detail & Related papers (2022-09-06T10:48:45Z) - Learning Physical Concepts in Cyber-Physical Systems: A Case Study [72.74318982275052]
We provide an overview of the current state of research regarding methods for learning physical concepts in time series data.
We also analyze the most important methods from the current state of the art using the example of a three-tank system.
arXiv Detail & Related papers (2021-11-28T14:24:52Z) - DualNet: Continual Learning, Fast and Slow [14.902239050081032]
We propose a novel continual learning framework named "DualNet".
It comprises a fast learning system for supervised learning of pattern-separated representations from specific tasks and a slow learning system for unsupervised learning of task-agnostic, general representations via a Self-Supervised Learning (SSL) technique.
Our experiments show that DualNet outperforms state-of-the-art continual learning methods by a large margin.
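A minimal PyTorch sketch of this fast/slow split appears below; the layer sizes and wiring are our assumptions for illustration, not DualNet's exact architecture.

```python
# Illustrative fast/slow decomposition in the spirit of DualNet;
# dimensions and modules are assumed, not taken from the paper.
import torch
import torch.nn as nn

class SlowLearner(nn.Module):
    """Learns task-agnostic general representations (trained with SSL)."""
    def __init__(self, dim_in: int = 784, dim_rep: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(dim_in, 256), nn.ReLU(), nn.Linear(256, dim_rep))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)

class FastLearner(nn.Module):
    """Adapts the slow features to the current task with supervision."""
    def __init__(self, dim_rep: int = 128, n_classes: int = 10):
        super().__init__()
        self.head = nn.Linear(dim_rep, n_classes)

    def forward(self, rep: torch.Tensor) -> torch.Tensor:
        return self.head(rep)

slow, fast = SlowLearner(), FastLearner()
x = torch.randn(32, 784)                 # a batch of flattened inputs
logits = fast(slow(x))                   # fast head consumes slow features
print(logits.shape)                      # torch.Size([32, 10])
```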
arXiv Detail & Related papers (2021-10-01T02:31:59Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.