Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding
- URL: http://arxiv.org/abs/2406.12295v2
- Date: Wed, 23 Oct 2024 15:23:00 GMT
- Title: Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding
- Authors: Kaiyan Zhang, Jianyu Wang, Ning Ding, Biqing Qi, Ermo Hua, Xingtai Lv, Bowen Zhou
- Abstract summary: Collaborative decoding between large language models (LLMs) and small language models (SLMs) presents a promising strategy to mitigate issues such as high inference latency, training cost, and hallucination.
Inspired by dual-process cognitive theory, we propose a unified framework, termed Fast and Slow Generating (FS-GEN).
Within this framework, LLMs are categorized as System 2 (slow and deliberate), while independent SLMs are designated as System 1 (fast and intuitive).
- Score: 27.004817441034795
- Abstract: Large Language Models (LLMs) exhibit impressive capabilities across various applications but encounter substantial challenges such as high inference latency, considerable training costs, and the generation of hallucinations. Collaborative decoding between LLMs and small language models (SLMs) presents a promising strategy to mitigate these issues through methods including speculative decoding, contrastive decoding, and emulator or proxy fine-tuning. However, the specifics of such collaborations, particularly from a unified perspective, remain largely unexplored. Inspired by dual-process cognitive theory, we propose a unified framework in this paper, termed Fast and Slow Generating (FS-GEN). Within this framework, LLMs (sometimes along with SLMs) are categorized as System 2 (slow and deliberate), while independent SLMs are designated as System 1 (fast and intuitive). We provide a comprehensive analysis of these collaborative methodologies, elucidating their common properties and shedding light on the differential knowledge capabilities of System 2 versus System 1 through the FS-GEN framework. Our findings indicate that only a small proportion of collaborative interactions (less than 20% in most instances) is necessary across various methods. These interactions between System 1 and System 2 conform to a scaling law related to the parameter ratios, enabling predictable collaboration. Furthermore, we explore the specific conditions under which collaboration proves most effective, particularly from an uncertainty perspective, offering novel insights that may guide future optimization efforts. Our research underscores that the fundamental distinction between System 1 and System 2 lies in the uncertainty of next-token predictions, where interventions by System 2 are crucial to support System 1. Code for Reproduction: https://github.com/TsinghuaC3I/FS-GEN
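The framework can be pictured as uncertainty-gated decoding: System 1 (the SLM) proposes every token, and System 2 (the LLM) steps in only when System 1's next-token distribution is too uncertain. Below is a minimal Python sketch of that idea; the Qwen model pair, the entropy gate, and the threshold `tau` are our illustrative assumptions, not the released FS-GEN code.

```python
# Illustrative sketch, not the authors' implementation
# (see https://github.com/TsinghuaC3I/FS-GEN for the real code).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

SMALL, LARGE = "Qwen/Qwen1.5-0.5B", "Qwen/Qwen1.5-7B"  # assumed model pair
tok = AutoTokenizer.from_pretrained(SMALL)  # the pair must share a tokenizer
system1 = AutoModelForCausalLM.from_pretrained(SMALL)  # fast, intuitive
system2 = AutoModelForCausalLM.from_pretrained(LARGE)  # slow, deliberate

@torch.no_grad()
def fs_gen(prompt: str, max_new_tokens: int = 64, tau: float = 2.0) -> str:
    """Greedy decoding where System 2 overrides System 1 on uncertain tokens."""
    ids = tok(prompt, return_tensors="pt").input_ids
    interventions = 0
    for _ in range(max_new_tokens):
        logits = system1(ids).logits[0, -1]          # System 1 proposes
        probs = torch.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum()
        if entropy > tau:                            # System 1 too uncertain:
            logits = system2(ids).logits[0, -1]      # System 2 intervenes
            interventions += 1
        nxt = logits.argmax().view(1, 1)
        ids = torch.cat([ids, nxt], dim=-1)          # no KV cache, for clarity
        if nxt.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)
```

Counting `interventions` against the number of generated tokens gives the collaboration frequency, which the paper finds stays below roughly 20% in most settings.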
Related papers
- Boost, Disentangle, and Customize: A Robust System2-to-System1 Pipeline for Code Generation [58.799397354312596]
Large language models (LLMs) have demonstrated remarkable capabilities in various domains, particularly in System 1 tasks.
Research on System2-to-System1 methods has recently surged, exploring System 2 reasoning knowledge via inference-time computation.
In this paper, we focus on code generation, which is a representative System 2 task, and identify two primary challenges.
arXiv Detail & Related papers (2025-02-18T03:20:50Z)
- Agent-Centric Projection of Prompting Techniques and Implications for Synthetic Training Data for Large Language Models [0.8879149917735942]
This paper introduces and explains the concepts of linear contexts (a single, continuous sequence of interactions) and non-linear contexts (branching or multi-path) in Large Language Models (LLMs).
These concepts enable the development of an agent-centric projection of prompting techniques, a framework that can reveal deep connections between prompting strategies and multi-agent systems.
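Read as a data structure, the distinction could be sketched as a tree of interaction turns, where a linear context never branches; the `Turn` class and helper below are hypothetical illustrations of ours, not constructs from the paper.

```python
# Hypothetical illustration of linear vs. non-linear contexts;
# class and field names are ours, not the paper's.
from dataclasses import dataclass, field

@dataclass
class Turn:
    content: str
    children: list["Turn"] = field(default_factory=list)

    def branch(self, content: str) -> "Turn":
        """Start a new path from this turn (non-linear if siblings exist)."""
        child = Turn(content)
        self.children.append(child)
        return child

def is_linear(node: Turn) -> bool:
    """A context is linear iff every turn has at most one continuation."""
    return len(node.children) <= 1 and all(is_linear(c) for c in node.children)

root = Turn("user: solve the puzzle")
root.branch("assistant: try approach A")
root.branch("assistant: try approach B")   # a sibling path: context branches
print(is_linear(root))                     # False: the context is non-linear
```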
arXiv Detail & Related papers (2025-01-14T03:26:43Z)
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both one-to-one comparison and one-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
- Networks of Networks: Complexity Class Principles Applied to Compound AI Systems Design [63.24275274981911]
Compound AI Systems consisting of many language model inference calls are increasingly employed.
In this work, we construct systems, which we call Networks of Networks (NoNs), organized around the distinction between generating a proposed answer and verifying its correctness.
We introduce a verifier-based judge NoN with K generators, an instantiation of "best-of-K" or "judge-based" compound AI systems.
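One way to picture such a verifier-based judge NoN is the sketch below; `call_llm` and `judge` are placeholders standing in for real model calls, not an interface from the paper.

```python
# Sketch of a "best-of-K" judge compound system; the generator and
# verifier here are random placeholders, not real model calls.
import random
from typing import Callable

def call_llm(prompt: str) -> str:                # placeholder generator
    return f"candidate answer {random.randint(0, 9)}"

def judge(question: str, answer: str) -> float:  # placeholder verifier score
    return random.random()

def best_of_k(question: str, k: int = 5,
              gen: Callable[[str], str] = call_llm) -> str:
    """K generators propose; the judge keeps the highest-scoring answer."""
    candidates = [gen(question) for _ in range(k)]
    return max(candidates, key=lambda a: judge(question, a))

print(best_of_k("What is 17 * 24?", k=3))
```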
arXiv Detail & Related papers (2024-07-23T20:40:37Z)
- Interactive Continual Learning: Fast and Slow Thinking [19.253164551254734]
This paper presents a novel Interactive Continual Learning framework, enabled by collaborative interactions among models of various sizes.
To improve memory retrieval in System 1, we introduce the CL-vMF mechanism, based on the von Mises-Fisher (vMF) distribution.
Comprehensive evaluation of our proposed ICL demonstrates significant resistance to forgetting and superior performance relative to existing methods.
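The vMF connection can be illustrated generically: on the unit sphere, the vMF log-likelihood reduces to a concentration-scaled cosine similarity, so retrieval becomes a nearest-mean lookup. The sketch below is generic vMF arithmetic, not the paper's CL-vMF implementation.

```python
# Generic vMF-based retrieval sketch (not the paper's CL-vMF code):
# score(x; mu_i) = kappa * cos(x, mu_i), up to a normalizing constant.
import numpy as np

def vmf_scores(query: np.ndarray, memory: np.ndarray, kappa: float = 10.0):
    """Unnormalized vMF log-likelihoods of `query` under each stored mean."""
    q = query / np.linalg.norm(query)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    return kappa * (m @ q)

memory = np.random.randn(100, 64)   # 100 stored class means, dimension 64
query = np.random.randn(64)
print("retrieved memory slot:", int(np.argmax(vmf_scores(query, memory))))
```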
arXiv Detail & Related papers (2024-03-05T03:37:28Z)
- Continual Learning, Fast and Slow [75.53144246169346]
According to the Complementary Learning Systems theory, humans achieve effective continual learning through two complementary systems.
We propose DualNets (for Dual Networks), a general continual learning framework comprising a fast learning system for supervised learning of specific tasks and a slow learning system for learning task-agnostic, general representations via Self-Supervised Learning (SSL).
We demonstrate the promising results of DualNets on a wide range of continual learning protocols, ranging from the standard offline, task-aware setting to the challenging online, task-free scenario.
arXiv Detail & Related papers (2022-09-06T10:48:45Z) - Learning Physical Concepts in Cyber-Physical Systems: A Case Study [72.74318982275052]
We provide an overview of the current state of research regarding methods for learning physical concepts in time series data.
We also analyze the most important methods from the current state of the art using the example of a three-tank system.
arXiv Detail & Related papers (2021-11-28T14:24:52Z) - DualNet: Continual Learning, Fast and Slow [14.902239050081032]
We propose a novel continual learning framework named "DualNet".
It comprises a fast learning system for supervised learning of pattern-separated representations from specific tasks and a slow learning system for unsupervised learning of task-agnostic, general representations via a Self-Supervised Learning (SSL) technique.
Our experiments show that DualNet outperforms state-of-the-art continual learning methods by a large margin.
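A minimal PyTorch sketch of this fast/slow split appears below; the layer sizes and wiring are our assumptions for illustration, not DualNet's exact architecture.

```python
# Illustrative fast/slow decomposition in the spirit of DualNet;
# dimensions and modules are assumed, not taken from the paper.
import torch
import torch.nn as nn

class SlowLearner(nn.Module):
    """Learns task-agnostic general representations (trained with SSL)."""
    def __init__(self, dim_in: int = 784, dim_rep: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(dim_in, 256), nn.ReLU(), nn.Linear(256, dim_rep))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.encoder(x)

class FastLearner(nn.Module):
    """Adapts the slow features to the current task with supervision."""
    def __init__(self, dim_rep: int = 128, n_classes: int = 10):
        super().__init__()
        self.head = nn.Linear(dim_rep, n_classes)

    def forward(self, rep: torch.Tensor) -> torch.Tensor:
        return self.head(rep)

slow, fast = SlowLearner(), FastLearner()
x = torch.randn(32, 784)                 # a batch of flattened inputs
logits = fast(slow(x))                   # fast head consumes slow features
print(logits.shape)                      # torch.Size([32, 10])
```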
arXiv Detail & Related papers (2021-10-01T02:31:59Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.