Synergistic Weak-Strong Collaboration by Aligning Preferences
- URL: http://arxiv.org/abs/2504.15188v2
- Date: Tue, 22 Apr 2025 04:22:09 GMT
- Title: Synergistic Weak-Strong Collaboration by Aligning Preferences
- Authors: Yizhu Jiao, Xuchao Zhang, Zhaoyang Wang, Yubo Ma, Zhun Deng, Rujia Wang, Chetan Bansal, Saravan Rajmohan, Jiawei Han, Huaxiu Yao
- Abstract summary: Current Large Language Models (LLMs) excel in general reasoning yet struggle with specialized tasks requiring proprietary or domain-specific knowledge. We propose a collaborative framework that pairs a specialized weak model with a general strong model. We find that the collaboration significantly outperforms each model alone by leveraging complementary strengths.
- Score: 53.47675666475273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current Large Language Models (LLMs) excel in general reasoning yet struggle with specialized tasks requiring proprietary or domain-specific knowledge. Fine-tuning large models for every niche application is often infeasible due to black-box constraints and high computational overhead. To address this, we propose a collaborative framework that pairs a specialized weak model with a general strong model. The weak model, tailored to specific domains, produces initial drafts and background information, while the strong model leverages its advanced reasoning to refine these drafts, extending LLMs' capabilities to critical yet specialized tasks. To optimize this collaboration, we introduce a collaborative feedback mechanism to fine-tune the weak model: it quantifies the influence of the weak model's contributions in the collaboration procedure and establishes preference pairs to guide preference tuning of the weak model. We validate our framework through experiments on three domains. We find that the collaboration significantly outperforms each model alone by leveraging complementary strengths. Moreover, aligning the weak model with the collaborative preference further enhances overall performance.
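Read as an algorithm, the abstract describes a draft-refine loop plus a feedback signal that ranks weak-model drafts by how much they improve the collaboration. The sketch below is a hedged reconstruction under assumed interfaces: `weak_model`, `strong_model`, and `judge` are hypothetical callables, and the influence proxy (collaboration score minus the strong model's solo score) is one plausible instantiation, not the paper's exact measure.

```python
# Illustrative sketch of the weak-strong collaboration loop from the abstract.
# The prompts, the `judge` scorer, and the influence proxy are assumptions,
# NOT the authors' implementation.

from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # weak-model draft whose contribution helped most
    rejected: str  # weak-model draft whose contribution helped least


def collaborate(weak_model, strong_model, task: str) -> tuple[str, str]:
    """Weak model drafts with domain knowledge; strong model refines it."""
    draft = weak_model(f"Draft an answer with domain background:\n{task}")
    final = strong_model(
        f"Refine this draft into a final answer.\nTask: {task}\nDraft: {draft}"
    )
    return draft, final


def influence(strong_model, judge, task: str, final: str) -> float:
    """Assumed proxy for the weak model's contribution: how much the
    collaboration beats the strong model answering on its own."""
    solo = strong_model(f"Answer directly:\n{task}")
    return judge(task, final) - judge(task, solo)


def build_preference_pairs(weak_model, strong_model, judge, tasks, n_drafts=4):
    """Sample several weak drafts per task, score each draft's influence on
    the collaboration, and pair the best against the worst."""
    pairs = []
    for task in tasks:
        scored = []
        for _ in range(n_drafts):
            draft, final = collaborate(weak_model, strong_model, task)
            scored.append((influence(strong_model, judge, task, final), draft))
        scored.sort(key=lambda s: s[0], reverse=True)
        if scored[0][0] > scored[-1][0]:  # skip ties: no preference signal
            pairs.append(PreferencePair(task, scored[0][1], scored[-1][1]))
    return pairs
```

The resulting pairs would then feed a standard preference-tuning objective (e.g., DPO) applied only to the weak model, leaving the black-box strong model untouched, consistent with the paper's black-box constraint.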
Related papers
- Towards Harnessing the Collaborative Power of Large and Small Models for Domain Tasks [20.370633539861746]
Large language models (LLMs) have demonstrated remarkable capabilities, but they require vast amounts of data and computational resources.
In contrast, smaller models (SMs) can be more efficient and tailored to specific domains.
arXiv Detail & Related papers (2025-04-24T10:24:35Z) - Chimera: Improving Generalist Model with Domain-Specific Experts [35.706585190958634]
We introduce a scalable and low-cost multi-modal pipeline designed to boost the ability of existing LMMs with domain-specific experts. Specifically, we design a progressive training strategy to integrate features from expert models into the input of a generalist LMM. This results in a versatile model that excels across the chart, table, math, and document domains.
arXiv Detail & Related papers (2024-12-08T16:10:42Z) - A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z) - Unconstrained Model Merging for Enhanced LLM Reasoning [42.079040543428036]
We explore the potential of merging multiple expert models into a single large language model.
We propose an unconstrained model merging framework that accommodates both homogeneous and heterogeneous model architectures.
Across 7 benchmarks and 9 reasoning-optimized LLMs, we reveal key findings, including that reasoning capabilities can emerge from merging.
arXiv Detail & Related papers (2024-10-17T16:04:07Z) - Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities [4.389938747401259]
This work explores the effects of fine-tuning strategies on Large Language Models (LLMs) in domains such as materials science and engineering.
We find that the merging of multiple fine-tuned models can lead to the emergence of capabilities that surpass the individual contributions of the parent models.
arXiv Detail & Related papers (2024-09-05T11:49:53Z) - Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning [78.72226641279863]
Sparse Mixture of Expert (SMoE) models have emerged as a scalable alternative to dense models in language modeling.
Our research explores task-specific model pruning to inform decisions about designing SMoE architectures.
We introduce UNCURL, an adaptive task-aware pruning technique that reduces the number of experts per MoE layer offline, post-training.
arXiv Detail & Related papers (2024-09-02T22:35:03Z) - DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System [83.34921966305804]
Large language models (LLMs) have demonstrated remarkable performance in recommender systems.
We propose a novel plug-and-play alignment framework for LLMs and collaborative models.
Our method is superior to existing state-of-the-art algorithms.
arXiv Detail & Related papers (2024-08-15T15:56:23Z) - Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z) - Minimal Value-Equivalent Partial Models for Scalable and Robust Planning in Lifelong Reinforcement Learning [56.50123642237106]
Common practice in model-based reinforcement learning is to learn models that model every aspect of the agent's environment.
We argue that such models are not particularly well-suited for performing scalable and robust planning in lifelong reinforcement learning scenarios.
We propose new kinds of models that only model the relevant aspects of the environment, which we call "minimal value-equivalent partial models".
arXiv Detail & Related papers (2023-01-24T16:40:01Z)