KCM: KAN-Based Collaboration Models Enhance Pretrained Large Models
- URL: http://arxiv.org/abs/2510.20278v1
- Date: Thu, 23 Oct 2025 07:06:21 GMT
- Title: KCM: KAN-Based Collaboration Models Enhance Pretrained Large Models
- Authors: Guangyu Dai, Siliang Tang, Yueting Zhuang
- Abstract summary: We propose a KAN-based Collaborative Model (KCM) as an improved approach to large-small model collaboration. KAN offers superior visualizability and interpretability while mitigating catastrophic forgetting.
- Score: 62.658961779827145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, researchers working on Pretrained Large Models (PLMs) have proposed large-small model collaboration frameworks that leverage easily trainable small models to assist large models. These frameworks aim to (1) significantly reduce computational resource consumption while maintaining comparable accuracy, and (2) enhance large model performance on specialized domain tasks. However, this collaborative paradigm suffers from significant accuracy degradation, exacerbated catastrophic forgetting, and amplified hallucination induced by small model knowledge. To address these challenges, we propose a KAN-based Collaborative Model (KCM) as an improved approach to large-small model collaboration. The KAN used in KCM is an alternative neural network architecture distinct from conventional MLPs. Compared to MLPs, KAN offers superior visualizability and interpretability while mitigating catastrophic forgetting. We deployed KCM in large-small model collaborative systems across three scenarios: language, vision, and vision-language cross-modal tasks. The experimental results demonstrate that, compared with pure large model approaches, the large-small model collaboration framework using KCM as the collaborative model significantly reduces the number of large model inference calls while maintaining near-identical task accuracy, thereby substantially lowering computational resource consumption. At the same time, the KAN-based small collaborative model markedly mitigates catastrophic forgetting, leading to significant accuracy improvements on long-tail data. The results show that KCM outperforms MLP-based small collaborative models (MCM) across all metrics.
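The abstract does not state how the small collaborative model decides when the large model must be called. One common pattern, assumed here purely for illustration and not taken from the paper, is confidence gating: the small model answers when its top softmax probability clears a threshold and defers to the large model otherwise. The names `collaborate`, `toy_small`, `toy_large`, and the `threshold` parameter are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def collaborate(
    inputs: Sequence[float],
    small_model: Callable[[float], Sequence[float]],
    large_model: Callable[[float], int],
    threshold: float = 0.9,
) -> Tuple[List[int], int]:
    """Return (predictions, number of large-model calls).

    Assumption: the small model emits class logits and the large model
    emits a class label directly. Neither interface comes from the paper.
    """
    predictions: List[int] = []
    large_calls = 0
    for x in inputs:
        probs = softmax(small_model(x))
        confidence = max(probs)
        if confidence >= threshold:
            # Small model is confident enough: no large-model call needed.
            predictions.append(probs.index(confidence))
        else:
            # Defer the uncertain input to the (expensive) large model.
            predictions.append(large_model(x))
            large_calls += 1
    return predictions, large_calls


def toy_small(x: float) -> List[float]:
    """Hypothetical small model: two-class logits from a scalar input."""
    return [x, -x]


def toy_large(x: float) -> int:
    """Hypothetical large model: class 0 for non-negative inputs, else 1."""
    return 0 if x >= 0 else 1


if __name__ == "__main__":
    preds, calls = collaborate([3.0, -3.0, 0.1, -0.2], toy_small, toy_large)
    print(preds, calls)
```

Under this assumed scheme, raising `threshold` sends more inputs to the large model (higher cost, fewer small-model errors), while lowering it saves large-model calls at the risk of accepting wrong small-model answers.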
Related papers
- M-Loss: Quantifying Model Merging Compatibility with Limited Unlabeled Data [9.502531621979694]
We introduce Merging-ensembling loss (M-Loss), a novel evaluation metric. M-Loss quantifies the compatibility of merging source models using very limited unlabeled data. Our theoretical analysis and empirical evaluations demonstrate that incorporating M-Loss into the merging process significantly improves the alignment between merged models and model ensembling.
arXiv Detail & Related papers (2026-02-09T12:03:36Z) - MoCo: A One-Stop Shop for Model Collaboration Research [132.52160996841505]
We present MoCo: a one-stop Python library of executing, benchmarking, and comparing model collaboration algorithms at scale. MoCo features 26 model collaboration methods, spanning diverse levels of cross-model information exchange. Extensive experiments with MoCo demonstrate that most collaboration strategies outperform models without collaboration. We envision MoCo as a valuable toolkit to facilitate and turbocharge the quest for an open, modular, decentralized, and collaborative AI future.
arXiv Detail & Related papers (2026-01-29T04:36:52Z) - The Law of Multi-Model Collaboration: Scaling Limits of Model Ensembling for Large Language Models [54.51795784459866]
We propose a theoretical framework of performance scaling for multi-model collaboration. We show that multi-model systems follow a power-law scaling with respect to the total parameter count. Ensembles of heterogeneous model families achieve better performance scaling than those formed within a single model family.
arXiv Detail & Related papers (2025-12-29T09:55:12Z) - MCP: A Control-Theoretic Orchestration Framework for Synergistic Efficiency and Interpretability in Multimodal Large Language Models [0.0]
This study proposes a three-layer collaboration framework based on model-controller-task adaptation (MCP). Experiments show that, compared with the baseline model, the MCP framework improves performance on cross-modal benchmarking tasks such as GLUE, COCO, and ScienceQA by 15-30%, improves reasoning efficiency by 40%, and generates interpretable intermediate results through the Presenter layer, obtaining 90% of the manual interpretability scores.
arXiv Detail & Related papers (2025-09-20T09:44:11Z) - Activation-Guided Consensus Merging for Large Language Models [25.68958388022476]
We present Activation-Guided Consensus Merging (ACM), a plug-and-play merging framework that determines layer-specific merging coefficients. Experiments on Long-to-Short (L2S) and general merging tasks demonstrate that ACM consistently outperforms all baseline methods.
arXiv Detail & Related papers (2025-05-20T07:04:01Z) - A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z) - Co-training and Co-distillation for Quality Improvement and Compression of Language Models [88.94539115180919]
Knowledge Distillation (KD) compresses expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models.
Most smaller models fail to surpass the performance of the original larger model, resulting in sacrificing performance to improve inference speed.
We propose Co-Training and Co-Distillation (CTCD), a novel framework that improves performance and inference speed together by co-training two models.
arXiv Detail & Related papers (2023-11-06T03:29:00Z) - Domain Generalization Using Large Pretrained Models with Mixture-of-Adapters [33.401355417911084]
This study is on leveraging the knowledge of large pretrained models to improve handling of OOD scenarios and tackle domain generalization problems. We employ parameter-efficient fine-tuning (PEFT) techniques to effectively preserve OOD robustness while working with large models. Our experiments and analysis confirm that the most effective approaches involve ensembling diverse models and increasing the scale of pretraining.
arXiv Detail & Related papers (2023-10-17T07:01:24Z) - Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic [67.00475077281212]
Model-based reinforcement learning algorithms are more sample efficient than their model-free counterparts.
We propose a novel approach that achieves high sample efficiency without the strong reliance on accurate learned models.
We show that CMBAC significantly outperforms state-of-the-art approaches in terms of sample efficiency on several challenging tasks.
arXiv Detail & Related papers (2021-12-16T15:33:11Z) - MixKD: Towards Efficient Distillation of Large-scale Language Models [129.73786264834894]
We propose MixKD, a data-agnostic distillation framework, to endow the resulting model with stronger generalization ability.
We prove from a theoretical perspective that under reasonable conditions MixKD gives rise to a smaller gap between the error and the empirical error.
Experiments under a limited-data setting and ablation studies further demonstrate the advantages of the proposed approach.
arXiv Detail & Related papers (2020-11-01T18:47:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.