Related papers: Learning from Diverse Reasoning Paths with Routing and Collaboration

URL: http://arxiv.org/abs/2508.16861v1
Date: Sat, 23 Aug 2025 01:15:57 GMT
Title: Learning from Diverse Reasoning Paths with Routing and Collaboration
Authors: Zhenyu Lei, Zhen Tan, Song Wang, Yaochen Zhu, Zihan Chen, Yushun Dong, Jundong Li
Abstract summary: We propose Quality-filtered Routing with Cooperative Distillation (QR-Distill), combining path quality filtering, conditional routing, and peer teaching. Experiments demonstrate QR-Distill's superiority over traditional single- and multi-path distillation methods.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Advances in large language models (LLMs) have significantly enhanced reasoning capabilities, but their deployment is restricted in resource-constrained scenarios. Knowledge distillation addresses this by transferring knowledge from powerful teacher models to compact and transparent students. However, effectively capturing the teacher's comprehensive reasoning is challenging because conventional token-level supervision has limited scope. Using multiple reasoning paths per query alleviates this problem, but treating each path identically is suboptimal, as paths vary widely in quality and suitability across tasks and models. We propose Quality-filtered Routing with Cooperative Distillation (QR-Distill), combining path quality filtering, conditional routing, and cooperative peer teaching. First, quality filtering retains only correct reasoning paths scored by an LLM-based evaluation. Second, conditional routing dynamically assigns paths tailored to each student's current learning state. Finally, cooperative peer teaching enables students to mutually distill diverse insights, addressing knowledge gaps and biases toward specific reasoning styles. Experiments demonstrate QR-Distill's superiority over traditional single- and multi-path distillation methods. Ablation studies further highlight the importance of each component, including quality filtering, conditional routing, and peer teaching, in effective knowledge transfer. Our code is available at https://github.com/LzyFischer/Distill.
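
The three components named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code is in the linked repository), and every name in it is hypothetical: `ReasoningPath`, `quality_filter`, `route_paths`, and `peer_teaching_loss` are illustrative helpers, the `min_score` threshold is an assumed value, the LLM-judge scores and per-student routing scores are supplied as plain inputs rather than computed, and symmetric KL is used as a stand-in peer-teaching objective because it is a common mutual-distillation loss, not because the paper specifies it.

```python
import math
from dataclasses import dataclass


@dataclass
class ReasoningPath:
    """One teacher-generated rationale for a query (hypothetical structure)."""
    text: str
    answer: str
    judge_score: float  # assumed LLM-judge quality score in [0, 1]


def quality_filter(paths, gold_answer, min_score=0.5):
    """Keep paths whose final answer is correct AND whose judge score clears
    a threshold. min_score=0.5 is an illustrative assumption."""
    return [p for p in paths if p.answer == gold_answer and p.judge_score >= min_score]


def route_paths(router_scores, k):
    """Give each student the indices of its k highest-scoring paths.

    router_scores[student][i] is a hypothetical score for path i that stands in
    for the student's current learning state (e.g. the output of a learned router).
    """
    assignment = {}
    for student, scores in router_scores.items():
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        assignment[student] = sorted(ranked[:k])
    return assignment


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for discrete distributions; q must be strictly positive."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def peer_teaching_loss(logits_a, logits_b):
    """Symmetric KL between two students' predictive distributions, a common
    mutual-distillation objective (assumed here, not taken from the paper)."""
    p, q = softmax(logits_a), softmax(logits_b)
    return kl(p, q) + kl(q, p)


if __name__ == "__main__":
    paths = [
        ReasoningPath("3 + 4 = 7", "7", 0.9),
        ReasoningPath("3 + 4 = 8", "8", 0.8),   # wrong answer: dropped
        ReasoningPath("guess 7", "7", 0.3),     # right answer, low score: dropped
        ReasoningPath("count up from 3", "7", 0.7),
    ]
    kept = quality_filter(paths, gold_answer="7")
    print([p.text for p in kept])  # ['3 + 4 = 7', 'count up from 3']

    router_scores = {"student_a": [0.2, 0.9], "student_b": [0.6, 0.1]}
    print(route_paths(router_scores, k=1))  # {'student_a': [1], 'student_b': [0]}

    # Identical predictions give zero peer-teaching loss.
    print(peer_teaching_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # 0.0
```

Top-k selection over fixed scores keeps the routing step easy to follow; in a real training loop those scores would be recomputed as each student's learning state changes, which is the point of routing conditionally rather than giving every student every path.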

Related papers

Automatic Question Generation for Intuitive Learning Utilizing Causal Graph Guided Chain of Thought Reasoning
We propose a novel framework that combines causal-graph-guided Chain-of-Thought reasoning with a multi-agent language model. This approach ensures the generation of accurate, meaningful, and curriculum-aligned questions. Experimental results demonstrate up to a 70% improvement in quality compared to reference methods.
Date: 2026-01-02T08:49:58Z
AdaSwitch: Adaptive Switching Generation for Knowledge Distillation
Small language models (SLMs) are crucial for applications with strict latency and computational constraints. We propose AdaSwitch, a novel approach that combines on-policy and off-policy generation at the token level. AdaSwitch consistently improves accuracy, offering a practical and effective method for distilling SLMs with acceptable additional overhead.
Date: 2025-10-09T06:38:37Z
Cross-View Consistency Regularisation for Knowledge Distillation
This work is inspired by the success of cross-view learning in fields such as semi-supervised learning. We introduce within-view and cross-view regularisations to standard logit-based distillation frameworks. We also perform confidence-based soft label mining to improve the quality of distilling signals from the teacher.
Date: 2024-12-21T05:41:47Z
Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review
We introduce a novel Fault-Aware DistIllation via Peer-Review (FAIR) approach. Instead of merely obtaining rationales from teachers, our method asks teachers to identify and explain the student's mistakes. Our method reduces the chance of teachers guessing correctly with flawed rationale.
Date: 2024-10-04T17:59:41Z
DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning
In this study, we introduce a framework that leverages Dual Queries and Low-rank approximation Re-ranking to automatically select exemplars for in-context learning. DQ-LoRe significantly outperforms prior state-of-the-art methods in the automatic selection of exemplars for GPT-4, enhancing performance from 92.5% to 94.2%.
Date: 2023-10-04T16:44:37Z
MDFlow: Unsupervised Optical Flow Learning by Reliable Mutual Knowledge Distillation
Current approaches impose an augmentation regularization term for continual self-supervision. We propose a novel mutual distillation framework to transfer reliable knowledge back and forth between the teacher and student networks. Our approach, termed MDFlow, achieves state-of-the-art real-time accuracy and generalization ability on challenging benchmarks.
Date: 2022-11-11T05:56:46Z
Solving Continuous Control via Q-learning
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods. By combining bang-bang action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches performance of state-of-the-art continuous actor-critic methods.
Date: 2022-10-22T22:55:50Z
Distilling Knowledge via Knowledge Review
We study the role of cross-level connection paths between teacher and student networks and reveal their great importance. For the first time in knowledge distillation, cross-stage connection paths are proposed. Our final nested and compact framework requires negligible overhead and outperforms other methods on a variety of tasks.
Date: 2021-04-19T04:36:24Z
Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching
Most studies manually tie intermediate features of the teacher and student, and transfer knowledge through pre-defined links. We introduce an effective and efficient feature distillation method utilizing all the feature levels of the teacher without manually selecting the links.
Date: 2021-02-05T03:07:57Z
Differentiable Feature Aggregation Search for Knowledge Distillation
We introduce feature aggregation to imitate multi-teacher distillation within a single-teacher distillation framework. DFA is a two-stage Differentiable Feature Aggregation search method motivated by DARTS in neural architecture search. Experimental results show that DFA outperforms existing methods on the CIFAR-100 and CINIC-10 datasets.
Date: 2020-08-02T15:42:29Z
Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach
We propose a new solution to reuse experiences and transfer value functions among multiple students via model distillation. We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge. Our proposed framework, namely Learning and Teaching Categorical Reinforcement, shows promising performance in stabilizing and accelerating learning progress.
Date: 2020-02-06T11:31:04Z

This list is automatically generated from the titles and abstracts of the papers on this site.