Related papers: Routing Distilled Knowledge via Mixture of LoRA Experts for Large Language Model based Bundle Generation

URL: http://arxiv.org/abs/2508.17250v1
Date: Sun, 24 Aug 2025 08:19:51 GMT
Title: Routing Distilled Knowledge via Mixture of LoRA Experts for Large Language Model based Bundle Generation
Authors: Kaidong Feng, Zhu Sun, Hui Fang, Jie Yang, Wenyuan Liu, Yew-Soon Ong
Abstract summary: RouteDK is a framework for routing distilled knowledge through a mixture of LoRA experts. We first distill knowledge from the teacher LLM for bundle generation in two complementary types. We then train knowledge-specific LoRA experts for each type of knowledge together with a base LoRA expert.
Score: 39.36438486578735
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have shown potential in automatic bundle generation but suffer from prohibitive computational costs. Although knowledge distillation offers a pathway to more efficient student models, our preliminary study reveals that naively integrating diverse types of distilled knowledge from teacher LLMs into student LLMs leads to knowledge conflict, negatively impacting the performance of bundle generation. To address this, we propose RouteDK, a framework for routing distilled knowledge through a mixture-of-LoRA-experts architecture. Specifically, we first distill knowledge from the teacher LLM for bundle generation in two complementary types: high-level knowledge (generalizable rules) and fine-grained knowledge (session-specific reasoning). We then train knowledge-specific LoRA experts for each type of knowledge together with a base LoRA expert. For effective integration, we propose a dynamic fusion module featuring an input-aware router, which balances expert contributions by dynamically determining optimal weights based on the input, thereby effectively mitigating knowledge conflicts. To further improve inference reliability, we design an inference-time enhancement module to reduce variance and mitigate suboptimal reasoning. Experiments on three public datasets show that our RouteDK achieves accuracy comparable to or even better than the teacher LLM, while maintaining strong computational efficiency. In addition, it outperforms state-of-the-art approaches for bundle generation.
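
The abstract does not specify the router's exact form, so the following is only a minimal sketch of input-aware fusion over LoRA experts, not the authors' implementation. It assumes a linear router followed by a softmax, three experts (base, high-level, fine-grained), and a single frozen weight matrix; the names LoRAExpert, routed_forward, W0, and W_router are hypothetical. Both low-rank factors are randomly initialized here purely so the demo produces non-zero updates.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


class LoRAExpert:
    """Low-rank update: delta(x) = (alpha / rank) * B (A x)."""

    def __init__(self, d_in, d_out, rank, alpha, rng):
        # Assumption: random init for both factors, for illustration only.
        self.A = [[rng.gauss(0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[rng.gauss(0, 0.02) for _ in range(rank)] for _ in range(d_out)]
        self.scale = alpha / rank

    def delta(self, x):
        return [self.scale * v for v in matvec(self.B, matvec(self.A, x))]


def routed_forward(x, W0, experts, W_router):
    """Frozen base output plus a router-weighted sum of expert updates."""
    # Assumed router: one linear score per expert, normalized by softmax,
    # so the mixing weights depend on the input x.
    weights = softmax(matvec(W_router, x))
    out = matvec(W0, x)
    for w, expert in zip(weights, experts):
        out = [o + w * d for o, d in zip(out, expert.delta(x))]
    return out, weights


rng = random.Random(0)
d_in, d_out, n_experts = 4, 3, 3  # base, high-level, fine-grained
W0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
experts = [LoRAExpert(d_in, d_out, rank=2, alpha=4, rng=rng) for _ in range(n_experts)]
W_router = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(n_experts)]

out, weights = routed_forward([1.0, 0.5, -0.3, 2.0], W0, experts, W_router)
```

Because the weights come from a softmax over input-dependent scores, they are positive and sum to one, so different inputs can lean on different experts without any expert being switched off entirely.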

Related papers

Probing the Knowledge Boundary: An Interactive Agentic Framework for Deep Knowledge Extraction [29.717986496967978]
We propose an interactive agentic framework to systematically extract and quantify the knowledge of Large Language Models. Our method includes four adaptive exploration policies to probe knowledge at different granularities. We observe a clear knowledge scaling law, where larger models consistently extract more knowledge.
arXiv Detail & Related papers (2026-02-01T01:43:44Z)
FURINA: Free from Unmergeable Router via LINear Aggregation of mixed experts [17.056585698418587]
Mixture of Experts (MoE) has been successfully integrated into Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. A key limitation of existing MoE-LoRA methods is their reliance on a discrete router. We propose FURINA, a novel Free from Unmergeable Router framework based on the LINear Aggregation of experts.
arXiv Detail & Related papers (2025-09-18T12:22:32Z)
Does Knowledge Distillation Matter for Large Language Model based Bundle Generation? [13.491190612749534]
Knowledge distillation offers a promising solution, transferring expertise from large teacher models to compact student models. This study systematically investigates knowledge distillation approaches for bundle generation, aiming to minimize computational demands while preserving performance. We propose a comprehensive KD framework that (i) progressively extracts knowledge (patterns, rules, deep thoughts); (ii) captures varying quantities of distilled knowledge through different strategies; and (iii) exploits complementary LLM adaptation techniques for domain-specific adaptation and enhanced efficiency.
arXiv Detail & Related papers (2025-04-24T03:18:16Z)
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models [61.96237184081951]
Low-Rank Adaptation (LoRA) is widely used to efficiently acquire specialized knowledge in Multimodal Large Language Models (MLLMs). LoRA introduces substantial harmful redundancy during visual instruction tuning, which exacerbates the forgetting of general knowledge and degrades downstream task performance. We propose LoRASculpt to eliminate harmful redundant parameters, thereby harmonizing general and specialized knowledge.
arXiv Detail & Related papers (2025-03-21T04:31:09Z)
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? [55.33467849079774]
Low-rank adaptation (LoRA) is a popular and efficient training technique for updating Large Language Models or adapting them to specific domains. We investigate how new facts can be incorporated into the LLM using LoRA without compromising the previously learned knowledge.
arXiv Detail & Related papers (2025-02-20T12:31:03Z)
Resolving Editing-Unlearning Conflicts: A Knowledge Codebook Framework for Large Language Model Updating [61.70705744491162]
Large Language Models (LLMs) excel in natural language processing by encoding extensive human knowledge. Updating LLMs involves two key tasks simultaneously: unlearning to remove unwanted knowledge and editing to incorporate new information. We propose LOKA, a conflict-free framework for LLM updating based on a knowledge codebook.
arXiv Detail & Related papers (2025-01-31T20:48:46Z)
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts [36.385301311200905]
Mixture-of-Experts (MoE) models address that by allowing the model size to grow without substantially raising training or inference costs. MoE models face challenges regarding knowledge sharing among experts, making their performance somewhat sensitive to routing accuracy. In this paper, we propose CartesianMoE, which implements more effective knowledge sharing among experts in a more "multiplication"-like manner.
arXiv Detail & Related papers (2024-10-21T14:55:59Z)
GIVE: Structured Reasoning of Large Language Models with Knowledge Graph Inspired Veracity Extrapolation [108.2008975785364]
Graph Inspired Veracity Extrapolation (GIVE) is a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input. GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak).
arXiv Detail & Related papers (2024-10-11T03:05:06Z)
Efficient and Deployable Knowledge Infusion for Open-World Recommendations via Large Language Models [53.547190001324665]
We propose REKI to acquire two types of external knowledge about users and items from large language models (LLMs). We develop individual knowledge extraction and collective knowledge extraction tailored for different scales of scenarios, effectively reducing offline resource consumption. Experiments demonstrate that REKI outperforms state-of-the-art baselines and is compatible with many recommendation algorithms and tasks.
arXiv Detail & Related papers (2024-08-20T03:45:24Z)
Empowering Language Models with Knowledge Graph Reasoning for Question Answering [117.79170629640525]
We propose knOwledge REasOning empowered Language Model (OREO-LM). OREO-LM consists of a novel Knowledge Interaction Layer that can be flexibly plugged into existing Transformer-based LMs. We show significant performance gain, achieving state-of-the-art results in the Closed-Book setting.
arXiv Detail & Related papers (2022-11-15T18:26:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.