Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures
- URL: http://arxiv.org/abs/2510.16968v1
- Date: Sun, 19 Oct 2025 19:15:08 GMT
- Title: Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures
- Authors: Pingzhi Li, Morris Yu-Chao Huang, Zhen Tan, Qingquan Song, Jie Peng, Kai Zou, Yu Cheng, Kaidi Xu, Tianlong Chen
- Abstract summary: Knowledge Distillation (KD) accelerates training of large language models (LLMs) but poses intellectual property protection and diversity risks. We present a KD detection framework effective in both white-box and black-box settings by exploiting an overlooked signal: the transfer of MoE "structural habits". Our approach analyzes how different experts specialize and collaborate across various inputs, creating distinctive fingerprints that persist through the distillation process.
- Score: 57.98221536489363
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge Distillation (KD) accelerates training of large language models (LLMs) but poses intellectual property protection and LLM diversity risks. Existing KD detection methods based on self-identity or output similarity can be easily evaded through prompt engineering. We present a KD detection framework effective in both white-box and black-box settings by exploiting an overlooked signal: the transfer of MoE "structural habits", especially internal routing patterns. Our approach analyzes how different experts specialize and collaborate across various inputs, creating distinctive fingerprints that persist through the distillation process. To extend beyond the white-box setup and MoE architectures, we further propose Shadow-MoE, a black-box method that constructs proxy MoE representations via auxiliary distillation to compare these patterns between arbitrary model pairs. We establish a comprehensive, reproducible benchmark that offers diverse distilled checkpoints and an extensible framework to facilitate future research. Extensive experiments demonstrate >94% detection accuracy across various scenarios and strong robustness to prompt-based evasion, outperforming existing baselines while highlighting the transfer of structural habits in LLMs.
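As a rough illustration of the routing-signature idea described in the abstract, the sketch below builds per-expert usage histograms from routing decisions collected on a shared set of probe prompts and compares them with Jensen-Shannon divergence. This is only a minimal sketch of the general concept: the function names, probe traces, expert count, and decision threshold are all hypothetical and are not taken from the paper or its released code.

```python
# Illustrative sketch of routing-fingerprint comparison (not the paper's implementation).
# Routing decisions are assumed to be pre-extracted top-k expert indices per token.
import numpy as np

def routing_fingerprint(expert_choices, num_experts):
    """Normalized histogram of how often each expert is selected."""
    counts = np.zeros(num_experts, dtype=np.float64)
    for token_experts in expert_choices:   # e.g. [3, 17] for top-2 routing
        for e in token_experts:
            counts[e] += 1.0
    return counts / counts.sum()

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two expert-usage distributions."""
    p, q = p + eps, q + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical routing traces from a candidate teacher and a suspected student
# (or its Shadow-MoE proxy) on the same probe prompts.
teacher_trace = [[3, 17], [3, 5], [17, 9], [3, 17]]
student_trace = [[3, 17], [3, 17], [5, 9], [3, 17]]

t_fp = routing_fingerprint(teacher_trace, num_experts=32)
s_fp = routing_fingerprint(student_trace, num_experts=32)

# A small divergence is treated as evidence of distillation; 0.05 is an
# illustrative threshold, not a value reported in the paper.
print("possible distillation:", js_divergence(t_fp, s_fp) < 0.05)
```

In the black-box setting, the paper's Shadow-MoE stands in for direct access to router decisions by distilling a proxy MoE; the comparison of the resulting fingerprints would proceed in the same spirit as the sketch above.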
Related papers
- Beyond Redundancy: Diverse and Specialized Multi-Expert Sparse Autoencoder [59.89996751196727]
Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting large language models. SAEs' hidden layers have high dimensionality to satisfy sparsity constraints, resulting in prohibitive training and inference costs. Recent Mixture of Experts (MoE) approaches attempt to address this by decomposing SAEs into narrower expert networks with gated activation. We propose two key innovations: (1) Multiple Expert Activation that simultaneously engages semantically weighted expert subsets to encourage specialization, and (2) Feature Scaling that enhances diversity through adaptive high-frequency scaling.
arXiv Detail & Related papers (2025-11-07T22:19:34Z)
- Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time [35.31371938688921]
Test-Time Scaling (TTS) enhances the reasoning ability of large language models (LLMs) by allocating additional computation during inference. We propose Dynamic Experts Search (DES), a TTS strategy that elevates expert activation into a controllable dimension of the search space.
arXiv Detail & Related papers (2025-09-26T16:49:10Z)
- RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging [69.2230254959204]
We propose RouteMark, a framework for IP protection in merged MoE models. Our key insight is that task-specific experts exhibit stable and distinctive routing behaviors under probing inputs. For attribution and tampering detection, we introduce a similarity-based matching algorithm.
arXiv Detail & Related papers (2025-08-03T14:51:58Z)
- ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts [71.11994027685974]
We integrate the MoE structure into the classic Vision Transformer (ViT), naming it ViMoE, and explore the potential of applying MoE to vision.
We observe that the performance is sensitive to the configuration of MoE layers, making it challenging to obtain optimal results without careful design.
To address this, we introduce a shared expert to learn and capture common information, serving as an effective way to construct stable ViMoE.
arXiv Detail & Related papers (2024-10-21T07:51:17Z)
- Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts [75.85448576746373]
We propose a method of grouping and pruning similar experts to improve the model's parameter efficiency. We validate the effectiveness of our method by pruning three state-of-the-art MoE architectures. The evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks.
arXiv Detail & Related papers (2024-07-12T17:25:02Z)
- A Survey on Knowledge Distillation of Large Language Models [99.11900233108487]
Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities to open-source models.
This paper presents a comprehensive survey of KD's role within the realm of Large Language Models (LLMs).
arXiv Detail & Related papers (2024-02-20T16:17:37Z)
- MoDE: A Mixture-of-Experts Model with Mutual Distillation among the Experts [15.535613294871487]
We propose a method called Mixture-of-Distilled-Expert (MoDE).
MoDE applies moderate mutual distillation among experts to enable each expert to pick up more features learned by other experts.
arXiv Detail & Related papers (2024-01-31T03:52:32Z)
- Distilling Privileged Multimodal Information for Expression Recognition using Optimal Transport [46.91791643660991]
Deep learning models for multimodal expression recognition have reached remarkable performance in controlled laboratory environments.
These models struggle in the wild because of the unavailability or degraded quality of the modalities used for training.
In practice, only a subset of the training-time modalities may be available at test time.
Learning with privileged information enables models to exploit data from additional modalities that are only available during training.
arXiv Detail & Related papers (2024-01-27T19:44:15Z)
- Model-Agnostic Few-Shot Open-Set Recognition [36.97433312193586]
We tackle the Few-Shot Open-Set Recognition (FSOSR) problem.
We focus on developing model-agnostic inference methods that can be plugged into any existing model.
We introduce an Open Set Transductive Information Maximization method, OSTIM.
arXiv Detail & Related papers (2022-06-18T16:27:59Z)
- Hands-on Guidance for Distilling Object Detectors [11.856477599768773]
Our method, called Hands-on Guidance Distillation, distills the latent knowledge of all stage features to impose more comprehensive supervision.
We conduct extensive evaluations with different distillation configurations over the VOC and COCO datasets, which show improved accuracy-speed trade-offs.
arXiv Detail & Related papers (2021-03-26T09:00:23Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)