Related papers: Model-to-Model Knowledge Transmission (M2KT): A Data-Free Framework for Cross-Model Understanding Transfer

URL: http://arxiv.org/abs/2511.17638v1
Date: Wed, 19 Nov 2025 09:43:25 GMT
Title: Model-to-Model Knowledge Transmission (M2KT): A Data-Free Framework for Cross-Model Understanding Transfer
Authors: Pratham Sorte
Abstract summary: We introduce Model-to-Model Knowledge Transmission (M2KT), a novel paradigm for data-free conceptual transfer between neural networks. Unlike classical distillation, M2KT operates primarily in concept space rather than example space. M2KT can achieve approximately 85 to 90 percent of teacher performance while reducing data usage by over 98 percent compared to standard knowledge distillation.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern artificial intelligence systems depend heavily on large datasets for both training and transferring knowledge between models. Knowledge distillation, transfer learning, and dataset distillation have made such transfers more efficient, yet they remain fundamentally data-driven: a teacher must produce examples, logits, or gradients for a student to learn. In this work, we introduce Model-to-Model Knowledge Transmission (M2KT), a novel paradigm for data-free conceptual transfer between neural networks. M2KT enables models to exchange knowledge packets that encapsulate structured concept embeddings, abstraction graphs, reasoning traces, and provenance metadata. Unlike classical distillation, M2KT operates primarily in concept space rather than example space, and it does not require labeled datasets or teacher-generated outputs during transfer. We formalize the notion of concept manifolds, introduce an inter-model alignment mapping between teacher and student latent spaces, and derive a composite loss that enforces geometric, structural, and reasoning consistency together with explicit safety constraints. We further present algorithmic procedures for teacher-side packet generation and student-side ingestion and verification. Experiments on symbolic reasoning with large language models show that M2KT can achieve approximately 85 to 90 percent of teacher performance while reducing data usage by over 98 percent compared to standard knowledge distillation. This work establishes a theoretical and practical foundation for data-free AI-to-AI knowledge transfer and self-improving model ecosystems.
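
The abstract names a composite loss that enforces geometric, structural, and reasoning consistency plus safety constraints, but does not spell out its form. As a rough illustration only, the Python sketch below assumes that concepts are plain vectors, that a hypothetical linear map `W` carries teacher concept embeddings into the student's latent space, and that the abstraction graph is a list of concept pairs. It combines a geometric term (preserve pairwise cosine similarities) and a structural term (keep graph-linked concepts close) with assumed weights `lam_geo` and `lam_struct`. The reasoning and safety terms are omitted, and none of these names or formulas come from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def apply_map(W: List[List[float]], x: Vector) -> List[float]:
    """Hypothetical linear alignment map from teacher space to student space: W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def geometric_loss(teacher: Dict[str, Vector], student: Dict[str, Vector]) -> float:
    """Mean squared gap between teacher and student pairwise cosine similarities."""
    names = sorted(teacher)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    if not pairs:
        return 0.0
    gaps = [(cosine(teacher[a], teacher[b]) - cosine(student[a], student[b])) ** 2
            for a, b in pairs]
    return sum(gaps) / len(pairs)


def structural_loss(student: Dict[str, Vector], edges: List[Tuple[str, str]]) -> float:
    """Mean (1 - cosine) over abstraction-graph edges, so linked concepts stay close."""
    if not edges:
        return 0.0
    return sum(1.0 - cosine(student[a], student[b]) for a, b in edges) / len(edges)


def composite_loss(teacher: Dict[str, Vector], W: List[List[float]],
                   edges: List[Tuple[str, str]],
                   lam_geo: float = 1.0, lam_struct: float = 0.5) -> float:
    """Weighted sum of the two illustrative terms (reasoning and safety terms omitted)."""
    student = {name: apply_map(W, vec) for name, vec in teacher.items()}
    return (lam_geo * geometric_loss(teacher, student)
            + lam_struct * structural_loss(student, edges))


if __name__ == "__main__":
    # Toy "knowledge packet": three concept embeddings and one abstraction-graph edge.
    teacher = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
    edges = [("cat", "dog")]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    collapse = [[1.0, 0.0], [0.0, 0.0]]  # discards the second dimension
    print(composite_loss(teacher, identity, edges))
    print(composite_loss(teacher, collapse, edges))
```

With the identity map the geometric term is exactly zero. The collapsing map sends `car` to the zero vector and distorts the `dog`/`car` similarity, so the geometric term becomes positive even though the structural term drops (cat and dog become parallel). This is one reason a sketch like this combines several terms rather than relying on any single one.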

Related papers

Semi-Supervised Online Learning on the Edge by Transforming Knowledge from Teacher Models
Edge machine learning (Edge ML) enables training ML models using the vast data distributed across network edges. Online Edge ML allows models to be trained directly on edge devices and updated continuously with new data. We propose Knowledge Transformation (KT), a hybrid method combining Knowledge Distillation, Active Learning, and causal reasoning.
arXiv (2025-12-18T18:37:28Z)
PICKT: Practical Interlinked Concept Knowledge Tracing for Personalized Learning using Knowledge Map Concept Relations
This paper focuses on the core technology of Knowledge Tracing models, which analyze students' sequences of interactions to predict their knowledge acquisition levels. A knowledge map structures the relationships among concepts based on question and concept text information, enabling effective knowledge tracing even in cold-start situations.
arXiv (2025-12-08T05:24:17Z)
Seeing Further on the Shoulders of Giants: Knowledge Inheritance for Vision Foundation Models
Vision foundation models (VFMs) are predominantly developed using data-centric methods. Many open-source vision models have been pretrained on domain-specific data. We present a new model-driven approach for training VFMs through joint knowledge transfer and preservation.
arXiv (2025-08-20T13:30:23Z)
Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models
We systematically evaluate the transferability of knowledge distillation from a Transformer teacher model to eight subquadratic student architectures. Our study investigates which subquadratic model can most effectively approximate the teacher model's learned representations through knowledge distillation.
arXiv (2025-04-19T17:49:52Z)
SINKT: A Structure-Aware Inductive Knowledge Tracing Model with Large Language Model
Knowledge Tracing (KT) aims to determine whether students will respond correctly to the next question. This work introduces a Structure-aware Inductive Knowledge Tracing model with large language model (dubbed SINKT). SINKT predicts the student's response to the target question by interacting with the student's knowledge state and the question representation.
arXiv (2024-07-01T12:44:52Z)
Self-Regulated Data-Free Knowledge Amalgamation for Text Classification
We develop a lightweight student network that can learn from multiple teacher models without accessing their original training data. To accomplish this, we propose STRATANET, a modeling framework that produces text data tailored to each teacher. We evaluate our method on three benchmark text classification datasets with varying labels or domains.
arXiv (2024-06-16T21:13:30Z)
MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities
We present MergeNet, which learns to bridge the gap between the parameter spaces of heterogeneous models. The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters. MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage.
arXiv (2024-04-20T08:34:39Z)
EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model. We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to asymmetric students 1/10th the size that retain 95-97% of the teacher performance.
arXiv (2023-01-27T22:04:37Z)
Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation. KD-DAGFM achieves the best performance with less than 21.5% of the FLOPs of the state-of-the-art method in both online and offline experiments.
arXiv (2022-11-21T03:09:42Z)
MOOCRep: A Unified Pre-trained Embedding of MOOC Entities
We propose to learn pre-trained representations of MOOC entities using abundant unlabeled data from the structure of MOOCs. Our experiments reveal that MOOCRep's embeddings outperform state-of-the-art representation learning methods on two tasks important to the education community.
arXiv (2021-07-12T00:11:25Z)
Efficient Crowd Counting via Structured Knowledge Transfer
Crowd counting is an application-oriented task, and its inference efficiency is crucial for real-world applications. We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network. Our models obtain at least a 6.5× speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv (2020-03-23T08:05:41Z)
