Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment
- URL: http://arxiv.org/abs/2510.24208v1
- Date: Tue, 28 Oct 2025 09:25:40 GMT
- Title: Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment
- Authors: Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang,
- Abstract summary: Large Language Models (LLMs) encode vast amounts of knowledge in their massive parameters, which is accessible to locate, trace, and analyze.<n>Despite advances in neural interpretability, it is still not clear how to transfer knowledge in a fine-grained manner, namely parametric knowledge transfer (PKT)
- Score: 22.84428628659889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) encode vast amounts of knowledge in their massive parameters, which is accessible to locate, trace, and analyze. Despite advances in neural interpretability, it is still not clear how to transfer knowledge in a fine-grained manner, namely parametric knowledge transfer (PKT). A key problem is enabling effective and efficient knowledge transfer across LLMs of different scales, which is essential for achieving greater flexibility and broader applicability in transferring knowledge between LLMs. Due to neural incompatibility, referring to the architectural and parametric differences between LLMs of varying scales, existing methods that directly reuse layer parameters are severely limited. In this paper, we identify the semantic alignment in latent space as the fundamental prerequisite for LLM cross-scale knowledge transfer. Instead of directly using the layer parameters, our approach takes activations as the medium of layer-wise knowledge transfer. Leveraging the semantics in latent space, our approach is simple and outperforms prior work, better aligning model behaviors across varying scales. Evaluations on four benchmarks demonstrate the efficacy of our method. Further analysis reveals the key factors easing cross-scale knowledge transfer and provides insights into the nature of latent semantic alignment.
Related papers
- Leveraging Parameter Space Symmetries for Reasoning Skill Transfer in LLMs [27.978175136002005]
Task arithmetic is a powerful technique for transferring skills between Large Language Models (LLMs)<n>We first align the models' parameter spaces, leveraging the inherent permutation, rotation, and scaling symmetries of Transformer architectures.<n>We successfully transfer advanced reasoning skills to a non-reasoning model.
arXiv Detail & Related papers (2025-11-13T23:20:57Z) - Quantifying Dataset Similarity to Guide Transfer Learning [1.6328866317851185]
Cross-Learning Score ( CLS) measures dataset similarity through bidirectional performance between domains.<n> CLS can reliably predict whether transfer will improve or degrade performance.<n> CLS is efficient and fast to compute as it bypasses the problem of expensive distribution estimation for high-dimensional problems.
arXiv Detail & Related papers (2025-10-13T00:18:35Z) - TRAIL: Joint Inference and Refinement of Knowledge Graphs with Large Language Models [5.678291291711662]
TRAIL is a novel, unified framework for Thinking, Reasoning, And Incremental Learning.<n>It couples joint inference and dynamic KG refinement with large language models.<n>Extensive experiments on multiple benchmarks demonstrate that TRAIL outperforms existing KG-augmented and retrieval-augmented LLM baselines by 3% to 13%.
arXiv Detail & Related papers (2025-08-06T14:25:05Z) - Enhancing Cross-task Transfer of Large Language Models via Activation Steering [75.41750053623298]
Cross-task in-context learning offers a direct solution for transferring knowledge across tasks.<n>We investigate whether cross-task transfer can be achieved via latent space steering without parameter updates or input expansion.<n>We propose a novel Cross-task Activation Steering Transfer framework that enables effective transfer by manipulating the model's internal activation states.
arXiv Detail & Related papers (2025-07-17T15:47:22Z) - Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models [24.017656794423967]
Large Language Models offer a transparent brain with accessible parameters that encode extensive knowledge.<n>Key research challenge is to transcend traditional knowledge transfer paradigms rooted in symbolic language.<n> exploring effective methods for transferring knowledge across LLMs of different scales through parameters presents an intriguing and valuable research direction.
arXiv Detail & Related papers (2025-05-20T14:42:03Z) - Understanding the Role of Equivariance in Self-supervised Learning [51.56331245499712]
equivariant self-supervised learning (E-SSL) learns features to be augmentation-aware.
We identify a critical explaining-away effect in E-SSL that creates a synergy between the equivariant and classification tasks.
We reveal several principles for practical designs of E-SSL.
arXiv Detail & Related papers (2024-11-10T16:09:47Z) - MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities [72.05167902805405]
We present MergeNet, which learns to bridge the gap of parameter spaces of heterogeneous models.<n>The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters.<n> MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage.
arXiv Detail & Related papers (2024-04-20T08:34:39Z) - Characterizing Truthfulness in Large Language Model Generations with
Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs)
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective [106.92016199403042]
We empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.
We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models.
Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
arXiv Detail & Related papers (2023-10-17T17:58:34Z) - Differentiable modeling to unify machine learning and physical models
and advance Geosciences [38.92849886903847]
We outline the concepts, applicability, and significance of differentiable geoscientific modeling (DG)
"Differentiable" refers to accurately and efficiently calculating gradients with respect to model variables.
Preliminary evidence suggests DG offers better interpretability and causality than Machine Learning.
arXiv Detail & Related papers (2023-01-10T15:24:14Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the relevant knowledge with respect to the target task from the original source model and used as a regularizer during fine-tuning the target model.
Experiments on various real world datasets show that our method stably improves the standard fine-tuning by more than 2% in average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.