Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
- URL: http://arxiv.org/abs/2310.11451v2
- Date: Wed, 8 May 2024 12:11:00 GMT
- Title: Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective
- Authors: Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He
- Abstract summary: We empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.
We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models.
Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
- Score: 106.92016199403042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) inherently encode a wealth of knowledge within their parameters through pre-training on extensive corpora. While prior research has delved into operations on these parameters to manipulate the underlying implicit knowledge (encompassing detection, editing, and merging), there remains an ambiguous understanding regarding their transferability across models with varying scales. In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective. To achieve this, we employ sensitivity-based techniques to extract and align knowledge-specific parameters between different LLMs. Moreover, the LoRA module is used as the intermediary mechanism for injecting the extracted knowledge into smaller models. Evaluations across four benchmarks validate the efficacy of our proposed method. Our findings highlight the critical factors contributing to the process of parametric knowledge transfer, underscoring the transferability of model parameters across LLMs of different scales. Project website: https://maszhongming.github.io/ParaKnowTransfer.
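The abstract names the two key ingredients, sensitivity-based parameter extraction and LoRA-based injection, without implementation detail, so the following is only a minimal sketch of that general recipe under stated assumptions: sensitivity is approximated by |weight x gradient| on a calibration batch, the most sensitive rows of a larger model's weight matrix are extracted and crudely aligned by truncation, and the extracted block is compressed into a LoRA-style low-rank pair added to the smaller model's weight. All names, the alignment rule, and the scaling factor are illustrative, not the authors' implementation.

```python
import torch

def sensitivity_scores(weight: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    # First-order sensitivity proxy: |w * dL/dw|, summed per output row.
    return (weight * grad).abs().sum(dim=1)

def extract_and_compress(teacher_w, teacher_grad, target_shape, rank=8):
    """Pick the most sensitive rows of a teacher weight matrix and
    compress them into a LoRA-style (B @ A) pair shaped for the student."""
    out_dim, in_dim = target_shape
    scores = sensitivity_scores(teacher_w, teacher_grad)
    top_rows = scores.topk(out_dim).indices                  # align output dims by sensitivity
    block = teacher_w[top_rows][:, :in_dim]                  # naive alignment of input dims
    U, S, Vh = torch.linalg.svd(block, full_matrices=False)  # low-rank factorization
    B = U[:, :rank] * S[:rank]                               # (out_dim, rank)
    A = Vh[:rank]                                            # (rank, in_dim)
    return A, B

# Toy usage: a 4096-wide "teacher" layer injected into a 2048-wide "student" layer.
teacher_w = torch.randn(4096, 4096)
teacher_grad = torch.randn(4096, 4096)          # stands in for a calibration-batch gradient
A, B = extract_and_compress(teacher_w, teacher_grad, target_shape=(2048, 2048), rank=8)

student_w = torch.randn(2048, 2048)
scale = 0.05                                    # LoRA-style scaling, tuned in practice
adapted_w = student_w + scale * (B @ A)         # inject the extracted knowledge
print(adapted_w.shape)                          # torch.Size([2048, 2048])
```

The sketch shows only the parametric plumbing; any subsequent tuning of the injected low-rank module on the target task is out of scope here.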
Related papers
- Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities.
In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently the two mainstream methods for adapting LLMs to downstream tasks.
We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z) - When Context Leads but Parametric Memory Follows in Large Language Models [4.567122178196834]
Large language models (LLMs) have demonstrated remarkable progress in leveraging diverse knowledge sources.
This study investigates how nine widely used LLMs allocate knowledge between local context and global parameters when answering open-ended questions.
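The study's protocol is not described beyond this summary; the snippet below is only a generic probe of the context-versus-parameters question, comparing a model's answer with and without a supplied passage in the prompt. The model name, question, and passage are placeholders, not the paper's setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the study covers nine much larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

question = "Question: What is the capital of Australia?\nAnswer:"
context = "Passage: The capital of Australia is Canberra.\n"  # local context supplied in-prompt

def answer(prompt: str) -> str:
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=8, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    return tok.decode(out[0, ids.shape[1]:], skip_special_tokens=True)

print("parametric only:", answer(question))            # relies on knowledge in the weights
print("with context:   ", answer(context + question))  # lets the passage lead
```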
arXiv Detail & Related papers (2024-09-13T00:03:19Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
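SMILE's exact construction is not detailed in this summary; the sketch below only illustrates the generic ingredient of building a low-rank "expert" from the difference between a fine-tuned weight and its pre-trained base via truncated SVD, then routing between base and experts. The routing rule shown is a trained-free placeholder and all names are assumptions, not SMILE's actual procedure.

```python
import torch

def low_rank_expert(w_finetuned, w_base, rank=16):
    """Compress a task-specific weight delta into a low-rank expert."""
    delta = w_finetuned - w_base
    U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
    return (U[:, :rank] * S[:rank]) @ Vh[:rank]     # rank-r approximation of the delta

w_base = torch.randn(512, 512)
experts = [low_rank_expert(w_base + 0.01 * torch.randn(512, 512), w_base) for _ in range(3)]

def moe_forward(x, w_base, experts, router_w):
    # Token-wise routing: base output plus the single most relevant low-rank expert.
    # SMILE derives routing without training; a fixed stand-in router is used here.
    gate = torch.softmax(x @ router_w, dim=-1)      # (tokens, n_experts)
    idx = gate.argmax(dim=-1)
    base_out = x @ w_base.T
    expert_out = torch.stack([x @ e.T for e in experts], dim=1)  # (tokens, n_experts, d)
    chosen = expert_out[torch.arange(x.shape[0]), idx]
    return base_out + chosen

x = torch.randn(4, 512)
router_w = torch.randn(512, len(experts))
print(moe_forward(x, w_base, experts, router_w).shape)  # torch.Size([4, 512])
```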
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - MergeNet: Knowledge Migration across Heterogeneous Models, Tasks, and Modalities [72.68829963458408]
We present MergeNet, which learns to bridge the gap of parameter spaces of heterogeneous models.
The core mechanism of MergeNet lies in the parameter adapter, which operates by querying the source model's low-rank parameters.
MergeNet is learned alongside both models, allowing our framework to dynamically transfer and adapt knowledge relevant to the current stage.
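MergeNet's parameter adapter is only named in this summary, so the module below is a loose sketch of the stated idea under assumptions: learned queries attend over low-rank factors of a frozen source weight to produce a target-shaped parameter update. The dimensions, the attention form, and the name ParamAdapter are all illustrative.

```python
import torch
import torch.nn as nn

class ParamAdapter(nn.Module):
    """Queries low-rank factors of a source weight to produce a target-shaped update."""
    def __init__(self, src_weight: torch.Tensor, tgt_shape, rank=32):
        super().__init__()
        U, S, Vh = torch.linalg.svd(src_weight, full_matrices=False)
        # Low-rank "knowledge tokens" of the source model (kept frozen).
        self.register_buffer("keys", (U[:, :rank] * S[:rank]).T)      # (rank, src_out)
        self.register_buffer("values", Vh[:rank])                     # (rank, src_in)
        out_dim, in_dim = tgt_shape
        self.query = nn.Parameter(torch.randn(out_dim, self.keys.shape[1]) * 0.02)
        self.proj = nn.Linear(self.values.shape[1], in_dim)

    def forward(self) -> torch.Tensor:
        attn = torch.softmax(self.query @ self.keys.T / self.keys.shape[1] ** 0.5, dim=-1)
        return self.proj(attn @ self.values)                          # (out_dim, in_dim)

src_w = torch.randn(1024, 1024)     # a layer from the heterogeneous source model
adapter = ParamAdapter(src_w, tgt_shape=(256, 512))
tgt_w = torch.randn(256, 512)
adapted = tgt_w + adapter()         # the adapter is trained jointly with both models
print(adapted.shape)                # torch.Size([256, 512])
```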
arXiv Detail & Related papers (2024-04-20T08:34:39Z) - Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression [64.07696663255155]
Large-scale pre-trained language models (LLMs) have demonstrated exceptional performance in various natural language processing (NLP) tasks.
However, the massive size of these models poses huge challenges for their deployment in real-world applications.
We introduce a novel compression paradigm called Retrieval-based Knowledge Transfer (RetriKT) which effectively transfers the knowledge of LLMs to extremely small-scale models.
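RetriKT's pipeline is only summarized here; the sketch below captures the retrieval idea at a high level under assumptions: knowledge snippets previously produced by the large teacher LLM are indexed, and the extremely small model retrieves the nearest snippets to augment its input. The embedding scheme, the store contents, and all names are illustrative, not the paper's implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Knowledge store: snippets previously generated by the large teacher LLM (toy examples).
knowledge_store = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Water boils at 100 degrees Celsius at sea-level atmospheric pressure.",
    "Photosynthesis converts carbon dioxide and water into glucose and oxygen.",
]

vectorizer = TfidfVectorizer().fit(knowledge_store)
store_vecs = vectorizer.transform(knowledge_store)

def retrieve(query: str, k: int = 1):
    """Return the k store snippets most similar to the query."""
    sims = cosine_similarity(vectorizer.transform([query]), store_vecs)[0]
    return [knowledge_store[i] for i in sims.argsort()[::-1][:k]]

query = "At what temperature does water boil?"
augmented_input = " ".join(retrieve(query)) + " " + query
print(augmented_input)  # the small model consumes this retrieval-augmented input
```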
arXiv Detail & Related papers (2023-10-24T07:58:20Z) - Retrieval-Augmented Meta Learning for Low-Resource Text Classification [22.653220906899612]
We propose a meta-learning based method called Retrieval-Augmented Meta Learning (RAML).
It not only relies on parameterized knowledge for inference but also retrieves non-parametric knowledge from an external corpus.
RAML significantly outperforms current SOTA low-resource text classification models.
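RAML's architecture is not described in detail here; the snippet below is only an analogy for combining parametric and non-parametric predictions: a classifier's softmax output is interpolated with a label distribution retrieved from nearest neighbors in an external corpus. The interpolation weight and all names are assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

def knn_label_distribution(query_emb, corpus_embs, corpus_labels, n_classes, k=5, temp=0.1):
    """Non-parametric prediction from the k nearest external-corpus neighbors."""
    sims = F.cosine_similarity(query_emb.unsqueeze(0), corpus_embs)      # (N,)
    topk = sims.topk(k)
    weights = torch.softmax(topk.values / temp, dim=0)
    dist = torch.zeros(n_classes)
    dist.index_add_(0, corpus_labels[topk.indices], weights)
    return dist

n_classes, dim = 4, 64
corpus_embs = torch.randn(100, dim)            # embeddings of the external corpus
corpus_labels = torch.randint(0, n_classes, (100,))
query_emb = torch.randn(dim)
parametric_probs = torch.softmax(torch.randn(n_classes), dim=0)  # classifier output

lam = 0.5                                       # interpolation weight (a free choice here)
final_probs = lam * parametric_probs + (1 - lam) * knn_label_distribution(
    query_emb, corpus_embs, corpus_labels, n_classes)
print(final_probs, final_probs.sum())           # still sums to 1
```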
arXiv Detail & Related papers (2023-09-10T10:05:03Z) - Beyond Convergence: Identifiability of Machine Learning and Deep Learning Models [0.0]
We investigate the notion of model parameter identifiability through a case study focused on parameter estimation from motion sensor data.
We employ a deep neural network to estimate subject-wise parameters, including mass, stiffness, and equilibrium leg length.
The results show that while certain parameters can be identified from the observation data, others remain unidentifiable.
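The case study itself is only summarized here; the toy example below illustrates the underlying identifiability issue with an assumed spring-mass model of motion: simulated acceleration depends on mass and stiffness only through their ratio, so the two cannot be recovered separately from such data, no matter how well a network fits.

```python
import numpy as np

def simulate_acceleration(mass, stiffness, x0=0.1, dt=0.01, steps=200):
    """Undamped spring-mass oscillator: a = -(k/m) * x, integrated with explicit Euler."""
    x, v = x0, 0.0
    accs = []
    for _ in range(steps):
        a = -(stiffness / mass) * x
        accs.append(a)
        v += a * dt
        x += v * dt
    return np.array(accs)

# Two different (mass, stiffness) pairs with the same ratio k/m produce identical sensor data,
# so mass and stiffness are not individually identifiable from this observation model.
a1 = simulate_acceleration(mass=70.0, stiffness=7000.0)   # k/m = 100
a2 = simulate_acceleration(mass=80.0, stiffness=8000.0)   # k/m = 100
print(np.allclose(a1, a2))   # True: at best the estimator recovers the ratio
```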
arXiv Detail & Related papers (2023-07-21T03:40:53Z) - Meta Knowledge Condensation for Federated Learning [65.20774786251683]
Existing federated learning paradigms usually exchange distributed models extensively with a central solver to obtain a more powerful model.
This incurs a severe communication burden between the server and its clients, especially when data distributions are heterogeneous.
Unlike existing paradigms, we introduce an alternative perspective that significantly decreases the communication cost in federated learning.
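The condensation mechanism is only hinted at in this summary; below is a rough sketch of the general idea under assumptions: each client condenses its local data into a handful of synthetic "meta knowledge" examples (here, simple per-class feature means) and uploads those instead of full model weights, cutting communication. The condensation rule and all names are illustrative, not the paper's algorithm.

```python
import torch

def condense_client_data(features, labels, n_classes):
    """Condense a client's local data into one synthetic example per class (class means)."""
    meta_x, meta_y = [], []
    for c in range(n_classes):
        mask = labels == c
        if mask.any():
            meta_x.append(features[mask].mean(dim=0))
            meta_y.append(c)
    return torch.stack(meta_x), torch.tensor(meta_y)

n_classes, dim = 5, 32
clients = [(torch.randn(200, dim), torch.randint(0, n_classes, (200,))) for _ in range(3)]

# Each client uploads only its condensed set; the server trains on the aggregate.
server_x, server_y = zip(*(condense_client_data(x, y, n_classes) for x, y in clients))
server_x, server_y = torch.cat(server_x), torch.cat(server_y)
print(server_x.shape)   # e.g. torch.Size([15, 32]) instead of three full local datasets
```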
arXiv Detail & Related papers (2022-09-29T15:07:37Z)