Model-Based Privacy-Preserving Knowledge Transfer for Large Language Models
- URL: http://arxiv.org/abs/2410.10481v2
- Date: Fri, 14 Feb 2025 09:47:42 GMT
- Title: Model-Based Privacy-Preserving Knowledge Transfer for Large Language Models
- Authors: Zhaomin Wu, Jizhou Guo, Junyi Hou, Bingsheng He, Lixin Fan, Qiang Yang
- Abstract summary: Llamdex is a framework that enhances large language models (LLMs) using only models trained on domain-specific data.
Our approach significantly improves accuracy on domain-specific tasks, achieving up to a 26% accuracy improvement.
- Score: 34.949731264918846
- License:
- Abstract: As large language models (LLMs) become more prevalent, effectively utilizing domain-specific knowledge while ensuring privacy has become critical. Existing methods often struggle to balance utility and privacy. For instance, retrieval-augmented generation (RAG) enables LLMs to access domain-specific knowledge but compromises the privacy of sensitive data. On the other hand, differentially private data synthesis techniques offer strong privacy guarantees but often result in poor utility. To address this challenge, we propose Llamdex, a novel framework that enhances LLMs using only models trained on domain-specific data, integrated into LLMs through carefully designed connection modules. Our approach significantly improves accuracy on domain-specific tasks, achieving up to a 26% accuracy improvement compared to state-of-the-art data synthesis methods under the same differential privacy constraints. Experimental results show that Llamdex not only improves the accuracy of LLM responses but also maintains inference efficiency comparable to the original LLM, highlighting its potential for real-world applications.
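As a rough illustration of the integration described above, the sketch below shows one way a domain expert model's output could be mapped into an LLM's embedding space through a small connection module. The module names, dimensions, and the choice to prepend the result as virtual tokens are assumptions for illustration; the abstract does not specify Llamdex's actual design.

```python
# Hypothetical sketch: map a domain model's output into an LLM's embedding
# space via a small "connection module". Names and sizes are illustrative.
import torch
import torch.nn as nn

class ConnectionModule(nn.Module):
    """Projects a domain model's output vector to a few LLM token embeddings."""
    def __init__(self, domain_dim: int, llm_hidden: int, n_virtual_tokens: int = 4):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(domain_dim, llm_hidden * n_virtual_tokens),
            nn.GELU(),
        )
        self.n_virtual_tokens = n_virtual_tokens
        self.llm_hidden = llm_hidden

    def forward(self, domain_output: torch.Tensor) -> torch.Tensor:
        # (batch, domain_dim) -> (batch, n_virtual_tokens, llm_hidden)
        x = self.proj(domain_output)
        return x.view(-1, self.n_virtual_tokens, self.llm_hidden)

domain_model = nn.Linear(32, 16)           # stand-in for a model trained on private data
connector = ConnectionModule(domain_dim=16, llm_hidden=4096)

features = torch.randn(2, 32)              # query features fed to the domain model
expert_tokens = connector(domain_model(features))
prompt_embeds = torch.randn(2, 10, 4096)   # embeddings of the user prompt tokens
llm_inputs = torch.cat([expert_tokens, prompt_embeds], dim=1)  # fed to the frozen LLM
```

In this sketch only the connection module would be trainable, while the domain model (trained on private data) and the LLM stay frozen, which is consistent with the idea of transferring knowledge through models rather than raw data.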
Related papers
- LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.
LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.
Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
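As a rough sketch of how such LLM-derived penalty factors could enter a Lasso fit, the snippet below implements a weighted Lasso via the standard column-rescaling trick; the penalty values are hard-coded stand-ins for the LLM's relevance judgments, and the details are assumptions rather than LLM-Lasso's exact procedure.

```python
# Weighted Lasso sketch: min ||y - Xb||^2 + alpha * sum_j penalty_j * |b_j|,
# solved by rescaling columns and fitting a standard Lasso.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

# Hypothetical LLM-derived penalty factors: lower = judged more relevant.
penalty = np.array([0.2, 1.5, 0.3, 1.5, 1.5])

X_scaled = X / penalty          # columns with small penalties are up-weighted
model = Lasso(alpha=0.1).fit(X_scaled, y)
coef = model.coef_ / penalty    # map coefficients back to the original scale
print(coef)                     # heavily penalized (less relevant) features tend to zero out
```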
arXiv Detail & Related papers (2025-02-15T02:55:22Z)
- A Novel Compact LLM Framework for Local, High-Privacy EHR Data Applications [3.5839042822277585]
This paper presents a compact Large Language Model (LLM) framework designed for local deployment in settings with strict privacy requirements.
We introduce a novel preprocessing technique that uses information extraction methods, e.g., regular expressions, to filter and emphasize critical information in clinical notes.
Our framework is evaluated using zero-shot and few-shot learning paradigms on both private and publicly available (MIMIC-IV) datasets.
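The snippet below gives a minimal sketch of regex-based filtering of a clinical note in the spirit described above; the section names and patterns are assumptions for illustration, not the paper's actual extraction rules.

```python
# Illustrative regex-based filtering of a clinical note: keep only the
# sections assumed to carry the critical information for the LLM prompt.
import re

note = """
HISTORY: 67 yo male with chest pain.
MEDICATIONS: aspirin 81mg daily, metoprolol 25mg BID.
ASSESSMENT: likely stable angina; rule out ACS.
PLAN: stress test, continue current meds.
"""

keep_sections = ("MEDICATIONS", "ASSESSMENT", "PLAN")
pattern = re.compile(r"^(%s):\s*(.+)$" % "|".join(keep_sections), re.MULTILINE)

filtered = "\n".join(f"{name}: {body}" for name, body in pattern.findall(note))
print(filtered)  # compact input emphasizing the critical information
```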
arXiv Detail & Related papers (2024-12-03T22:06:55Z)
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenge posed by the re-identification capabilities of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
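A minimal sketch of how the three components could interact is shown below; the scoring and rewriting functions are placeholders (in the paper they are LLM-based), so this only illustrates the iterate-until-private control flow, not the actual prompts or metrics.

```python
# Placeholder components standing in for the LLM-based evaluators and optimizer.
def privacy_score(text: str) -> float:   # higher = harder to re-identify
    return 0.0 if "Alice" in text else 1.0

def utility_score(text: str) -> float:   # higher = more task-relevant content kept
    return len(text.split()) / 20.0

def optimize(text: str) -> str:          # rewrite step (an LLM prompt in practice)
    return text.replace("Alice", "[PERSON]")

text = "Alice visited the clinic on May 3 complaining of migraines."
for _ in range(5):                        # iterate until privacy is acceptable
    if privacy_score(text) >= 1.0:
        break
    text = optimize(text)
print(text, privacy_score(text), utility_score(text))
```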
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- Federated Domain-Specific Knowledge Transfer on Large Language Models Using Synthetic Data [53.70870879858533]
We introduce a Federated Domain-specific Knowledge Transfer (FDKT) framework.
It enables domain-specific knowledge transfer from LLMs to small language models (SLMs) while preserving clients' data privacy.
The proposed FDKT framework consistently and greatly improves SLMs' task performance by around 5% with a privacy budget of less than 10.
arXiv Detail & Related papers (2024-05-23T06:14:35Z)
- A Framework for Cost-Effective and Self-Adaptive LLM Shaking and Recovery Mechanism [33.330424243007265]
We introduce a cost-effective and self-adaptive LLM shaking tuning and recovery mechanism, named CypherTalk.
With carefully designed horizontal and vertical shaking operators, we achieve accuracy comparable to that of SOTA privacy-preserving LLM schemes.
Experiments also show that with the CypherTalk framework, users can achieve reliable accuracy when using optimized shaking operator settings.
arXiv Detail & Related papers (2024-03-12T03:30:04Z)
- PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs [49.32067576992511]
Large language models often fall short of the performance achieved by domain-specific state-of-the-art models.
One potential approach to enhance domain-specific capabilities of LLMs involves fine-tuning them using corresponding datasets.
We propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA).
Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision tasks.
arXiv Detail & Related papers (2024-02-20T09:02:55Z)
- Differentially Private Low-Rank Adaptation of Large Language Model Using Federated Learning [32.52811740662061]
This article introduces DP-LoRA, a novel federated learning algorithm tailored for large language models (LLMs).
DP-LoRA preserves data privacy by employing a Gaussian mechanism that adds noise in weight updates, maintaining individual data privacy while facilitating collaborative model training.
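The snippet below sketches the Gaussian mechanism applied to per-client weight updates before server-side averaging, which is the general recipe the summary describes; the clipping bound and noise multiplier are illustrative values, not hyperparameters from DP-LoRA.

```python
# Sketch of the Gaussian mechanism on per-client (flattened) LoRA updates:
# clip each update's L2 norm, add calibrated noise, then average on the server.
import torch

def privatize_update(update: torch.Tensor, clip_norm: float, noise_multiplier: float) -> torch.Tensor:
    """Clip a client's update to clip_norm and add Gaussian noise with std noise_multiplier * clip_norm."""
    norm = update.norm()
    scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)
    clipped = update * scale
    noise = torch.randn_like(clipped) * noise_multiplier * clip_norm
    return clipped + noise

client_updates = [torch.randn(1024) for _ in range(8)]   # stand-ins for flattened LoRA deltas
noisy = [privatize_update(u, clip_norm=1.0, noise_multiplier=0.5) for u in client_updates]
aggregated = torch.stack(noisy).mean(dim=0)              # applied to the shared LoRA weights
```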
arXiv Detail & Related papers (2023-12-29T06:50:38Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
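As a hypothetical illustration of instruction tuning with paired positive and negative examples, a single training record might look like the sketch below; the field names and contrastive use are assumptions, not PrivacyMind's actual data format.

```python
# Hypothetical instruction-tuning record pairing a privacy-preserving (positive)
# response with a leaking (negative) one for the same input.
example = {
    "instruction": "Summarize the customer complaint below without revealing personal details.",
    "input": "John Smith (john@ex.com) reports his order #1182 arrived damaged.",
    "positive_output": "A customer reports that their order arrived damaged.",
    "negative_output": "John Smith (john@ex.com) reports order #1182 arrived damaged.",
}
# During tuning, the model is encouraged toward the positive output and
# discouraged (e.g., contrastively) from producing the negative one.
```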
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Augmented Large Language Models with Parametric Knowledge Guiding [72.71468058502228]
Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities.
However, their performance may be suboptimal on domain-specific tasks that require specialized knowledge, due to limited exposure to the relevant data.
We propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module to access relevant knowledge.
arXiv Detail & Related papers (2023-05-08T15:05:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.