Differentially Private Model Compression
- URL: http://arxiv.org/abs/2206.01838v1
- Date: Fri, 3 Jun 2022 22:04:36 GMT
- Title: Differentially Private Model Compression
- Authors: Fatemehsadat Mireshghallah, Arturs Backurs, Huseyin A Inan, Lukas
Wutschitz, Janardhan Kulkarni
- Abstract summary: Large pre-trained language models (LLMs) such as BERT and GPT-2 can be fine-tuned on private data to achieve performance comparable to non-private models.
The inference cost of these models -- which consist of hundreds of millions of parameters -- can be prohibitively large.
We propose frameworks for achieving 50% sparsity levels while maintaining nearly full performance.
- Score: 21.97718614488461
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent papers have shown that large pre-trained language models (LLMs) such
as BERT and GPT-2 can be fine-tuned on private data to achieve performance
comparable to non-private models for many downstream Natural Language
Processing (NLP) tasks while simultaneously guaranteeing differential privacy.
The inference cost of these models -- which consist of hundreds of millions of
parameters -- however, can be prohibitively large. Hence, often in practice,
LLMs are compressed before they are deployed in specific applications. In this
paper, we initiate the study of differentially private model compression and
propose frameworks for achieving 50% sparsity levels while maintaining nearly
full performance. We demonstrate these ideas on standard GLUE benchmarks using
BERT models, setting benchmarks for future research on this topic.
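For concreteness, the recipe the abstract describes (differentially private fine-tuning followed by compression to 50% sparsity) can be sketched roughly as follows. This is a minimal illustration, not the paper's exact framework: the choice of bert-base-uncased, Opacus's PrivacyEngine for DP-SGD, one-shot magnitude pruning, the toy data, and all hyperparameters are assumptions made for the example, and the privacy accounting across steps is omitted.

```python
# Hypothetical sketch: DP fine-tuning of BERT followed by magnitude pruning to 50% sparsity.
# Assumes the Hugging Face `transformers` and `opacus` packages; values are illustrative.
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.nn.utils import prune
from transformers import BertForSequenceClassification
from opacus import PrivacyEngine

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4)

# Toy data standing in for a GLUE task (token ids, attention masks, labels).
input_ids = torch.randint(0, 30000, (64, 128))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(0, 2, (64,))
loader = DataLoader(TensorDataset(input_ids, attention_mask, labels), batch_size=8)

# Wrap model/optimizer/loader so every step clips gradients and adds Gaussian noise (DP-SGD).
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.0, max_grad_norm=1.0,
)

model.train()
for ids, mask, y in loader:
    optimizer.zero_grad()
    out = model(input_ids=ids, attention_mask=mask, labels=y)
    out.loss.backward()
    optimizer.step()

# One-shot magnitude pruning: zero out the 50% smallest-magnitude weights in each linear layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent
```

In a complete pipeline the pruning step would typically be followed by further private fine-tuning to recover accuracy, with the privacy budget accounted for over all training phases.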
Related papers
- The Cost of Compression: Investigating the Impact of Compression on
Parametric Knowledge in Language Models [11.156816338995503]
Compressing large language models (LLMs) provides faster inference, smaller memory footprints, and enables local deployment.
Two standard compression techniques are pruning and quantization, with the former eliminating redundant connections in model layers and the latter representing model parameters with fewer bits.
Existing research on LLM compression primarily focuses on performance in terms of general metrics like perplexity or downstream task accuracy.
More fine-grained metrics, such as those measuring parametric knowledge, remain significantly underexplored.
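As a toy illustration of the two techniques on a single weight matrix (not drawn from the paper): pruning zeroes out low-magnitude connections, while quantization re-encodes the surviving parameters with fewer bits.

```python
# Toy illustration (not from the paper): pruning vs. quantization on one weight matrix.
import torch

w = torch.randn(4, 4)

# Pruning: eliminate redundant connections by zeroing the smallest-magnitude weights.
threshold = w.abs().median()            # keep roughly the top 50% by magnitude
pruned = torch.where(w.abs() >= threshold, w, torch.zeros_like(w))

# Quantization: represent parameters with fewer bits (symmetric uniform 8-bit here).
scale = w.abs().max() / 127
quantized = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
dequantized = quantized.float() * scale  # approximate reconstruction used at inference

print(f"sparsity: {(pruned == 0).float().mean():.0%}, max quant error: {(w - dequantized).abs().max():.4f}")
```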
arXiv Detail & Related papers (2023-12-01T22:27:12Z) - Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models [62.838689691468666]
We propose Federated Black-Box Prompt Tuning (Fed-BBPT) to optimally harness each local dataset.
Fed-BBPT capitalizes on a central server that aids local users in collaboratively training a prompt generator through regular aggregation.
Compared with full fine-tuning, Fed-BBPT avoids the memory challenges tied to storing and fine-tuning the pre-trained model (PTM) on local machines.
arXiv Detail & Related papers (2023-10-04T19:30:49Z) - Selective Pre-training for Private Fine-tuning [33.55628974557588]
We show that careful pre-training on a public dataset is crucial for training small language models with differential privacy.
Results demonstrate that smaller models, through careful pre-training and private fine-tuning, can match the performance of much larger models that do not have access to private data.
arXiv Detail & Related papers (2023-05-23T09:36:58Z) - Pre-trained Language Models for Keyphrase Generation: A Thorough
Empirical Study [76.52997424694767]
We present an in-depth empirical study of keyphrase extraction and keyphrase generation using pre-trained language models.
We show that PLMs have competitive high-resource performance and state-of-the-art low-resource performance.
Further results show that in-domain BERT-like PLMs can be used to build strong and data-efficient keyphrase generation models.
arXiv Detail & Related papers (2022-12-20T13:20:21Z) - Federated Boosted Decision Trees with Differential Privacy [24.66980518231163]
We propose a general framework that captures and extends existing approaches for differentially private decision trees.
We show that with a careful choice of techniques it is possible to achieve very high utility while maintaining strong levels of privacy.
arXiv Detail & Related papers (2022-10-06T13:28:29Z) - Just Fine-tune Twice: Selective Differential Privacy for Large Language
Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism that achieves selective differential privacy (SDP) for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z) - Automatic Mixed-Precision Quantization Search of BERT [62.65905462141319]
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks.
These models usually contain millions of parameters, which prevents them from practical deployment on resource-constrained devices.
We propose an automatic mixed-precision quantization framework designed for BERT that can simultaneously conduct quantization and pruning at a subgroup-wise level.
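A rough sketch of what subgroup-wise mixed precision means on one weight matrix; the grouping, the bit-width assignment, and the magnitude-based sensitivity proxy below are illustrative assumptions, not the search procedure proposed in the paper.

```python
# Illustrative sketch (not the paper's search algorithm): quantize row subgroups of a weight
# matrix at different bit widths, giving more bits to subgroups with larger weight magnitudes.
import torch

def uniform_quantize(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric uniform quantization of x to the given bit width, returned dequantized."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max() / qmax if x.abs().max() > 0 else 1.0
    return torch.clamp((x / scale).round(), -qmax - 1, qmax) * scale

weight = torch.randn(768, 768)            # e.g. one BERT attention projection
groups = weight.chunk(4, dim=0)           # four row subgroups

# Crude sensitivity proxy: subgroups with a larger mean magnitude get more bits.
order = sorted(range(len(groups)), key=lambda i: groups[i].abs().mean().item(), reverse=True)
bit_plan = {idx: bits for idx, bits in zip(order, (8, 8, 4, 2))}

quantized = torch.cat([uniform_quantize(g, bit_plan[i]) for i, g in enumerate(groups)], dim=0)
print("mean reconstruction error:", (weight - quantized).abs().mean().item())
```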
arXiv Detail & Related papers (2021-12-30T06:32:47Z) - DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language
Models [152.29364079385635]
As pre-trained models grow bigger, the fine-tuning process can be time-consuming and computationally expensive.
We propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.
Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning and (ii) resource-efficient inference.
arXiv Detail & Related papers (2021-10-30T03:29:47Z) - Large Language Models Can Be Strong Differentially Private Learners [70.0317718115406]
Differentially Private (DP) learning has seen limited success for building large deep learning models of text.
We show that this performance drop can be mitigated with the use of large pretrained models.
We propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients.
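A minimal sketch of what per-example clipping in DP-SGD computes, written with an explicit loop over examples. Vectorized implementations obtain the same update by materializing all per-example gradients at once, which is the memory cost the paper's technique removes; nothing below is the paper's own method, and the model/loss interfaces are generic placeholders.

```python
# What DP-SGD computes: each example's gradient is clipped, the clipped gradients are
# summed, and Gaussian noise calibrated to the clipping norm is added before the update.
import torch

def dp_sgd_step(model, loss_fn, xs, ys, lr=0.1, clip_norm=1.0, noise_multiplier=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    clipped_sum = [torch.zeros_like(p) for p in params]
    for x, y in zip(xs, ys):                           # one backward pass per example
        model.zero_grad()
        loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0)).backward()
        grads = [p.grad.detach().clone() for p in params]
        total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
        for acc, g in zip(clipped_sum, grads):         # clip this example's gradient
            acc.add_(g * scale)
    with torch.no_grad():
        for p, acc in zip(params, clipped_sum):
            noise = torch.randn_like(p) * noise_multiplier * clip_norm
            p.add_(-(lr / len(xs)) * (acc + noise))    # noised average gradient step
```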
arXiv Detail & Related papers (2021-10-12T01:45:27Z) - Direction is what you need: Improving Word Embedding Compression in
Large Language Models [7.736463504706344]
This paper presents a novel loss objective to compress token embeddings in Transformer-based models by leveraging an AutoEncoder architecture.
Our method significantly outperforms the commonly used SVD-based matrix-factorization approach in terms of initial language model perplexity.
arXiv Detail & Related papers (2021-06-15T14:28:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.