Privacy enabled Financial Text Classification using Differential Privacy
and Federated Learning
- URL: http://arxiv.org/abs/2110.01643v1
- Date: Mon, 4 Oct 2021 18:15:32 GMT
- Authors: Priyam Basu, Tiasa Singha Roy, Rakshit Naidu, Zumrut Muftuoglu
- Abstract summary: We propose a contextualized text classification model integrated with privacy features such as Differential Privacy (DP) and Federated Learning (FL). We show how to train NLP models privately, discuss desirable privacy-utility tradeoffs, and evaluate the models on the Financial Phrase Bank dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy is important in the financial domain, as such data is highly
confidential and sensitive. Natural Language Processing (NLP) techniques can be
applied for text classification and entity detection purposes in financial
domains such as customer feedback sentiment analysis, invoice entity detection,
categorisation of financial documents by type etc. Due to the sensitive nature
of such data, privacy measures need to be taken for handling and training large
models with such data. In this work, we propose a contextualized transformer
(BERT and RoBERTa) based text classification model integrated with privacy
features such as Differential Privacy (DP) and Federated Learning (FL). We
show how to train NLP models privately, explore desirable privacy-utility
tradeoffs, and evaluate them on the Financial Phrase Bank dataset.
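The DP half of this recipe is commonly realized with DP-SGD: clip each per-example gradient to bound any one record's influence, then add calibrated Gaussian noise before the update. The abstract does not give the authors' exact implementation, so the following is only a minimal NumPy sketch of one DP-SGD step for a linear (logistic) classifier; `clip`, `noise_mult`, and the learning rate are illustrative values, and the privacy accounting over many steps is omitted.

```python
import numpy as np

def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.1, rng=None):
    """One DP-SGD update: clip per-example gradients to L2 norm `clip`,
    sum, add Gaussian noise with std `noise_mult * clip`, then average."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(X)
    # Per-example logistic-loss gradients for a linear classifier.
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    per_example_grads = (preds - y)[:, None] * X            # shape (n, d)
    # Clip each example's gradient to bound its sensitivity.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    # Noise the summed gradient, then average over the batch.
    noise = rng.normal(0.0, noise_mult * clip, size=w.shape)
    noisy_mean_grad = (clipped.sum(axis=0) + noise) / n
    return w - lr * noisy_mean_grad
```

In practice the cumulative privacy loss (the epsilon spent across all steps) is tracked with a privacy accountant; libraries such as Opacus wrap this clipping-plus-noise step for transformer models like BERT and RoBERTa.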
Related papers
- Differential Privacy Overview and Fundamental Techniques [63.0409690498569]
This chapter is meant to be part of the book "Differential Privacy in Artificial Intelligence: From Theory to Practice"
It starts by illustrating various attempts to protect data privacy, emphasizing where and why they failed.
It then defines the key actors, tasks, and scopes that make up the domain of privacy-preserving data analysis.
arXiv Detail & Related papers (2024-11-07T13:52:11Z) - Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Differentially Private Language Models for Secure Data Sharing [19.918137395199224]
In this paper, we show how to train a generative language model in a differentially private manner and consequently sample data from it.
Using natural language prompts and a new prompt-mismatch loss, we are able to create highly accurate and fluent textual datasets.
We perform thorough experiments indicating that our synthetic datasets do not leak information from our original data and are of high language quality.
arXiv Detail & Related papers (2022-10-25T11:12:56Z) - Sotto Voce: Federated Speech Recognition with Differential Privacy
Guarantees [0.761963751158349]
Speech data is expensive to collect and highly sensitive to its sources.
Organizations often independently collect small datasets for their own use, but these fall short of the demands of machine learning.
Organizations could pool these datasets together and jointly build a strong ASR system; sharing data in the clear, however, comes with tremendous risk, in terms of intellectual property loss as well as loss of privacy of the individuals who exist in the dataset.
arXiv Detail & Related papers (2022-07-16T02:48:54Z) - How to keep text private? A systematic review of deep learning methods
for privacy-preserving natural language processing [0.38073142980732994]
Article systematically reviews over sixty methods for privacy-preserving NLP published between 2016 and 2020.
We introduce a novel taxonomy for classifying the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods.
We discuss open challenges in privacy-preserving NLP regarding data traceability, computational overhead, dataset size, and the prevalence of human biases in embeddings.
arXiv Detail & Related papers (2022-05-20T11:29:44Z) - SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles [50.90773979394264]
This paper studies a model that protects the privacy of individuals' sensitive information while still learning non-discriminatory predictors.
A key characteristic of the proposed model is that it enables the adoption of off-the-shelf, non-private fair models to create a privacy-preserving and fair model.
arXiv Detail & Related papers (2022-04-11T14:42:54Z) - Personalized PATE: Differential Privacy for Machine Learning with
Individual Privacy Guarantees [1.2691047660244335]
We propose three novel methods to support training an ML model with different personalized privacy guarantees within the training data.
Our experiments show that our personalized privacy methods yield higher accuracy models than the non-personalized baseline.
arXiv Detail & Related papers (2022-02-21T20:16:27Z) - Distributed Machine Learning and the Semblance of Trust [66.1227776348216]
Federated Learning (FL) allows the data owner to maintain data governance and perform model training locally without having to share their data.
FL and related techniques are often described as privacy-preserving.
We explain why this term is not appropriate and outline the risks associated with over-reliance on protocols that were not designed with formal definitions of privacy in mind.
arXiv Detail & Related papers (2021-12-21T08:44:05Z) - PCAL: A Privacy-preserving Intelligent Credit Risk Modeling Framework
Based on Adversarial Learning [111.19576084222345]
This paper proposes a framework of Privacy-preserving Credit risk modeling based on Adversarial Learning (PCAL)
PCAL aims to mask the private information inside the original dataset, while maintaining the important utility information for the target prediction task performance.
Results indicate that PCAL can learn an effective representation of user data that is free of private information, providing a solid foundation towards privacy-preserving machine learning for credit risk analysis.
arXiv Detail & Related papers (2020-10-06T07:04:59Z)
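Several of the papers above, like the main paper, build on Federated Learning, whose core server-side step is Federated Averaging (FedAvg): each client trains on its own data and the server averages the resulting models weighted by client dataset size. As a hedged illustration only (a toy least-squares objective standing in for the papers' actual NLP models), one FedAvg round can be sketched as:

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Client side: plain full-batch SGD on a least-squares objective."""
    w = w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(X)
        w -= lr * grad
    return w

def fedavg_round(w_global, clients):
    """Server side: broadcast the global model, collect each client's
    locally trained model, and average weighted by client data size."""
    updates, sizes = [], []
    for X, y in clients:                      # raw (X, y) never leaves a client
        updates.append(local_update(w_global, X, y))
        sizes.append(len(X))
    return np.average(updates, axis=0, weights=np.asarray(sizes, dtype=float))
```

Only model parameters cross the network, which is why FL preserves data governance; as the Distributed Machine Learning paper above stresses, this alone is not a formal privacy guarantee, which is why the main paper pairs FL with DP.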
This list is automatically generated from the titles and abstracts of the papers in this site.