Making Translators Privacy-aware on the User's Side
- URL: http://arxiv.org/abs/2312.04068v1
- Date: Thu, 7 Dec 2023 06:23:17 GMT
- Title: Making Translators Privacy-aware on the User's Side
- Authors: Ryoma Sato
- Abstract summary: We propose PRISM to enable users of machine translation systems to preserve the privacy of data on their own initiative.
PRISM adds privacy features without significantly compromising translation accuracy.
- Score: 17.912507269030577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose PRISM to enable users of machine translation systems to preserve
the privacy of data on their own initiative. There is a growing demand to apply
machine translation systems to data that require privacy protection. While
several machine translation engines claim to prioritize privacy, the extent and
specifics of such protection are largely ambiguous. First, there is often a
lack of clarity on how and to what degree the data is protected. Even if
service providers believe they have sufficient safeguards in place,
sophisticated adversaries might still extract sensitive information. Second,
vulnerabilities may exist outside of these protective measures, such as within
communication channels, potentially leading to data leakage. As a result, users
are hesitant to utilize machine translation engines for data demanding high
levels of privacy protection, thereby missing out on their benefits. PRISM
resolves this problem. Instead of relying on the translation service to keep
data safe, PRISM provides the means to protect data on the user's side. This
approach ensures that even machine translation engines with inadequate privacy
measures can be used securely. For platforms already equipped with privacy
safeguards, PRISM acts as an additional protection layer, further reinforcing
their security. PRISM adds these privacy features without significantly
compromising translation accuracy. Our experiments demonstrate the
effectiveness of PRISM using real-world translators, T5 and ChatGPT
(GPT-3.5-turbo), and datasets in two languages. PRISM effectively
balances privacy protection with translation accuracy.
Related papers
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We suggest sanitizing sensitive text using two common strategies used by humans.
We curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
- PrivacyRestore: Privacy-Preserving Inference in Large Language Models via Privacy Removal and Restoration [18.67432819687349]
We propose PrivacyRestore to protect the privacy of user inputs during Large Language Model inference.
PrivacyRestore directly removes privacy spans in user inputs and restores privacy information via activation steering during inference.
Experiments show that PrivacyRestore can protect private information while maintaining acceptable levels of performance and inference efficiency.
arXiv Detail & Related papers (2024-06-03T14:57:39Z)
- InferDPT: Privacy-Preserving Inference for Black-box Large Language Model [66.07752875835506]
InferDPT is the first practical framework for the privacy-preserving inference of black-box LLMs.
RANTEXT is a novel differential privacy mechanism integrated into the perturbation module of InferDPT.
arXiv Detail & Related papers (2023-10-18T18:00:11Z)
- Can Language Models be Instructed to Protect Personal Information? [30.187731765653428]
We introduce PrivQA -- a benchmark to assess the privacy/utility trade-off when a model is instructed to protect specific categories of personal information in a simulated scenario.
We find that adversaries can easily circumvent these protections with simple jailbreaking methods through textual and/or image inputs.
We believe PrivQA has the potential to support the development of new models with improved privacy protections, as well as the adversarial robustness of these protections.
arXiv Detail & Related papers (2023-10-03T17:30:33Z)
- PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z)
- Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining [75.25943383604266]
We question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving.
We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy.
We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.
arXiv Detail & Related papers (2022-12-13T10:41:12Z)
- Privacy Explanations - A Means to End-User Trust [64.7066037969487]
We looked into how explainability might help to tackle this problem.
We created privacy explanations that aim to clarify for end users why and for what purposes specific data is required.
Our findings reveal that privacy explanations can be an important step towards increasing trust in software systems.
arXiv Detail & Related papers (2022-10-18T09:30:37Z)
- An Example of Privacy and Data Protection Best Practices for Biometrics Data Processing in Border Control: Lesson Learned from SMILE [0.9442139459221784]
Misuse of data, compromising the privacy of individuals and/or authorized processing of data may be irreversible.
This is partly due to the lack of methods and guidance for the integration of data protection and privacy by design in the system development process.
We present an example of privacy and data protection best practices to provide more guidance for data controllers and developers.
arXiv Detail & Related papers (2022-01-10T15:34:43Z)
- Privacy in Open Search: A Review of Challenges and Solutions [0.6445605125467572]
Information retrieval (IR) is prone to privacy threats, such as attacks and unintended disclosures of documents and search history.
This work aims to highlight and discuss open challenges for privacy in the recent IR literature, focusing on tasks featuring user-generated text data.
arXiv Detail & Related papers (2021-10-20T18:38:48Z)
- PrivEdge: From Local to Distributed Private Training and Prediction [43.02041269239928]
PrivEdge is a technique for privacy-preserving Machine Learning (ML).
PrivEdge safeguards the privacy of users who provide their data for training, as well as users who use the prediction service.
We show that PrivEdge has high precision and recall in preserving privacy, as well as in distinguishing between private and non-private images.
arXiv Detail & Related papers (2020-04-12T09:26:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.