Synopsis: Secure and private trend inference from encrypted semantic embeddings
- URL: http://arxiv.org/abs/2505.23880v1
- Date: Thu, 29 May 2025 17:34:10 GMT
- Title: Synopsis: Secure and private trend inference from encrypted semantic embeddings
- Authors: Madelyne Xiao, Palak Jain, Micha Gorelick, Sarah Scheffler,
- Abstract summary: We introduce Synopsis, a secure architecture for analyzing messaging trends in consensually-donated E2EE messages using message embeddings. Since the goal of this system is investigative journalism, Synopsis must facilitate both exploratory and targeted analyses. Evaluations on a dataset of Hindi-language WhatsApp messages demonstrate the efficiency and accuracy of our approach.
- Score: 2.7998963147546148
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: WhatsApp and many other commonly used communication platforms guarantee end-to-end encryption (E2EE), which requires that service providers lack the cryptographic keys to read communications on their own platforms. WhatsApp's privacy-preserving design makes it difficult to study important phenomena like the spread of misinformation or political messaging, as users have a clear expectation and desire for privacy and little incentive to forfeit that privacy in the process of handing over raw data to researchers, journalists, or other parties. We introduce Synopsis, a secure architecture for analyzing messaging trends in consensually-donated E2EE messages using message embeddings. Since the goal of this system is investigative journalism workflows, Synopsis must facilitate both exploratory and targeted analyses -- a challenge for systems using differential privacy (DP), and, for different reasons, a challenge for private computation approaches based on cryptography. To meet these challenges, we combine techniques from the local and central DP models and wrap the system in malicious-secure multi-party computation to ensure the DP query architecture is the only way to access messages, preventing any party from directly viewing stored message embeddings. Evaluations on a dataset of Hindi-language WhatsApp messages (34,024 messages represented as 500-dimensional embeddings) demonstrate the efficiency and accuracy of our approach. Queries on this data run in about 30 seconds, and the accuracy of the fine-grained interface exceeds 94% on benchmark tasks.
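As a rough illustration of the central-DP ingredient mentioned in the abstract, the sketch below answers one counting query ("how many stored embeddings are similar to this query embedding?") with Laplace noise. It is not the Synopsis protocol: the abstract describes a combination of local and central DP wrapped in malicious-secure multi-party computation, none of which is reproduced here. The function names, the cosine-similarity threshold, and the assumption that each message contributes exactly one embedding (so the count has sensitivity 1) are all hypothetical choices made for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_similarity_count(embeddings, query, threshold, epsilon, rng):
    """Textbook Laplace mechanism on a counting query.

    Assumption: adding or removing one message changes the count by at most 1,
    so noise with scale 1/epsilon gives epsilon-DP for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for e in embeddings if cosine(e, query) >= threshold)
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 8-dimensional stand-ins for message embeddings (hypothetical data).
    stored = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(200)]
    query = stored[0]
    print(noisy_similarity_count(stored, query, threshold=0.5, epsilon=1.0, rng=rng))
```

In a deployment along the lines the abstract describes, a computation like this would run inside the secure computation rather than in the clear, and floating-point noise sampling would need more care than this sketch takes.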
Related papers
- MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks to evaluate contextual privacy in LLM-agents primarily assess single-turn, low-complexity tasks. We first present a benchmark, MAGPIE, comprising 158 real-life high-stakes scenarios across 15 domains. We then evaluate the current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
arXiv Detail & Related papers (2025-06-25T18:04:25Z)
- User Perceptions and Attitudes Toward Untraceability in Messaging Platforms [4.724825031148412]
We study user perceptions of "untraceability", i.e., preventing third parties from tracing who communicates with whom. We find that users associate the concept of untraceability with a wide range of privacy enhancing technologies. We discuss the gap between users' perceptions of untraceability and the threat models addressed by untraceable communication protocols.
arXiv Detail & Related papers (2025-06-12T18:19:50Z)
- Metadata-private Messaging without Coordination [20.481776420813915]
PingPong is an end-to-end system for metadata-private messaging. It replaces the rigid "dial-before-converse" paradigm with a more flexible "notify-before-retrieval" workflow. Pong achieves a level of usability akin to modern instant messaging systems.
arXiv Detail & Related papers (2025-04-28T08:21:16Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [56.46355425175232]
We suggest sanitizing sensitive text using two common strategies used by humans. We curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models. Compared to the prior works on anonymization, the human-inspired approaches result in more natural rewrites.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
- IDPFilter: Mitigating Interdependent Privacy Issues in Third-Party Apps [0.30693357740321775]
Third-party apps have increased concerns about interdependent privacy (IDP).
This paper provides a comprehensive investigation into the previously underinvestigated IDP issues of third-party apps.
We propose IDPFilter, a platform-agnostic API that enables application providers to minimize collateral information collection.
arXiv Detail & Related papers (2024-05-02T16:02:13Z)
- Boosting Digital Safeguards: Blending Cryptography and Steganography [0.30783046172997025]
Steganography involves hiding data within another medium, thereby facilitating covert communication by making the message invisible.
This proposed approach takes advantage of the latest advancements in Artificial Intelligence (AI) and Deep Learning (DL), especially through the application of Generative Adversarial Networks (GANs).
The application of GANs enables a smart, secure system that utilizes the inherent sensitivity of neural networks to slight alterations in data.
arXiv Detail & Related papers (2024-04-09T03:36:39Z)
- EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMs [34.77734655124251]
EmojiPrompt performs generative transformation, obfuscating private data within prompts with linguistic and non-linguistic elements. We evaluate EmojiPrompt's performance across 8 datasets from various domains. EmojiPrompt's atomic-level obfuscation allows it to function exclusively with cloud-based LLMs.
arXiv Detail & Related papers (2024-02-08T17:57:11Z)
- Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP).
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z)
- "Am I Private and If So, how Many?" -- Using Risk Communication Formats for Making Differential Privacy Understandable [0.0]
We adapt risk communication formats in conjunction with a model for the privacy risks of Differential Privacy.
We evaluate these novel privacy communication formats in a crowdsourced study.
arXiv Detail & Related papers (2022-04-08T13:30:07Z)
- Differentially Private Multi-Agent Planning for Logistic-like Problems [70.3758644421664]
This paper proposes a novel strong privacy-preserving planning approach for logistic-like problems.
Two challenges are addressed: 1) simultaneously achieving strong privacy, completeness and efficiency, and 2) addressing communication constraints.
To the best of our knowledge, this paper is the first to apply differential privacy to the field of multi-agent planning.
arXiv Detail & Related papers (2020-08-16T03:43:09Z)
- BeeTrace: A Unified Platform for Secure Contact Tracing that Breaks Data Silos [73.84437456144994]
Contact tracing is an important method to control the spread of an infectious disease such as COVID-19.
Current solutions do not utilize the huge volume of data stored in business databases and individual digital devices.
We propose BeeTrace, a unified platform that breaks data silos and deploys state-of-the-art cryptographic protocols to guarantee privacy goals.
arXiv Detail & Related papers (2020-07-05T10:33:45Z)