Privacy-Aware Time Series Synthesis via Public Knowledge Distillation
- URL: http://arxiv.org/abs/2511.00700v1
- Date: Sat, 01 Nov 2025 20:44:24 GMT
- Title: Privacy-Aware Time Series Synthesis via Public Knowledge Distillation
- Authors: Penghang Liu, Haibei Zhu, Eleonora Kreacic, Svitlana Vyetrenko,
- Abstract summary: Pub2Priv is a novel framework for generating private time series data by leveraging heterogeneous public knowledge. Our model employs a self-attention mechanism to encode public data into temporal and feature embeddings. We introduce a practical metric to assess privacy by evaluating the identifiability of the synthetic data.
- Score: 5.510212613486574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sharing sensitive time series data, such as patient records or investment accounts, in domains such as finance, healthcare, and energy consumption is often restricted due to privacy concerns. Privacy-aware synthetic time series generation addresses this challenge by injecting noise during training, inherently introducing a trade-off between privacy and utility. In many cases, sensitive sequences are correlated with publicly available, non-sensitive contextual metadata (e.g., household electricity consumption may be influenced by weather conditions and electricity prices). However, existing privacy-aware data generation methods often overlook this opportunity, resulting in suboptimal privacy-utility trade-offs. In this paper, we present Pub2Priv, a novel framework for generating private time series data by leveraging heterogeneous public knowledge. Our model employs a self-attention mechanism to encode public data into temporal and feature embeddings, which serve as conditional inputs for a diffusion model to generate synthetic private sequences. Additionally, we introduce a practical metric to assess privacy by evaluating the identifiability of the synthetic data. Experimental results show that Pub2Priv consistently outperforms state-of-the-art benchmarks in improving the privacy-utility trade-off across finance, energy, and commodity trading domains.
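The abstract's identifiability metric is not specified in detail here; as a hedged illustration of the general idea (flagging real records that a synthetic record reproduces more closely than any other real record does), a minimal sketch in pure Python might look like this. All names and data are illustrative, not the paper's implementation:

```python
import math

def nearest_dist(point, others):
    """Euclidean distance from `point` to its nearest neighbor in `others`."""
    return min(math.dist(point, o) for o in others)

def identifiability_score(real, synthetic):
    """Fraction of real records whose closest synthetic record is nearer
    than their closest *other* real record -- higher means riskier."""
    hits = 0
    for i, r in enumerate(real):
        d_syn = nearest_dist(r, synthetic)
        d_real = nearest_dist(r, real[:i] + real[i + 1:])
        if d_syn < d_real:
            hits += 1
    return hits / len(real)

real = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
leaky = real  # synthetic data that memorizes the real records exactly
safe = [[10.0, 10.0], [11.0, 11.0], [12.0, 12.0]]
```

With these toy inputs, `identifiability_score(real, leaky)` is 1.0 (every real record is reproduced) while `identifiability_score(real, safe)` is 0.0.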
Related papers
- How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage.
Differentially private synthetic data refers to synthetic data that preserves the overall trends of the source data.
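The noise addition that underlies DP data generation can be illustrated with the textbook Laplace mechanism for releasing counts; a minimal sketch (the `epsilon` and sensitivity values are illustrative, not from the guide):

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) via the inverse-CDF transform."""
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_counts(counts, epsilon, sensitivity=1.0, seed=0):
    """Release a histogram with epsilon-DP Laplace noise.

    Sensitivity 1 assumes one individual changes at most one count by one.
    """
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale, rng) for c in counts]

true_counts = [120, 45, 80]
noisy = dp_counts(true_counts, epsilon=0.5)
```

Smaller `epsilon` means a larger noise scale and stronger privacy; as `epsilon` grows the noisy counts converge to the true counts.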
arXiv Detail & Related papers (2025-12-02T21:14:39Z) - Aim High, Stay Private: Differentially Private Synthetic Data Enables Public Release of Behavioral Health Information with High Utility [2.1715431485081593]
Differential Privacy (DP) provides formal guarantees against re-identification risks.
We generate DP synthetic data for Phase 1 of the Lived Experiences Measured Using Rings Study (LEMURS).
We evaluate the utility of the synthetic data using a framework informed by actual uses of the LEMURS dataset.
arXiv Detail & Related papers (2025-06-30T15:58:34Z) - Differentially Private Synthetic Data Release for Topics API Outputs [63.79476766779742]
We focus on one Privacy-Preserving Ads API: the Topics API, part of Google Chrome's Privacy Sandbox.
We generate a differentially-private dataset that closely matches the re-identification risk properties of the real Topics API data.
We hope this will enable external researchers to analyze the API in-depth and replicate prior and future work on a realistic large-scale dataset.
arXiv Detail & Related papers (2025-06-30T13:46:57Z) - Synthetic Data Privacy Metrics [2.1213500139850017]
We review the pros and cons of popular metrics that include simulations of adversarial attacks.
We also review current best practices for amending generative models to enhance the privacy of the data they create.
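One common family of such metrics compares each synthetic record's distance to the closest real training record (DCR) against a holdout baseline; a hedged sketch of the core computation, with illustrative names and data rather than any specific paper's definition:

```python
import math

def dcr(records, reference):
    """Distance to closest record: for each record, its Euclidean distance
    to the nearest point in `reference`."""
    return [min(math.dist(r, ref) for ref in reference) for r in records]

def memorization_flag(synthetic, train, holdout):
    """Privacy red flag: synthetic data sits closer to training records
    than an unseen holdout set does (suggesting memorization)."""
    syn_median = sorted(dcr(synthetic, train))[len(synthetic) // 2]
    hold_median = sorted(dcr(holdout, train))[len(holdout) // 2]
    return syn_median < hold_median

train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
holdout = [[2.0, 2.0], [3.0, 2.0], [2.0, 3.0]]
memorized = [[0.0, 0.01], [1.0, 0.01], [0.01, 1.0]]  # hugs the training set
```

Here `memorization_flag(memorized, train, holdout)` returns True, since the synthetic points sit far closer to the training data than genuinely unseen points do.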
arXiv Detail & Related papers (2025-01-07T17:02:33Z) - Differentially Private Publication of Electricity Time Series Data in Smart Grids [8.87717126222646]
Time-series of power consumption over geographical areas are valuable data sources to study consumer behavior and guide energy policy decisions.
However, publication of such data raises significant privacy issues, as it may reveal sensitive details about personal habits and lifestyles.
We introduce STPT (Spatio-Temporal Private Timeseries), a novel method for DP-compliant publication of electricity consumption data.
arXiv Detail & Related papers (2024-08-24T23:30:09Z) - Defining 'Good': Evaluation Framework for Synthetic Smart Meter Data [14.779917834583577]
We show that standard privacy attack methods are inadequate for assessing privacy risks of smart meter datasets.
We propose an improved method by injecting training data with implausible outliers, then launching privacy attacks directly on these outliers.
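The proposed evaluation can be mimicked as follows: plant implausible outliers in the training set, train the generator, then test whether any synthetic record reproduces an outlier too closely. A distance-threshold sketch (threshold, names, and data are illustrative assumptions, not the paper's code):

```python
import math

def outlier_leakage(outliers, synthetic, threshold=1.0):
    """Fraction of planted outliers that some synthetic record reproduces
    to within `threshold` -- a proxy for memorization of rare records."""
    leaked = 0
    for o in outliers:
        if min(math.dist(o, s) for s in synthetic) < threshold:
            leaked += 1
    return leaked / len(outliers)

# Implausible consumption profiles planted into the training data.
outliers = [[100.0, 0.0], [0.0, 100.0]]
memorizing = [[100.0, 0.1], [3.0, 4.0]]   # generator copied one outlier
well_behaved = [[3.0, 4.0], [2.5, 3.5]]
```

Attacking the planted outliers directly gives a sharper signal than attacking typical records, since outliers are exactly the points a generator should not be able to reproduce without memorizing them.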
arXiv Detail & Related papers (2024-07-16T14:41:27Z) - Synthetic Data: Revisiting the Privacy-Utility Trade-off [4.832355454351479]
An article stated that synthetic data does not provide a better trade-off between privacy and utility than traditional anonymization techniques.
We analyzed the implementation of the privacy game described in the article and found that it operated in a highly specialized and constrained environment.
arXiv Detail & Related papers (2024-07-09T14:48:43Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Privacy-sensitive data is subject to stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining [75.25943383604266]
We question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving.
We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy.
We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful.
arXiv Detail & Related papers (2022-12-13T10:41:12Z) - Synthetic Text Generation with Differential Privacy: A Simple and Practical Recipe [32.63295550058343]
We show that a simple and practical recipe in the text domain is effective in generating useful synthetic text with strong privacy protection.
Our method produces synthetic text that is competitive in terms of utility with its non-private counterpart.
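Recipes of this kind typically fine-tune a language model with DP-SGD: clip each example's gradient to a fixed norm, sum, add Gaussian noise, and average. A minimal sketch of that gradient step (the model, parameters, and data here are illustrative assumptions):

```python
import math
import random

def clip_gradient(grad, max_norm):
    """Rescale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > max_norm:
        return [g * max_norm / norm for g in grad]
    return list(grad)

def dp_sgd_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Average of clipped per-example gradients plus Gaussian noise,
    as in DP-SGD; the noise std is noise_multiplier * max_norm."""
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    total = [sum(g[j] for g in clipped) for j in range(dim)]
    sigma = noise_multiplier * max_norm
    n = len(per_example_grads)
    return [(total[j] + rng.gauss(0.0, sigma)) / n for j in range(dim)]

grads = [[3.0, 4.0], [0.3, 0.4]]  # per-example L2 norms 5.0 and 0.5
noisy_grad = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=1.0,
                             rng=random.Random(0))
```

Clipping bounds each example's influence on the update, which is what lets the added Gaussian noise translate into a formal DP guarantee via privacy accounting.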
arXiv Detail & Related papers (2022-10-25T21:21:17Z) - DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
In the first phase, attributes are split into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling, which helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
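The attribute-clustering phase can be approximated by greedily grouping columns whose pairwise correlation exceeds a threshold; a pure-Python sketch (the threshold, greedy strategy, and data are illustrative assumptions, not DP2-Pub's algorithm):

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def cluster_attributes(columns, threshold=0.8):
    """Greedily group attribute columns: an attribute joins a cluster if
    |corr| with any member exceeds `threshold`, else starts a new one."""
    clusters = []
    for idx, col in enumerate(columns):
        for cluster in clusters:
            if any(abs(pearson(col, columns[j])) > threshold for j in cluster):
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters

# Three toy attributes: a0 and a1 strongly correlated, a2 roughly independent.
a0 = [1.0, 2.0, 3.0, 4.0]
a1 = [2.1, 3.9, 6.2, 8.0]
a2 = [5.0, 1.0, 4.0, 2.0]
groups = cluster_attributes([a0, a1, a2])
```

Publishing each low-dimensional cluster separately keeps noisy marginals small, which is why cohesive clusters let the same privacy budget buy more utility.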
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.