Privacy-Preserving Cohort Analytics for Personalized Health Platforms: A Differentially Private Framework with Stochastic Risk Modeling
- URL: http://arxiv.org/abs/2601.12105v1
- Date: Sat, 17 Jan 2026 16:59:41 GMT
- Title: Privacy-Preserving Cohort Analytics for Personalized Health Platforms: A Differentially Private Framework with Stochastic Risk Modeling
- Authors: Richik Chakraborty, Lawrence Liu, Syed Hasnain,
- Abstract summary: cohort-based aggregation of health data introduces nontrivial privacy risks.<n>We present a privacy-preserving cohort analytics framework that combines deterministic cohort constraints, differential privacy mechanisms, and synthetic baseline generation.<n>Our results suggest that risk modeling complements formal privacy guarantees by providing interpretable, decision-relevant metrics.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Personalized health analytics increasingly rely on population benchmarks to provide contextual insights such as ''How do I compare to others like me?'' However, cohort-based aggregation of health data introduces nontrivial privacy risks, particularly in interactive and longitudinal digital platforms. Existing privacy frameworks such as $k$-anonymity and differential privacy provide essential but largely static guarantees that do not fully capture the cumulative, distributional, and tail-dominated nature of re-identification risk in deployed systems. In this work, we present a privacy-preserving cohort analytics framework that combines deterministic cohort constraints, differential privacy mechanisms, and synthetic baseline generation to enable personalized population comparisons while maintaining strong privacy protections. We further introduce a stochastic risk modeling approach that treats re-identification risk as a random variable evolving over time, enabling distributional evaluation through Monte Carlo simulation. Adapting quantitative risk measures from financial mathematics, we define Privacy Loss at Risk (P-VaR) to characterize worst-case privacy outcomes under realistic cohort dynamics and adversary assumptions. We validate our framework through system-level analysis and simulation experiments, demonstrating how privacy-utility tradeoffs can be operationalized for digital health platforms. Our results suggest that stochastic risk modeling complements formal privacy guarantees by providing interpretable, decision-relevant metrics for platform designers, regulators, and clinical informatics stakeholders.
Related papers
- Causal and Federated Multimodal Learning for Cardiovascular Risk Prediction under Heterogeneous Populations [0.0]
We create a single multimodal learning framework that integrates cross modal transformers with graph neural networks and causal representation learning to measure personalized CVD risk.<n>The model combines genomic variation, cardiac MRI, ECG waveforms, wearable streams, and structured EHR data to predict risk.<n>This study paves the way for a principled approach to clinically trustworthy, interpretable and privacy respecting CVD prediction at the population level.
arXiv Detail & Related papers (2026-01-05T14:32:49Z) - On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis.<n>We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z) - Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank.<n>It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z) - From Mean to Extreme: Formal Differential Privacy Bounds on the Success of Real-World Data Reconstruction Attacks [54.25638567385662]
Differential Privacy in machine learning is often interpreted as guarantees against membership inference.<n> translating DP budgets into quantitative protection against the more damaging threat of data reconstruction remains a challenging open problem.<n>This paper bridges the critical gap by deriving the first formal privacy bounds tailored to the mechanics of demonstrated "from-scratch" attacks.
arXiv Detail & Related papers (2024-02-20T09:52:30Z) - Likelihood Ratio Confidence Sets for Sequential Decision Making [51.66638486226482]
We revisit the likelihood-based inference principle and propose to use likelihood ratios to construct valid confidence sequences.
Our method is especially suitable for problems with well-specified likelihoods.
We show how to provably choose the best sequence of estimators and shed light on connections to online convex optimization.
arXiv Detail & Related papers (2023-11-08T00:10:21Z) - Initialization Matters: Privacy-Utility Analysis of Overparameterized
Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z) - Simulation-based Bayesian Inference from Privacy Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets.<n>We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z) - Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z) - MACE: A Flexible Framework for Membership Privacy Estimation in
Generative Models [14.290199072565162]
We propose the first formal framework for membership privacy estimation in generative models.
Compared to previous works, our framework makes more realistic and flexible assumptions.
arXiv Detail & Related papers (2020-09-11T23:15:05Z) - Systematic Evaluation of Privacy Risks of Machine Learning Models [41.017707772150835]
We show that prior work on membership inference attacks may severely underestimate the privacy risks.
We first propose to benchmark membership inference privacy risks by improving existing non-neural network based inference attacks.
We then introduce a new approach for fine-grained privacy analysis by formulating and deriving a new metric called the privacy risk score.
arXiv Detail & Related papers (2020-03-24T00:53:53Z) - Differential Privacy at Risk: Bridging Randomness and Privacy Budget [5.393465689287103]
We analyse roles of the sources of randomness, namely the explicit randomness induced by the noise distribution and the implicit randomness induced by the data-generation distribution.
We propose privacy at risk that is a probabilistic calibration of privacy-preserving mechanisms.
We show that composition using the cost optimal privacy at risk provides stronger privacy guarantee than the classical advanced composition.
arXiv Detail & Related papers (2020-03-02T15:44:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.