Generating Accurate Synthetic Survival Data by Conditioning on Outcomes
- URL: http://arxiv.org/abs/2405.17333v2
- Date: Tue, 05 Aug 2025 20:45:08 GMT
- Title: Generating Accurate Synthetic Survival Data by Conditioning on Outcomes
- Authors: Mohd Ashhad, Ricardo Henao
- Abstract summary: Synthetically generated data can improve privacy, fairness, and data accessibility. One key challenge in this setting is censoring, i.e., the timing of an event is unknown in some cases. Existing methods struggle to accurately reproduce the distributions of both observed and censored event times when generating synthetic data.
- Score: 16.401141867387324
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Synthetically generated data can improve privacy, fairness, and data accessibility; however, it can be challenging in specialized scenarios such as survival analysis. One key challenge in this setting is censoring, i.e., the timing of an event is unknown in some cases. Existing methods struggle to accurately reproduce the distributions of both observed and censored event times when generating synthetic data. We propose a conceptually simple approach that generates covariates conditioned on event times and censoring indicators by leveraging existing tabular data generation models without making assumptions about the mechanism underlying censoring. Experiments on real-world datasets demonstrate that our method consistently outperforms baselines and improves downstream survival model performance.
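The abstract's recipe, generating covariates conditioned on event times and censoring indicators, can be illustrated with a toy stand-in: stratify training rows by (time bin, censoring indicator) and fit a simple Gaussian per stratum in place of the tabular generative model. This is a minimal sketch under stated assumptions; the Gaussian model, the bin count, and all function names here are illustrative choices, not the paper's implementation:

```python
import numpy as np

def fit_conditional_generator(X, t, delta, n_bins=4):
    """Toy outcome-conditioned covariate generator.

    Stand-in for a tabular generative model: covariates are modeled with a
    Gaussian per (event-time bin, censoring indicator) stratum, so sampling
    conditions on the outcome rather than the other way around.
    """
    # Quantile-based time bins so strata have comparable sizes.
    edges = np.quantile(t, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.digitize(t, edges[1:-1]), 0, n_bins - 1)
    params = {}
    for b in range(n_bins):
        for d in (0, 1):
            mask = (bins == b) & (delta == d)
            if mask.sum() > X.shape[1]:  # enough rows for a covariance estimate
                params[(b, d)] = (X[mask].mean(0), np.cov(X[mask], rowvar=False))
    return edges, params

def sample_covariates(t_new, delta_new, edges, params, rng):
    """Draw a covariate vector given a desired event time and censoring flag."""
    n_bins = len(edges) - 1
    b = int(np.clip(np.digitize(t_new, edges[1:-1]), 0, n_bins - 1))
    mu, cov = params[(b, int(delta_new))]
    return rng.multivariate_normal(mu, cov)
```

Because the outcomes (t, delta) are fixed first and covariates are drawn afterwards, the synthetic data reproduces the joint distribution of observed and censored times by construction, which is the core idea the abstract describes.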
Related papers
- SurvDiff: A Diffusion Model for Generating Synthetic Data in Survival Analysis [34.89334607334426]
Survival analysis is a cornerstone of clinical research, modeling time-to-event outcomes such as metastasis, disease relapse, or patient death. SurvDiff is an end-to-end diffusion model specifically designed for generating synthetic data in survival analysis. We show that SurvDiff consistently outperforms state-of-the-art generative baselines in both distributional fidelity and downstream evaluation metrics across multiple medical datasets.
arXiv Detail & Related papers (2025-09-26T13:50:29Z) - SynDelay: A Synthetic Dataset for Delivery Delay Prediction [50.56729406793283]
We present SynDelay, a synthetic dataset designed for delivery delay prediction. It is publicly available through the Supply Chain Data Hub, an open initiative promoting dataset sharing and benchmarking in supply chain AI.
arXiv Detail & Related papers (2025-08-30T21:54:37Z) - Latent Noise Injection for Private and Statistically Aligned Synthetic Data Generation [7.240170769827935]
Synthetic data generation has become essential for scalable, privacy-preserving statistical analysis. We propose a Latent Noise Injection method using Masked Autoregressive Flows (MAF). Instead of directly sampling from the trained model, our method perturbs each data point in the latent space and maps it back to the data domain.
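The perturb-in-latent-space mechanism from this blurb can be sketched with a deliberately simplified invertible map. Assumption: the MAF of the paper is replaced here by a linear whitening transform so the example stays self-contained; only the mechanism (encode, add noise in latent space, decode) is the same, and the function name is hypothetical:

```python
import numpy as np

def latent_noise_release(X, noise_scale=0.3, rng=None):
    """Latent noise injection sketch with a whitening 'flow'.

    z = L^{-1}(x - mu) maps data to an approximately standard-normal latent
    space; noise is added there; the inverse map x = L z + mu returns the
    perturbed points to the data domain.
    """
    if rng is None:
        rng = np.random.default_rng()
    mu = X.mean(0)
    cov = np.cov(X, rowvar=False)
    # Cholesky factor of the covariance acts as the (linear) decoder.
    L = np.linalg.cholesky(cov + 1e-9 * np.eye(X.shape[1]))
    Z = np.linalg.solve(L, (X - mu).T).T          # encode to latent space
    Z_noisy = Z + noise_scale * rng.normal(size=Z.shape)  # perturb
    return (L @ Z_noisy.T).T + mu                  # decode back to data domain
```

Because each released point is a perturbed copy of a real point rather than a fresh model sample, the output stays statistically aligned with the original data while no exact record is released, which is the trade-off the abstract highlights.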
arXiv Detail & Related papers (2025-06-19T22:22:57Z) - Beyond the Norm: A Survey of Synthetic Data Generation for Rare Events [5.619671817895425]
Extreme events, such as market crashes, natural disasters, and pandemics, are rare but catastrophic. While data-driven methods offer powerful capabilities for extreme event modeling, they require abundant training data, yet extreme event data is inherently scarce. This survey provides the first overview of synthetic data generation for extreme events.
arXiv Detail & Related papers (2025-06-04T20:21:23Z) - Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z) - Multi-modal Data Binding for Survival Analysis Modeling with Incomplete Data and Annotations [19.560652381770243]
We introduce a novel framework that simultaneously handles incomplete data across modalities and censored survival labels.
Our approach employs advanced foundation models to encode individual modalities and align them into a universal representation space.
The proposed method demonstrates outstanding prediction accuracy in two survival analysis tasks on both datasets employed.
arXiv Detail & Related papers (2024-07-25T02:55:39Z) - A Temporally Disentangled Contrastive Diffusion Model for Spatiotemporal Imputation [35.46631415365955]
We introduce a conditional diffusion framework called C$2$TSD, which incorporates disentangled temporal (trend and seasonality) representations as conditional information.
Our experiments on three real-world datasets demonstrate the superior performance of our approach compared to a number of state-of-the-art baselines.
arXiv Detail & Related papers (2024-02-18T11:59:04Z) - TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis [15.496918127515665]
We propose a time-adaptive coordinate loss function, TripleSurv, to handle the complexities of learning process and exploit valuable survival time values.
Our TripleSurv is evaluated on three real-world survival datasets and a public synthetic dataset.
arXiv Detail & Related papers (2024-01-05T08:37:57Z) - Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
arXiv Detail & Related papers (2023-11-02T16:45:25Z) - CenTime: Event-Conditional Modelling of Censoring in Survival Analysis [49.44664144472712]
We introduce CenTime, a novel approach to survival analysis that directly estimates the time to event.
Our method features an innovative event-conditional censoring mechanism that performs robustly even when uncensored data is scarce.
Our results indicate that CenTime offers state-of-the-art performance in predicting time-to-death while maintaining comparable ranking performance.
arXiv Detail & Related papers (2023-09-07T17:07:33Z) - Copula-Based Deep Survival Models for Dependent Censoring [10.962520289040336]
This paper presents a parametric model of survival that extends modern non-linear survival analysis by relaxing the assumption of conditional independence.
On synthetic and semi-synthetic data, our approach significantly improves estimates of survival distributions compared to the standard approach that assumes conditional independence in the data.
arXiv Detail & Related papers (2023-06-20T21:51:13Z) - SurvivalGAN: Generating Time-to-Event Data for Survival Analysis [121.84429525403694]
Imbalances in censoring and time horizons cause generative models to experience three new failure modes specific to survival analysis.
We propose SurvivalGAN, a generative model that handles survival data by addressing the imbalance in the censoring and event horizons.
We evaluate this method via extensive experiments on medical datasets.
arXiv Detail & Related papers (2023-02-24T17:03:51Z) - Membership Inference Attacks against Synthetic Data through Overfitting Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
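The density-based membership score this blurb describes can be sketched as a density ratio: a candidate point lying where the generator's output density exceeds the real-data density is a suspected member of the training set. Assumption: a naive Gaussian KDE stands in for the paper's actual density estimators, and the function names are illustrative:

```python
import numpy as np

def kde_logdensity(queries, data, bandwidth=0.5):
    """Naive Gaussian kernel density estimate (log scale), numpy only."""
    diff = queries[:, None, :] - data[None, :, :]
    sq = (diff ** 2).sum(-1)
    kernels = np.exp(-0.5 * sq / bandwidth ** 2)
    dim = queries.shape[1]
    norm = (2 * np.pi * bandwidth ** 2) ** (dim / 2) * len(data)
    return np.log(kernels.sum(1) / norm + 1e-300)  # floor avoids log(0)

def density_ratio_membership_score(candidates, synthetic, reference, bandwidth=0.5):
    """Membership score in the spirit of DOMIAS (simplified sketch).

    High score: the generator places more mass near the candidate than the
    reference data distribution does, i.e. suspected local overfitting.
    """
    return (kde_logdensity(candidates, synthetic, bandwidth)
            - kde_logdensity(candidates, reference, bandwidth))
```

A generator that memorized a training record concentrates synthetic mass around it, so that record scores higher than points the generator never saw; thresholding the score yields the membership call.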
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - Breaking the Spurious Causality of Conditional Generation via Fairness Intervention with Corrective Sampling [77.15766509677348]
Conditional generative models often inherit spurious correlations from the training dataset.
This can result in label-conditional distributions that are imbalanced with respect to another latent attribute.
We propose a general two-step strategy to mitigate this issue.
arXiv Detail & Related papers (2022-12-05T08:09:33Z) - Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
arXiv Detail & Related papers (2022-05-12T17:03:57Z) - Hide-and-Seek Privacy Challenge [88.49671206936259]
The NeurIPS 2020 Hide-and-Seek Privacy Challenge is a novel two-tracked competition to accelerate progress in tackling both problems.
In our head-to-head format, participants in the synthetic data generation track (i.e. "hiders") and the patient re-identification track (i.e. "seekers") are directly pitted against each other by way of a new, high-quality intensive care time-series dataset.
arXiv Detail & Related papers (2020-07-23T15:50:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.