Synthetic Financial Data Generation for Enhanced Financial Modelling
- URL: http://arxiv.org/abs/2512.21791v1
- Date: Thu, 25 Dec 2025 21:43:16 GMT
- Title: Synthetic Financial Data Generation for Enhanced Financial Modelling
- Authors: Christophe D. Hounwanou, Yae Ulrich Gaba, Pierre Ntakirutimana,
- Abstract summary: This paper presents a unified multi-criteria evaluation framework for synthetic financial data.<n>Using historical S and P 500 daily data, we evaluate fidelity (Maximum Mean Discrepancy, MMD), temporal structure (autocorrelation and volatility clustering), and practical utility in downstream tasks.<n>We articulate practical guidelines for selecting generative models according to application needs and computational constraints.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data scarcity and confidentiality in finance often impede model development and robust testing. This paper presents a unified multi-criteria evaluation framework for synthetic financial data and applies it to three representative generative paradigms: the statistical ARIMA-GARCH baseline, Variational Autoencoders (VAEs), and Time-series Generative Adversarial Networks (TimeGAN). Using historical S and P 500 daily data, we evaluate fidelity (Maximum Mean Discrepancy, MMD), temporal structure (autocorrelation and volatility clustering), and practical utility in downstream tasks, specifically mean-variance portfolio optimization and volatility forecasting. Empirical results indicate that ARIMA-GARCH captures linear trends and conditional volatility but fails to reproduce nonlinear dynamics; VAEs produce smooth trajectories that underestimate extreme events; and TimeGAN achieves the best trade-off between realism and temporal coherence (e.g., TimeGAN attained the lowest MMD: 1.84e-3, average over 5 seeds). Finally, we articulate practical guidelines for selecting generative models according to application needs and computational constraints. Our unified evaluation protocol and reproducible codebase aim to standardize benchmarking in synthetic financial data research.
Related papers
- Applications of synthetic financial data in portfolio and risk modeling [0.0]
This paper examines the use of generative models for creating synthetic return series that support portfolio construction, trading analysis, and risk modeling.<n>TimeGAN produces synthetic data with distributional shapes, volatility patterns, and autocorrelation behaviour that are close to those observed in real returns.<n>The analysis supports the use of synthetic datasets as substitutes for real financial data in portfolio analysis and risk simulation.
arXiv Detail & Related papers (2025-12-25T22:28:32Z) - Real-Time Performance Analysis of Multi-Fidelity Residual Physics-Informed Neural Process-Based State Estimation for Robotic Systems [0.0]
We present a novel real-time, data-driven estimation approach based on the multi-fidelity residual physics-informed neural process (MFR-PINP)<n>Specifically, we address the model-mismatch issue of selecting an accurate kinematic model by tasking the MFR-PINP to also learn the residuals between simple, low-fidelity predictions and complex, high-fidelity ground-truth dynamics.<n>We provide implementation details of our MFR-PINP-based estimator for a hybrid online learning setting to validate our model's usage in real-time applications.
arXiv Detail & Related papers (2025-11-11T13:30:51Z) - Fiaingen: A financial time series generative method matching real-world data quality [2.333354880278056]
We introduce a set of novel techniques for time series data generation (we name them Fiaingen)<n>We assess their performance across three criteria: (a) overlap of real-world and synthetic data on a reduced dimensionality space, (b) performance on downstream machine learning tasks, and (c) runtime performance.<n>Experiments demonstrate that the methods achieve state-of-the-art performance across the three criteria listed above.
arXiv Detail & Related papers (2025-10-01T17:55:08Z) - Estimating Time Series Foundation Model Transferability via In-Context Learning [74.65355820906355]
Time series foundation models (TSFMs) offer strong zero-shot forecasting via large-scale pre-training.<n>Fine-tuning remains critical for boosting performance in domains with limited public data.<n>We introduce TimeTic, a transferability estimation framework that recasts model selection as an in-context-learning problem.
arXiv Detail & Related papers (2025-09-28T07:07:13Z) - Dynamic Lagging for Time-Series Forecasting in E-Commerce Finance: Mitigating Information Loss with A Hybrid ML Architecture [0.8192992814374568]
We propose a hybrid forecasting framework that integrates dynamic lagged feature engineering and adaptive rolling-window representations.<n>Our approach explicitly incorporates invoice-level behavioral modeling, structured lag of support data, and custom stability-aware loss functions.
arXiv Detail & Related papers (2025-09-24T15:33:16Z) - Continuous Visual Autoregressive Generation via Score Maximization [69.67438563485887]
We introduce a Continuous VAR framework that enables direct visual autoregressive generation without vector quantization.<n>Within this framework, all we need is to select a strictly proper score and set it as the training objective to optimize.
arXiv Detail & Related papers (2025-05-12T17:58:14Z) - Model Utility Law: Evaluating LLMs beyond Performance through Mechanism Interpretable Metric [99.56567010306807]
Large Language Models (LLMs) have become indispensable across academia, industry, and daily applications.<n>One core challenge of evaluation in the large language model (LLM) era is the generalization issue.<n>We propose Model Utilization Index (MUI), a mechanism interpretability enhanced metric that complements traditional performance scores.
arXiv Detail & Related papers (2025-04-10T04:09:47Z) - FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting [58.70072722290475]
Financial time series (FinTS) record the behavior of human-brain-augmented decision-making.<n>FinTSB is a comprehensive and practical benchmark for financial time series forecasting.
arXiv Detail & Related papers (2025-02-26T05:19:16Z) - Optimizing Sequential Recommendation Models with Scaling Laws and Approximate Entropy [104.48511402784763]
Performance Law for SR models aims to theoretically investigate and model the relationship between model performance and data quality.<n>We propose Approximate Entropy (ApEn) to assess data quality, presenting a more nuanced approach compared to traditional data quantity metrics.
arXiv Detail & Related papers (2024-11-30T10:56:30Z) - Deep Generative Modeling for Financial Time Series with Application in
VaR: A Comparative Review [22.52651841623703]
Historical simulation (HS) uses the empirical distribution of daily returns in a historical window as the forecast distribution of risk factor returns in the next day.
HS, GARCH and CWGAN models are tested on both historical USD yield curve data and additional data simulated from GARCH and CIR processes.
The study shows that top performing models are HS, GARCH and CWGAN models.
arXiv Detail & Related papers (2024-01-18T20:35:32Z) - Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs [50.25683648762602]
We introduce Koopman VAE, a new generative framework that is based on a novel design for the model prior.
Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map.
KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks.
arXiv Detail & Related papers (2023-10-04T07:14:43Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and
Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism.
We propose an efficient Transformerbased model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.