Fiaingen: A financial time series generative method matching real-world data quality
- URL: http://arxiv.org/abs/2510.01169v1
- Date: Wed, 01 Oct 2025 17:55:08 GMT
- Title: Fiaingen: A financial time series generative method matching real-world data quality
- Authors: Jože M. Rožanec, Tina Žezlin, Laurentiu Vasiliu, Dunja Mladenić, Radu Prodan, Dumitru Roman,
- Abstract summary: We introduce a set of novel techniques for time series data generation (we name them Fiaingen)<n>We assess their performance across three criteria: (a) overlap of real-world and synthetic data on a reduced dimensionality space, (b) performance on downstream machine learning tasks, and (c) runtime performance.<n>Experiments demonstrate that the methods achieve state-of-the-art performance across the three criteria listed above.
- Score: 2.333354880278056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data is vital in enabling machine learning models to advance research and practical applications in finance, where accurate and robust models are essential for investment and trading decision-making. However, real-world data is limited despite its quantity, quality, and variety. The data shortage of various financial assets directly hinders the performance of machine learning models designed to trade and invest in these assets. Generative methods can mitigate this shortage. In this paper, we introduce a set of novel techniques for time series data generation (we name them Fiaingen) and assess their performance across three criteria: (a) overlap of real-world and synthetic data on a reduced dimensionality space, (b) performance on downstream machine learning tasks, and (c) runtime performance. Our experiments demonstrate that the methods achieve state-of-the-art performance across the three criteria listed above. Synthetic data generated with Fiaingen methods more closely mirrors the original time series data while keeping data generation time close to seconds - ensuring the scalability of the proposed approach. Furthermore, models trained on it achieve performance close to those trained with real-world data.
Related papers
- Synthetic Financial Data Generation for Enhanced Financial Modelling [0.0]
This paper presents a unified multi-criteria evaluation framework for synthetic financial data.<n>Using historical S and P 500 daily data, we evaluate fidelity (Maximum Mean Discrepancy, MMD), temporal structure (autocorrelation and volatility clustering), and practical utility in downstream tasks.<n>We articulate practical guidelines for selecting generative models according to application needs and computational constraints.
arXiv Detail & Related papers (2025-12-25T21:43:16Z) - Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models [104.17057231661371]
Time series analysis is crucial for understanding dynamics of complex systems.<n>Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs)<n>Their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints.<n>This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies, their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.
arXiv Detail & Related papers (2025-03-14T13:53:46Z) - Towards Robust Universal Information Extraction: Benchmark, Evaluation, and Solution [66.11004226578771]
Existing robust benchmark datasets have two key limitations.<n>They generate only a limited range of perturbations for a single Information Extraction (IE) task.<n>Considering the powerful generation capabilities of Large Language Models (LLMs), we introduce a new benchmark dataset for Robust UIE, called RUIE-Bench.<n>We show that training with only textbf15% of the data leads to an average textbf7.5% relative performance improvement across three IE tasks.
arXiv Detail & Related papers (2025-03-05T05:39:29Z) - FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting [58.70072722290475]
Financial time series (FinTS) record the behavior of human-brain-augmented decision-making.<n>FinTSB is a comprehensive and practical benchmark for financial time series forecasting.
arXiv Detail & Related papers (2025-02-26T05:19:16Z) - Simulation as Reality? The Effectiveness of LLM-Generated Data in Open-ended Question Assessment [7.695222586877482]
This study investigates the potential and gap of simulative data to address the limitation of AI-based assessment tools.<n>Our findings reveal that while simulative data demonstrates promising results in training automated assessment models, its effectiveness has notable limitations.<n>The absence of real-world noise and biases, which are also present in over-processed real-world data, contributes to this limitation.
arXiv Detail & Related papers (2025-02-10T11:40:11Z) - From Machine Learning to Machine Unlearning: Complying with GDPR's Right to be Forgotten while Maintaining Business Value of Predictive Models [9.380866972744633]
This work develops a holistic machine learning-to-unlearning framework, called Ensemble-based iTerative Information Distillation (ETID)<n>ETID incorporates a new ensemble learning method to build an accurate predictive model that can facilitate handling data erasure requests.<n>We also introduce an innovative distillation-based unlearning method tailored to the constructed ensemble model to enable efficient and effective data erasure.
arXiv Detail & Related papers (2024-11-26T05:42:46Z) - Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A
Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - STORM: Efficient Stochastic Transformer based World Models for
Reinforcement Learning [82.03481509373037]
Recently, model-based reinforcement learning algorithms have demonstrated remarkable efficacy in visual input environments.
We introduce Transformer-based wORld Model (STORM), an efficient world model architecture that combines strong modeling and generation capabilities.
Storm achieves a mean human performance of $126.7%$ on the Atari $100$k benchmark, setting a new record among state-of-the-art methods.
arXiv Detail & Related papers (2023-10-14T16:42:02Z) - On the Stability of Iterative Retraining of Generative Models on their own Data [56.153542044045224]
We study the impact of training generative models on mixed datasets.
We first prove the stability of iterative training under the condition that the initial generative models approximate the data distribution well enough.
We empirically validate our theory on both synthetic and natural images by iteratively training normalizing flows and state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-09-30T16:41:04Z) - Quantifying Quality of Class-Conditional Generative Models in
Time-Series Domain [4.219228636765818]
We introduce the InceptionTime Score (ITS) and the Frechet InceptionTime Distance (FITD) to gauge the qualitative performance of class conditional generative models on the time-series domain.
We conduct extensive experiments on 80 different datasets to study the discriminative capabilities of proposed metrics.
arXiv Detail & Related papers (2022-10-14T08:13:20Z) - Machine Learning for Temporal Data in Finance: Challenges and
Opportunities [0.0]
Temporal data are ubiquitous in the financial services (FS) industry.
But machine learning efforts often fail to account for the temporal richness of these data.
arXiv Detail & Related papers (2020-09-11T19:39:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.