Related papers: History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis

History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis

URL: http://arxiv.org/abs/2601.10143v1
Date: Thu, 15 Jan 2026 07:38:59 GMT
Title: History Is Not Enough: An Adaptive Dataflow System for Financial Time-Series Synthesis
Authors: Haochong Xia, Yao Long Teng, Regan Tan, Molei Qin, Xinrun Wang, Bo An,
Abstract summary: "History Is Not Enough" underscores the need for adaptive data generation that learns to evolve with the market.<n>We present a drift-aware dataflow system that integrates machine learning-based adaptive control into the data curation process.
Score: 25.486090554711797
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In quantitative finance, the gap between training and real-world performance-driven by concept drift and distributional non-stationarity-remains a critical obstacle for building reliable data-driven systems. Models trained on static historical data often overfit, resulting in poor generalization in dynamic markets. The mantra "History Is Not Enough" underscores the need for adaptive data generation that learns to evolve with the market rather than relying solely on past observations. We present a drift-aware dataflow system that integrates machine learning-based adaptive control into the data curation process. The system couples a parameterized data manipulation module comprising single-stock transformations, multi-stock mix-ups, and curation operations, with an adaptive planner-scheduler that employs gradient-based bi-level optimization to control the system. This design unifies data augmentation, curriculum learning, and data workflow management under a single differentiable framework, enabling provenance-aware replay and continuous data quality monitoring. Extensive experiments on forecasting and reinforcement learning trading tasks demonstrate that our framework enhances model robustness and improves risk-adjusted returns. The system provides a generalizable approach to adaptive data management and learning-guided workflow automation for financial data.

Related papers

Data Science and Technology Towards AGI Part I: Tiered Data Management [53.64581824953229]
We argue that the development of artificial intelligence is entering a new phase of data-model co-evolution.<n>We introduce an L0-L4 tiered data management framework, ranging from raw uncurated resources to organized and verifiable knowledge.<n>We validate the effectiveness of the proposed framework through empirical studies.
arXiv Detail & Related papers (2026-02-09T18:47:51Z)
Data Scaling Laws for End-to-End Autonomous Driving [83.85463296830743]
We evaluate the performance of a simple end-to-end driving architecture on internal driving datasets ranging in size from 16 to 8192 hours.<n>Specifically, we investigate how much additional training data is needed to achieve a target performance gain.
arXiv Detail & Related papers (2025-04-06T03:23:48Z)
FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors [50.131271229165165]
Federated Learning (FL) has emerged as a promising framework for distributed machine learning.<n>Data heterogeneity resulting from differences across user behaviors, preferences, and device characteristics poses a significant challenge for federated learning.<n>We propose Adaptive Weight Aggregation (FedAWA), a novel method that adaptively adjusts aggregation weights based on client vectors during the learning process.
arXiv Detail & Related papers (2025-03-20T04:49:40Z)
Integrating Personalized Federated Learning with Control Systems for Enhanced Performance [0.0]
This paper introduces a novel framework that amalgamates personalized federated learning with robust control systems.<n>Our approach harnesses personalized algorithms that adapt to the unique characteristics of each client's data.<n>We demonstrate that our integrated system outperforms standard federated learning models in terms of accuracy and learning speed.
arXiv Detail & Related papers (2025-01-27T01:52:15Z)
A Scalable Approach to Covariate and Concept Drift Management via Adaptive Data Segmentation [0.562479170374811]
In many real-world applications, continuous machine learning (ML) systems are crucial but prone to data drift. Traditional drift adaptation methods typically update models using ensemble techniques, often discarding drifted historical data. We contend that explicitly incorporating drifted data into the model training process significantly enhances model accuracy and robustness.
arXiv Detail & Related papers (2024-11-23T17:35:23Z)
Unlearning Information Bottleneck: Machine Unlearning of Systematic Patterns and Biases [6.936871609178494]
We present Unlearning Information Bottleneck (UIB), a novel information-theoretic framework designed to enhance the process of machine unlearning. By proposing a variational upper bound, we recalibrate the model parameters through a dynamic prior that integrates changes in data distribution with an affordable computational cost. Our experiments across various datasets, models, and unlearning methods demonstrate that our approach effectively removes systematic patterns and biases while maintaining the performance of models post-unlearning.
arXiv Detail & Related papers (2024-05-22T21:54:05Z)
Robust Machine Learning by Transforming and Augmenting Imperfect Training Data [6.928276018602774]
This thesis explores several data sensitivities of modern machine learning. We first discuss how to prevent ML from codifying prior human discrimination measured in the training data. We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment.
arXiv Detail & Related papers (2023-12-19T20:49:28Z)
Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification [83.23129279407271]
We propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities. In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed. This knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data.
arXiv Detail & Related papers (2022-07-23T18:54:10Z)
Adaptive Weighting Scheme for Automatic Time-Series Data Augmentation [79.47771259100674]
We present two sample-adaptive automatic weighting schemes for data augmentation. We validate our proposed methods on a large, noisy financial dataset and on time-series datasets from the UCR archive. On the financial dataset, we show that the methods in combination with a trading strategy lead to improvements in annualized returns of over 50$%$, and on the time-series data we outperform state-of-the-art models on over half of the datasets, and achieve similar performance in accuracy on the others.
arXiv Detail & Related papers (2021-02-16T17:50:51Z)
How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance. We formulate a quality measure for the data set, which we refer to as $rho$-gap. We show how the $rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
Logarithmic Regret Bound in Partially Observable Linear Dynamical Systems [91.43582419264763]
We study the problem of system identification and adaptive control in partially observable linear dynamical systems. We present the first model estimation method with finite-time guarantees in both open and closed-loop system identification. We show that AdaptOn is the first algorithm that achieves $textpolylogleft(Tright)$ regret in adaptive control of unknown partially observable linear dynamical systems.
arXiv Detail & Related papers (2020-03-25T06:00:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.