What Data Augmentation Do We Need for Deep-Learning-Based Finance?
- URL: http://arxiv.org/abs/2106.04114v1
- Date: Tue, 8 Jun 2021 05:26:58 GMT
- Title: What Data Augmentation Do We Need for Deep-Learning-Based Finance?
- Authors: Liu Ziyin, Kentaro Minami, Kentaro Imajo
- Abstract summary: We focus on developing a theoretical framework for understanding the use of data augmentation for deep-learning-based approaches to quantitative finance.
The proposed theory clarifies the role and necessity of data augmentation for finance; moreover, our theory motivates a simple algorithm of injecting a random noise of strength $\sqrt{|r_{t-1}|}$ to the observed return $r_{t}$.
This algorithm is shown to work well in practice.
- Score: 2.470815298095903
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The main task we consider is portfolio construction in a speculative market,
a fundamental problem in modern finance. While various empirical works now
exist to explore deep learning in finance, the theory side is almost
non-existent. In this work, we focus on developing a theoretical framework for
understanding the use of data augmentation for deep-learning-based approaches
to quantitative finance. The proposed theory clarifies the role and necessity
of data augmentation for finance; moreover, our theory motivates a simple
algorithm of injecting a random noise of strength $\sqrt{|r_{t-1}|}$ to the
observed return $r_{t}$. This algorithm is shown to work well in practice.
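The noise-injection rule stated in the abstract can be illustrated with a minimal sketch. The abstract gives only the noise strength $\sqrt{|r_{t-1}|}$, so several details here are assumptions rather than the authors' implementation: the noise is taken to be zero-mean Gaussian, "strength" is read as its standard deviation, and the function name `augment_returns` and the tunable constant `scale` are hypothetical.

```python
import numpy as np


def augment_returns(returns, scale=1.0, rng=None):
    """Return a noisy copy of a 1-D return series.

    Assumed rule: r_aug[t] = r[t] + scale * sqrt(|r[t-1]|) * eps[t],
    with eps[t] ~ N(0, 1). The Gaussian choice and `scale` are
    illustrative assumptions, not taken from the paper.
    """
    r = np.asarray(returns, dtype=float)
    if r.ndim != 1 or r.size < 2:
        raise ValueError("returns must be a 1-D series with at least 2 entries")
    rng = np.random.default_rng() if rng is None else rng

    # Noise standard deviation at time t depends on the previous return.
    strength = scale * np.sqrt(np.abs(r[:-1]))
    noise = strength * rng.standard_normal(r.size - 1)

    augmented = r.copy()
    # The first return has no predecessor, so it is left unchanged.
    augmented[1:] += noise
    return augmented


if __name__ == "__main__":
    observed = np.array([0.010, -0.004, 0.007, 0.000, 0.012])
    print(augment_returns(observed, scale=0.1, rng=np.random.default_rng(0)))
```

In a training loop, a fresh augmented series would typically be drawn at every epoch, so the model never sees exactly the same noisy returns twice.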
Related papers
- Fino1: On the Transferability of Reasoning-Enhanced LLMs and Reinforcement Learning to Finance [35.617409883103335]
FinReason is the first financial reasoning benchmark covering multi-table analysis, long-context reasoning, and equation-based tasks.
We introduce FinCoT, the first open high-fidelity CoT corpus for finance, distilled from seven QA datasets.
We develop Fin-o1, the first open financial reasoning models trained via supervised fine-tuning and GRPO-based RL.
arXiv Detail & Related papers (2025-02-12T05:13:04Z) - Mathematics of Differential Machine Learning in Derivative Pricing and Hedging [0.0]
This article introduces the concept of the financial differential machine learning algorithm through a rigorous mathematical framework.
The work highlights the profound implications of theoretical assumptions within financial models on the construction of machine learning algorithms.
arXiv Detail & Related papers (2024-05-02T12:25:41Z) - AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data.
We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z) - Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines.
We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z) - Factor Investing with a Deep Multi-Factor Model [123.52358449455231]
We develop a novel deep multi-factor model that adopts industry neutralization and market neutralization modules with clear financial insights.
Tests on real-world stock market data demonstrate the effectiveness of our deep multi-factor model.
arXiv Detail & Related papers (2022-10-22T14:47:11Z) - Recent Advances in Reinforcement Learning in Finance [3.0079490585515343]
The rapid changes in the finance industry due to the increasing amount of data have revolutionized techniques for data processing and data analysis.
New developments from reinforcement learning (RL) are able to make full use of the large amount of financial data.
arXiv Detail & Related papers (2021-12-08T19:55:26Z) - FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents.
We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts.
The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z) - Algorithms for Learning Graphs in Financial Markets [5.735035463793008]
We investigate the fundamental problem of learning undirected graphical models under Laplacian structural constraints.
We present natural justifications, supported by empirical evidence, for the usage of the Laplacian matrix as a model for the precision matrix of financial assets.
We design numerical algorithms based on the alternating direction method of multipliers to learn undirected, weighted graphs.
arXiv Detail & Related papers (2020-12-31T02:48:35Z) - Deep Portfolio Optimization via Distributional Prediction of Residual Factors [3.9189409002585562]
We propose a novel method of constructing a portfolio based on predicting the distribution of a financial quantity called residual factors.
We demonstrate the efficacy of our method on U.S. and Japanese stock market data.
arXiv Detail & Related papers (2020-12-14T04:09:52Z) - The Information Bottleneck Problem and Its Applications in Machine Learning [53.57797720793437]
Inference capabilities of machine learning systems have skyrocketed in recent years and now play a pivotal role in various aspects of society.
The information bottleneck (IB) theory emerged as a bold information-theoretic paradigm for analyzing deep learning (DL) systems.
In this tutorial we survey the information-theoretic origins of this abstract principle, and its recent impact on DL.
arXiv Detail & Related papers (2020-04-30T16:48:51Z) - Budget Learning via Bracketing [50.085728094234476]
The budget learning problem poses the learner's goal as minimising use of the cloud while suffering no discernible loss in accuracy.
We propose a new formulation for the BL problem via the concept of bracketings.
We empirically validate our theory on real-world datasets, demonstrating improved performance over prior gating based methods.
arXiv Detail & Related papers (2020-04-14T04:38:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.