Related papers: What Data Augmentation Do We Need for Deep-Learning-Based Finance?

URL: http://arxiv.org/abs/2106.04114v1
Date: Tue, 8 Jun 2021 05:26:58 GMT
Title: What Data Augmentation Do We Need for Deep-Learning-Based Finance?
Authors: Liu Ziyin, Kentaro Minami, Kentaro Imajo
Abstract summary: We focus on developing a theoretical framework for understanding the use of data augmentation for deep-learning-based approaches to quantitative finance. The proposed theory clarifies the role and necessity of data augmentation for finance; moreover, our theory motivates a simple algorithm of injecting random noise of strength $\sqrt{|r_{t-1}|}$ into the observed return $r_{t}$. This algorithm is shown to work well in practice.
Score: 2.470815298095903
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The main task we consider is portfolio construction in a speculative market, a fundamental problem in modern finance. While various empirical works now exist to explore deep learning in finance, the theory side is almost non-existent. In this work, we focus on developing a theoretical framework for understanding the use of data augmentation for deep-learning-based approaches to quantitative finance. The proposed theory clarifies the role and necessity of data augmentation for finance; moreover, our theory motivates a simple algorithm of injecting a random noise of strength $\sqrt{|r_{t-1}|}$ to the observed return $r_{t}$. This algorithm is shown to work well in practice.
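The noise-injection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that the noise has strength $\sqrt{|r_{t-1}|}$, so the zero-mean Gaussian distribution, the hypothetical `scale` multiplier, the function name `augment_returns`, and the choice to leave the first return untouched (it has no predecessor) are all assumptions made here for illustration.

```python
import numpy as np


def augment_returns(returns, scale=1.0, rng=None):
    """Return a noisy copy of a 1-D return series.

    Assumption (not from the paper): the noise added to r_t is zero-mean
    Gaussian with standard deviation scale * sqrt(|r_{t-1}|).
    The first return has no predecessor and is left unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = np.asarray(returns, dtype=float)
    noisy = r.copy()
    if r.size < 2:
        return noisy
    # Noise strength for r_1..r_{T-1} comes from the previous returns r_0..r_{T-2}.
    std = scale * np.sqrt(np.abs(r[:-1]))
    noisy[1:] += std * rng.standard_normal(r.size - 1)
    return noisy


# Example usage on a made-up return series (values are illustrative only).
example_returns = np.array([0.01, 0.0, 0.04, -0.02])
augmented = augment_returns(example_returns, rng=np.random.default_rng(0))
```

In a training loop, a fresh call per epoch would give the model a different perturbed series each time; how `scale` should be chosen is not specified in the abstract and is left open here.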

Related papers

Fino1: On the Transferability of Reasoning-Enhanced LLMs and Reinforcement Learning to Finance [35.617409883103335]
FinReason is the first financial reasoning benchmark covering multi-table analysis, long-context reasoning, and equation-based tasks. We introduce FinCoT, the first open high-fidelity CoT corpus for finance, distilled from seven QA datasets. We develop Fin-o1, the first open financial reasoning models trained via supervised fine-tuning and GRPO-based RL.
arXiv Detail & Related papers (2025-02-12T05:13:04Z)
Mathematics of Differential Machine Learning in Derivative Pricing and Hedging [0.0]
This article introduces the concept of the financial differential machine learning algorithm through a rigorous mathematical framework. The work highlights the profound implications of theoretical assumptions within financial models on the construction of machine learning algorithms.
arXiv Detail & Related papers (2024-05-02T12:25:41Z)
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models [51.3422222472898]
We document the capability of large language models (LLMs) like ChatGPT to predict stock price movements using news headlines. We develop a theoretical model incorporating information capacity constraints, underreaction, limits-to-arbitrage, and LLMs.
arXiv Detail & Related papers (2023-04-15T19:22:37Z)
Factor Investing with a Deep Multi-Factor Model [123.52358449455231]
We develop a novel deep multi-factor model that adopts industry neutralization and market neutralization modules with clear financial insights. Tests on real-world stock market data demonstrate the effectiveness of our deep multi-factor model.
arXiv Detail & Related papers (2022-10-22T14:47:11Z)
Recent Advances in Reinforcement Learning in Finance [3.0079490585515343]
The rapid changes in the finance industry due to the increasing amount of data have revolutionized techniques on data processing and data analysis. New developments from reinforcement learning (RL) are able to make full use of the large amount of financial data.
arXiv Detail & Related papers (2021-12-08T19:55:26Z)
FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)
Algorithms for Learning Graphs in Financial Markets [5.735035463793008]
We investigate the fundamental problem of learning undirected graphical models under Laplacian structural constraints. We present natural justifications, supported by empirical evidence, for the usage of the Laplacian matrix as a model for the precision matrix of financial assets. We design numerical algorithms based on the alternating direction method of multipliers to learn undirected, weighted graphs.
arXiv Detail & Related papers (2020-12-31T02:48:35Z)
Deep Portfolio Optimization via Distributional Prediction of Residual Factors [3.9189409002585562]
We propose a novel method of constructing a portfolio based on predicting the distribution of a financial quantity called residual factors. We demonstrate the efficacy of our method on U.S. and Japanese stock market data.
arXiv Detail & Related papers (2020-12-14T04:09:52Z)
The Information Bottleneck Problem and Its Applications in Machine Learning [53.57797720793437]
Inference capabilities of machine learning systems skyrocketed in recent years, now playing a pivotal role in various aspects of society. The information bottleneck (IB) theory emerged as a bold information-theoretic paradigm for analyzing deep learning (DL) systems. In this tutorial we survey the information-theoretic origins of this abstract principle, and its recent impact on DL.
arXiv Detail & Related papers (2020-04-30T16:48:51Z)
Budget Learning via Bracketing [50.085728094234476]
The budget learning problem poses the learner's goal as minimising use of the cloud while suffering no discernible loss in accuracy. We propose a new formulation for the BL problem via the concept of bracketings. We empirically validate our theory on real-world datasets, demonstrating improved performance over prior gating based methods.
arXiv Detail & Related papers (2020-04-14T04:38:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.