QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors with Variance-bounded REINFORCE
- URL: http://arxiv.org/abs/2409.05144v2
- Date: Tue, 8 Oct 2024 15:37:54 GMT
- Title: QuantFactor REINFORCE: Mining Steady Formulaic Alpha Factors with Variance-bounded REINFORCE
- Authors: Junjie Zhao, Chengxi Zhang, Min Qin, Peng Yang,
- Abstract summary: The goal of alpha factor mining is to discover indicative signals of investment opportunities from the historical financial market data of assets.
Recently, a promising framework is proposed for generating formulaic alpha factors using deep reinforcement learning.
- Score: 5.560011325936085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of alpha factor mining is to discover indicative signals of investment opportunities from the historical financial market data of assets, which can be used to predict asset returns and gain excess profits. Recently, a promising framework is proposed for generating formulaic alpha factors using deep reinforcement learning, and quickly gained research focuses from both academia and industries. This paper first argues that the originally employed policy training method, i.e., Proximal Policy Optimization (PPO), faces several important issues in the context of alpha factors mining, making it ineffective to explore the search space of the formula. Herein, a novel reinforcement learning based on the well-known REINFORCE algorithm is proposed. Given that the underlying state transition function adheres to the Dirac distribution, the Markov Decision Process within this framework exhibit minimal environmental variability, making REINFORCE algorithm more appropriate than PPO. A new dedicated baseline is designed to theoretically reduce the commonly suffered high variance of REINFORCE. Moreover, the information ratio is introduced as a reward shaping mechanism to encourage the generation of steady alpha factors that can better adapt to changes in market volatility. Experimental evaluations on various real assets data show that the proposed algorithm can increase the correlation with asset returns by 3.83\%, and a stronger ability to obtain excess returns compared to the latest alpha factors mining methods, which meets the theoretical results well.
Related papers
- Learning from Expert Factors: Trajectory-level Reward Shaping for Formulaic Alpha Mining [5.560011325936085]
Reinforcement learning has successfully automated the complex process of mining formulaic alpha factors, for creating interpretable and profitable investment strategies.<n>Existing methods are hampered by the sparse rewards given the underlying Markov Decision Process.<n>To address this, Trajectory-level Rewarding (TLRS), a novel reward shaping method, is proposed.
arXiv Detail & Related papers (2025-07-27T13:14:48Z) - On-Policy RL with Optimal Reward Baseline [109.47676554514193]
On-Policy RL with Optimal reward baseline (OPO) is a novel and simplified reinforcement learning algorithm.<n>OPO emphasizes the importance of exact on-policy training, which empirically stabilizes the training process and enhances exploration.<n>Results demonstrate OPO's superior performance and training stability without additional models or regularization terms.
arXiv Detail & Related papers (2025-05-29T15:58:04Z) - Navigating the Alpha Jungle: An LLM-Powered MCTS Framework for Formulaic Factor Mining [8.53606484300001]
This paper introduces a novel framework that integrates Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS)<n>A key innovation is the guidance of MCTS exploration by rich, quantitative feedback from financial backtesting of each candidate factor.<n> Experimental results on real-world stock market data demonstrate that our LLM-based framework outperforms existing methods by mining alphas with superior predictive accuracy and trading performance.
arXiv Detail & Related papers (2025-05-16T11:14:17Z) - A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce [68.99924691391048]
We revisit GRPO from a reinforce-like algorithm perspective and analyze its core components.
We find that a simple rejection sampling baseline, RAFT, yields competitive performance than GRPO and PPO.
Motivated by this insight, we propose Reinforce-Rej, a minimal extension of policy gradient that filters both entirely incorrect and entirely correct samples.
arXiv Detail & Related papers (2025-04-15T16:15:02Z) - Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model [84.00480999255628]
Reinforcement Learning algorithms for safety alignment of Large Language Models (LLMs) encounter the challenge of distribution shift.
Current approaches typically address this issue through online sampling from the target policy.
We propose a new framework that leverages the model's intrinsic safety judgment capability to extract reward signals.
arXiv Detail & Related papers (2025-03-13T06:40:34Z) - Alpha Mining and Enhancing via Warm Start Genetic Programming for Quantitative Investment [3.4196842063159076]
Traditional genetic programming (GP) often struggles in stock alpha factor discovery.
We find that GP performs better when focusing on promising regions rather than random searching.
arXiv Detail & Related papers (2024-12-01T17:13:54Z) - Robot See, Robot Do: Imitation Reward for Noisy Financial Environments [0.0]
This paper introduces a novel and more robust reward function by leveraging imitation learning.
We integrate imitation (expert's) feedback with reinforcement (agent's) feedback in a model-free reinforcement learning algorithm.
Empirical results demonstrate that this novel approach improves financial performance metrics compared to traditional benchmarks.
arXiv Detail & Related papers (2024-11-13T14:24:47Z) - AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors [14.80394452270726]
This paper proposes a two-stage alpha generating framework AlphaForge, for alpha factor mining and factor combination.
Experiments conducted on real-world datasets demonstrate that our proposed model outperforms contemporary benchmarks in formulaic alpha factor mining.
arXiv Detail & Related papers (2024-06-26T14:34:37Z) - REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z) - Provable Risk-Sensitive Distributional Reinforcement Learning with
General Function Approximation [54.61816424792866]
We introduce a general framework on Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL), with static Lipschitz Risk Measures (LRM) and general function approximation.
We design two innovative meta-algorithms: textttRS-DisRL-M, a model-based strategy for model-based function approximation, and textttRS-DisRL-V, a model-free approach for general value function approximation.
arXiv Detail & Related papers (2024-02-28T08:43:18Z) - Synergistic Formulaic Alpha Generation for Quantitative Trading based on Reinforcement Learning [1.3194391758295114]
This paper proposes a method to enhance existing alpha factor mining approaches by expanding a search space.
We employ information coefficient (IC) and rank information coefficient (Rank IC) as performance evaluation metrics for the model.
arXiv Detail & Related papers (2024-01-05T08:49:13Z) - Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z) - Provably Efficient Iterated CVaR Reinforcement Learning with Function
Approximation and Human Feedback [57.6775169085215]
Risk-sensitive reinforcement learning aims to optimize policies that balance the expected reward and risk.
We present a novel framework that employs an Iterated Conditional Value-at-Risk (CVaR) objective under both linear and general function approximations.
We propose provably sample-efficient algorithms for this Iterated CVaR RL and provide rigorous theoretical analysis.
arXiv Detail & Related papers (2023-07-06T08:14:54Z) - Factor Investing with a Deep Multi-Factor Model [123.52358449455231]
We develop a novel deep multi-factor model that adopts industry neutralization and market neutralization modules with clear financial insights.
Tests on real-world stock market data demonstrate the effectiveness of our deep multi-factor model.
arXiv Detail & Related papers (2022-10-22T14:47:11Z) - An intelligent algorithmic trading based on a risk-return reinforcement
learning algorithm [0.0]
This scientific paper propose a novel portfolio optimization model using an improved deep reinforcement learning algorithm.
The proposed algorithm is based on actor-critic architecture, in which the main task of critical network is to learn the distribution of portfolio cumulative return.
A multi-process method is used, called Ape-x, to accelerate the speed of deep reinforcement learning training.
arXiv Detail & Related papers (2022-08-23T03:20:06Z) - Fuzzy Expert System for Stock Portfolio Selection: An Application to
Bombay Stock Exchange [0.0]
Fuzzy expert system model is proposed to evaluate and rank the stocks under Bombay Stock Exchange (BSE)
The performance of the model proved to be satisfactory for short-term investment period when compared with the recent performance of the stocks.
arXiv Detail & Related papers (2022-04-28T10:01:15Z) - Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk.
Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z) - Risk Minimization from Adaptively Collected Data: Guarantees for
Supervised and Policy Learning [57.88785630755165]
Empirical risk minimization (ERM) is the workhorse of machine learning, but its model-agnostic guarantees can fail when we use adaptively collected data.
We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class.
For policy learning, we provide rate-optimal regret guarantees that close an open gap in the existing literature whenever exploration decays to zero.
arXiv Detail & Related papers (2021-06-03T09:50:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.