Optimal Execution with Reinforcement Learning
- URL: http://arxiv.org/abs/2411.06389v2
- Date: Sat, 01 Nov 2025 19:34:00 GMT
- Title: Optimal Execution with Reinforcement Learning
- Authors: Yadh Hafsi, Edoardo Vittori
- Abstract summary: This study investigates the development of an optimal execution strategy through reinforcement learning. We present a custom MDP formulation followed by the results of our methodology and benchmark the performance against standard execution strategies. Results show that the reinforcement learning agent outperforms standard strategies and offers a practical foundation for real-world trading applications.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates the development of an optimal execution strategy through reinforcement learning, aiming to determine the most effective approach for traders to buy and sell inventory within a finite time horizon. Our proposed model leverages input features derived from the current state of the limit order book and operates at a high frequency to maximize control. To simulate this environment and overcome the limitations associated with relying on historical data, we utilize the multi-agent market simulator ABIDES, which provides a diverse range of depth levels within the limit order book. We present a custom MDP formulation followed by the results of our methodology and benchmark the performance against standard execution strategies. Results show that the reinforcement learning agent outperforms standard strategies and offers a practical foundation for real-world trading applications.
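The MDP the abstract describes (order-book-derived state, order sizes as actions, reward tied to execution quality over a finite horizon) can be illustrated with a toy environment. The sketch below is an illustrative assumption, not the paper's actual ABIDES-based formulation: the class name, the linear impact coefficient, and the shortfall-style reward are all hypothetical stand-ins.

```python
import random

class ToyExecutionMDP:
    """Minimal sketch of an execution MDP: liquidate `inventory` shares
    within `horizon` steps. State = (time, remaining inventory, mid-price),
    where the noisy mid-price stands in for richer limit-order-book features."""

    def __init__(self, inventory=100, horizon=10, seed=0):
        self.rng = random.Random(seed)
        self.inventory0, self.horizon = inventory, horizon
        self.reset()

    def reset(self):
        self.t, self.inventory, self.mid = 0, self.inventory0, 100.0
        self.arrival_price = self.mid  # benchmark for implementation shortfall
        return (self.t, self.inventory, self.mid)

    def step(self, sell_qty):
        sell_qty = min(sell_qty, self.inventory)
        # Linear temporary impact: larger child orders execute at worse prices.
        exec_price = self.mid - 0.01 * sell_qty
        # Reward = cash received relative to the arrival-price benchmark
        # (i.e., the negative of implementation shortfall for this slice).
        reward = sell_qty * (exec_price - self.arrival_price)
        self.inventory -= sell_qty
        self.t += 1
        self.mid += self.rng.gauss(0.0, 0.05)  # exogenous price noise
        done = self.t >= self.horizon or self.inventory == 0
        if done and self.inventory > 0:
            # Forced liquidation of any leftover inventory at the final step.
            reward += self.inventory * (
                self.mid - 0.05 * self.inventory - self.arrival_price)
            self.inventory = 0
        return (self.t, self.inventory, self.mid), reward, done
```

A standard execution benchmark such as TWAP corresponds to calling `step(inventory0 / horizon)` at every step; an RL agent would instead choose `sell_qty` from the observed state.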
Related papers
- Generative Actor Critic [74.04971271003869]
Generative Actor Critic (GAC) is a novel framework that decouples sequential decision-making by reframing policy evaluation as learning a generative model of the joint distribution over trajectories and returns. Experiments on Gym-MuJoCo and Maze2D benchmarks demonstrate GAC's strong offline performance and significantly enhanced offline-to-online improvement compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-12-25T06:31:11Z) - Reinforcement Learning in Queue-Reactive Models: Application to Optimal Execution [0.35932002706017546]
We investigate the use of Reinforcement Learning for the optimal execution of meta-orders. The objective is to incrementally execute large orders while minimizing implementation shortfall and market impact. We employ the Queue-Reactive Model to generate realistic and tractable limit order book simulations.
arXiv Detail & Related papers (2025-11-19T09:26:23Z) - Continuous-Time Reinforcement Learning for Asset-Liability Management [0.0]
This paper proposes a novel approach for Asset-Liability Management (ALM) by employing continuous-time Reinforcement Learning (RL). We develop a model-free, policy gradient-based soft actor-critic algorithm tailored to ALM for dynamically synchronizing assets and liabilities. Our empirical study evaluates this approach against two enhanced traditional financial strategies, a model-based continuous-time RL method, and three state-of-the-art RL algorithms.
arXiv Detail & Related papers (2025-09-27T12:36:51Z) - Learning Dynamic Representations via An Optimally-Weighted Maximum Mean Discrepancy Optimization Framework for Continual Learning [16.10753846850319]
Continual learning allows models to persistently acquire and retain information, but catastrophic forgetting can severely impair model performance.
We introduce a novel framework termed Optimally-Weighted Maximum Mean Discrepancy (OWMMD), which imposes penalties on representation alterations.
arXiv Detail & Related papers (2025-01-21T13:33:45Z) - NEAT Algorithm-based Stock Trading Strategy with Multiple Technical Indicators Resonance [0.8158530638728501]
We applied the NEAT (NeuroEvolution of Augmenting Topologies) algorithm to stock trading using multiple technical indicators.
Our approach focused on maximizing earnings, avoiding risk, and outperforming the Buy & Hold strategy.
The results of our study showed that the NEAT model achieved similar returns to the Buy & Hold strategy, but with lower risk exposure and greater stability.
arXiv Detail & Related papers (2024-12-11T05:42:15Z) - Mean--Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study [10.404992912881601]
We study continuous-time mean--variance portfolio selection in markets where stock prices are diffusion processes driven by observable factors that are also diffusion processes. We present a general data-driven RL algorithm that learns the pre-committed investment strategy directly without attempting to learn or estimate the market coefficients. The proposed continuous-time RL strategy is consistently among the best, especially in a volatile bear market, and decisively outperforms the model-based continuous-time counterparts by significant margins.
arXiv Detail & Related papers (2024-12-08T15:31:10Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Deep Reinforcement Learning for Online Optimal Execution Strategies [49.1574468325115]
This paper tackles the challenge of learning non-Markovian optimal execution strategies in dynamic financial markets.
We introduce a novel actor-critic algorithm based on Deep Deterministic Policy Gradient (DDPG)
We show that our algorithm successfully approximates the optimal execution strategy.
arXiv Detail & Related papers (2024-10-17T12:38:08Z) - EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - Model-Free Active Exploration in Reinforcement Learning [53.786439742572995]
We study the problem of exploration in Reinforcement Learning and present a novel model-free solution.
Our strategy is able to identify efficient policies faster than state-of-the-art exploration approaches.
arXiv Detail & Related papers (2024-06-30T19:00:49Z) - Deep Limit Order Book Forecasting [2.771933807499954]
We exploit cutting-edge deep learning methodologies to explore predictability of high-frequency Limit Order Book mid-price changes.
We release LOBFrame, an open-source codebase to efficiently process large-scale Limit Order Book data.
arXiv Detail & Related papers (2024-03-14T10:44:10Z) - Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate step-level reward datasets for coding tasks, and we observe similar performance improvements on code generation.
arXiv Detail & Related papers (2023-10-16T05:21:50Z) - REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance.
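REX's scoring is described as similar to Upper Confidence Bound (UCB) scores. The classic UCB1 rule that such scores resemble can be sketched as below; the function name and argument layout are illustrative, not REX's actual implementation.

```python
import math

def ucb1_select(counts, rewards, t):
    """Pick the arm maximizing (mean reward + exploration bonus).
    counts[i]: number of pulls of arm i; rewards[i]: summed reward of arm i;
    t: total pulls so far. Unpulled arms are tried first."""
    best, best_score = 0, float("-inf")
    for i, n in enumerate(counts):
        if n == 0:
            return i  # pull each arm once before scoring
        # Mean payoff plus a bonus that shrinks as arm i is pulled more often.
        score = rewards[i] / n + math.sqrt(2.0 * math.log(t) / n)
        if score > best_score:
            best, best_score = i, score
    return best
```

The exploration bonus is what lets an agent revisit rarely-tried actions; REX layers an analogous bonus on top of its task rewards.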
arXiv Detail & Related papers (2023-07-18T04:26:33Z) - LOB-Based Deep Learning Models for Stock Price Trend Prediction: A Benchmark Study [4.714825039388054]
We develop an open-source framework that incorporates data preprocessing, DL model training, evaluation and profit analysis.
Our experiments reveal that all models exhibit a significant performance drop when exposed to new data, thereby raising questions about their real-world market applicability.
arXiv Detail & Related papers (2023-07-05T14:28:38Z) - Optimizing Credit Limit Adjustments Under Adversarial Goals Using Reinforcement Learning [42.303733194571905]
We seek to find and automatize an optimal credit card limit adjustment policy by employing reinforcement learning techniques.
Our research establishes a conceptual structure for applying a reinforcement learning framework to credit limit adjustment.
arXiv Detail & Related papers (2023-06-27T16:10:36Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task have applications in safety-sensitive domains such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - Asynchronous Deep Double Duelling Q-Learning for Trading-Signal Execution in Limit Order Book Markets [5.202524136984542]
We employ deep reinforcement learning to train an agent to translate a high-frequency trading signal into a trading strategy that places individual limit orders.
Based on the ABIDES limit order book simulator, we build a reinforcement learning OpenAI gym environment.
We find that the RL agent learns an effective trading strategy for inventory management and order placement that outperforms a benchmark trading strategy with access to the same signal.
arXiv Detail & Related papers (2023-01-20T17:19:18Z) - Deep Inventory Management [3.578617477295742]
We present a Deep Reinforcement Learning approach to solving a periodic review inventory control system.
We show that several policy learning approaches are competitive with or outperform classical baseline approaches.
arXiv Detail & Related papers (2022-10-06T18:00:25Z) - Techniques Toward Optimizing Viewability in RTB Ad Campaigns Using Reinforcement Learning [0.0]
Reinforcement learning (RL) is an effective technique for training decision-making agents through interactions with their environment.
In digital advertising, real-time bidding (RTB) is a common method of allocating advertising inventory through real-time auctions.
arXiv Detail & Related papers (2021-05-21T21:56:12Z) - Universal Trading for Order Execution with Oracle Policy Distillation [99.57416828489568]
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that our framework can better guide the learning of the common policy towards practically optimal execution by an oracle teacher with perfect information.
arXiv Detail & Related papers (2021-01-28T05:52:18Z) - Time your hedge with Deep Reinforcement Learning [0.0]
Deep Reinforcement Learning (DRL) can tackle this challenge by creating a dynamic dependency between market information and hedging strategies allocation decisions.
We present a realistic and augmented DRL framework that: (i) uses additional contextual information to decide an action, (ii) has a one-period lag between observations and actions to account for the one-day turnover lag with which common asset managers rebalance their hedges, (iii) is fully tested for stability and robustness using a repetitive train-test method called anchored walk-forward training, similar in spirit to k-fold cross-validation for time series, and (iv) allows managing the leverage of our hedging strategy.
arXiv Detail & Related papers (2020-09-16T06:43:41Z)
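The anchored walk-forward protocol mentioned in point (iii) above can be sketched as an index-splitting routine. The function name and parameters below are hypothetical; the sketch only shows the defining property that every training window stays anchored at the start of the series while the test window rolls forward, so no fold ever trains on future data.

```python
def anchored_walk_forward(n_samples, n_folds, test_size):
    """Return (train_indices, test_indices) pairs for a time series of
    n_samples points. Each training window starts at index 0 (anchored)
    and each test window immediately follows its training window."""
    splits = []
    for k in range(n_folds):
        train_end = n_samples - (n_folds - k) * test_size
        if train_end <= 0:
            raise ValueError("not enough samples for the requested folds")
        splits.append((list(range(0, train_end)),
                       list(range(train_end, train_end + test_size))))
    return splits
```

Unlike ordinary k-fold cross-validation, folds here differ only in how far the anchored training window extends, which preserves temporal ordering for backtests.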
This list is automatically generated from the titles and abstracts of the papers in this site.