Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions
- URL: http://arxiv.org/abs/2507.02698v1
- Date: Thu, 03 Jul 2025 15:07:37 GMT
- Title: Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions
- Authors: Thomas Hazenberg, Yao Ma, Seyed Sahand Mohammadi Ziabari, Marijn van Rijswijk
- Abstract summary: This study investigates how Multi-Agent Reinforcement Learning (MARL) can improve dynamic pricing strategies in supply chains. MARL introduces emergent strategic behaviour not captured by static pricing rules and may inform future developments in dynamic pricing.
- Score: 4.072683489517408
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study investigates how Multi-Agent Reinforcement Learning (MARL) can improve dynamic pricing strategies in supply chains, particularly in contexts where traditional ERP systems rely on static, rule-based approaches that overlook strategic interactions among market actors. While recent research has applied reinforcement learning to pricing, most implementations remain single-agent and fail to model the interdependent nature of real-world supply chains. This study addresses that gap by evaluating the performance of three MARL algorithms (MADDPG, MADQN, and QMIX) against static rule-based baselines, within a simulated environment informed by real e-commerce transaction data and a LightGBM demand prediction model. Results show that rule-based agents achieve near-perfect fairness (Jain's Index: 0.9896) and the highest price stability (volatility: 0.024), but they lack competitive dynamics entirely. Among MARL agents, MADQN exhibits the most aggressive pricing behaviour, with the highest volatility and the lowest fairness (0.5844). MADDPG provides a more balanced approach, supporting market competition (share volatility: 9.5 pp) while maintaining relatively high fairness (0.8819) and stable pricing. These findings suggest that MARL introduces emergent strategic behaviour not captured by static pricing rules and may inform future developments in dynamic pricing.
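The fairness figures quoted in the abstract are values of Jain's Index, which in its standard form is J = (sum of x_i)^2 / (n * sum of x_i^2) and ranges from 1/n (one agent takes everything) to 1 (all agents receive equal shares). The sketch below shows that standard formula only. The function name, the choice of per-agent revenue as the allocated quantity, and the sample numbers are assumptions made for illustration; the abstract does not say which quantity the authors measured.

```python
from typing import Sequence


def jain_index(allocations: Sequence[float]) -> float:
    """Return Jain's fairness index for non-negative per-agent allocations.

    J = (sum x_i)^2 / (n * sum x_i^2), which lies in [1/n, 1].
    """
    n = len(allocations)
    if n == 0:
        raise ValueError("at least one allocation is required")
    total = sum(allocations)
    sum_sq = sum(x * x for x in allocations)
    if sum_sq == 0:
        # Assumed convention: if every agent receives zero, treat the
        # outcome as perfectly equal instead of dividing by zero.
        return 1.0
    return (total * total) / (n * sum_sq)


if __name__ == "__main__":
    # Hypothetical per-agent revenues, not data from the paper.
    equal_market = [100.0, 100.0, 100.0, 100.0]
    skewed_market = [400.0, 0.0, 0.0, 0.0]
    print(jain_index(equal_market))   # equal shares give the maximum, 1.0
    print(jain_index(skewed_market))  # a single winner gives the minimum, 1/n
```

Read this way, a value such as 0.9896 means the agents' outcomes are almost identical, while 0.5844 indicates that outcomes are concentrated among a subset of agents.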
Related papers
- Ratio-Variance Regularized Policy Optimization for Efficient LLM Fine-tuning [48.34492357368989]
We propose a primal-dual framework that supports stable on-policy learning and enables principled off-policy data reuse.
$R2VPO$ achieves superior performance with average relative gains of up to 17% over strong clipping-based baselines.
arXiv Detail & Related papers (2026-01-06T14:01:42Z)
- How Market Volatility Shapes Algorithmic Collusion: A Comparative Analysis of Learning-Based Pricing Algorithms [1.3716158732399093]
This paper offers a thorough analysis of four pricing algorithms across three classic duopoly models (Logit, Hotelling, Linear) and under various demand-shock regimes created by auto-regressive processes.
Our findings reveal that reinforcement-learning algorithms often sustain supra-competitive prices under stable demand.
Despite marked changes in absolute performance, the relative rankings of the algorithms are consistent across different environments.
arXiv Detail & Related papers (2025-12-01T19:01:22Z)
- Graph-Attentive MAPPO for Dynamic Retail Pricing [0.0]
We present a systematic empirical study of multi-agent reinforcement learning for retail price optimization.
We compare a strong MAPPO baseline with a graph-attention-augmented variant (MAPPO+GAT).
Results indicate that MAPPO provides a robust and reproducible foundation for portfolio-level price control.
arXiv Detail & Related papers (2025-10-28T00:15:59Z)
- Robust Reinforcement Learning in Finance: Modeling Market Impact with Elliptic Uncertainty Sets [57.179679246370114]
In financial applications, reinforcement learning (RL) agents are commonly trained on historical data, where their actions do not influence prices.
During deployment, these agents trade in live markets where their own transactions can shift asset prices, a phenomenon known as market impact.
Traditional robust RL approaches address this model misspecification by optimizing the worst-case performance over a set of uncertainties.
We develop a novel class of elliptic uncertainty sets, enabling efficient and tractable robust policy evaluation.
arXiv Detail & Related papers (2025-10-22T18:22:25Z)
- Parallel and Multi-Stage Knowledge Graph Retrieval for Behaviorally Aligned Financial Asset Recommendations [46.90931293070464]
This paper introduces RAG-FLARKO, a retrieval-augmented extension to FLARKO.
It overcomes scalability and relevance challenges using multi-stage and parallel KG retrieval processes.
Empirical evaluation on a real-world financial transaction dataset demonstrates that RAG-FLARKO significantly enhances recommendation quality.
arXiv Detail & Related papers (2025-10-08T20:42:53Z)
- Benchmarking Robust Aggregation in Decentralized Gradient Marketplaces [12.367831558441994]
We introduce a benchmark framework to holistically evaluate robust gradient aggregation methods within buyer-baseline-reliant marketplaces.
Our contributions include: (1) a simulation environment modeling marketplace dynamics with a variable buyer baseline and diverse seller distributions; (2) an evaluation methodology augmenting standard FL metrics with marketplace-centric dimensions such as Economic Efficiency, Fairness, and Selection Dynamics; and (3) an in-depth empirical analysis of the existing Distributed Gradient Marketplace framework, MartFL.
arXiv Detail & Related papers (2025-09-06T21:06:50Z)
- VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets [3.498661956610689]
This paper introduces a model for coordinating prosumers with heterogeneous distributed energy resources (DERs) in a local energy market (LEM).
The proposed LEM scheme utilizes a data-driven, model-free reinforcement learning approach based on the multi-agent deep deterministic policy gradient (MADDPG).
We investigate a price manipulation strategy using a variational autoencoder-generative adversarial network (VAE-GAN) model, which allows utilities to adjust price signals in a way that induces financial losses for the prosumers.
arXiv Detail & Related papers (2025-07-26T07:38:27Z)
- Dynamic Reinsurance Treaty Bidding via Multi-Agent Reinforcement Learning [0.0]
This paper develops a novel multi-agent reinforcement learning (MARL) framework for reinsurance treaty bidding.
MARL agents achieve up to 15% higher underwriting profit, 20% lower tail risk, and over 25% improvement in Sharpe ratios.
These findings suggest that MARL offers a viable path toward more transparent, adaptive, and risk-sensitive reinsurance markets.
arXiv Detail & Related papers (2025-06-16T05:43:22Z)
- Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
Offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts.
This work is the first to explicitly address the distributional gap between offline and online MARL.
arXiv Detail & Related papers (2025-05-09T11:42:31Z)
- Agent Trading Arena: A Study on Numerical Understanding in LLM-Based Agents [69.58565132975504]
Large language models (LLMs) have demonstrated remarkable capabilities in natural language tasks.
We present the Agent Trading Arena, a virtual zero-sum stock market in which LLM-based agents engage in competitive multi-agent trading.
arXiv Detail & Related papers (2025-02-25T08:41:01Z)
- Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation [0.0]
We generate high-fidelity synthetic bond yield data for four major bond categories (AAA, BAA, US10Y,).
We employ a fine-tuned Large Language Model (LLM), Qwen2.5-7B, that generates trading signals, risk assessments, and volatility projections.
The reinforcement learning-enhanced synthetic data generation achieves the lowest Mean Absolute Error of 0.103, demonstrating its effectiveness in replicating real-world bond market dynamics.
arXiv Detail & Related papers (2025-02-24T09:46:37Z)
- MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services [94.61039892220037]
We propose an immersion-aware model trading framework that facilitates data provision for services while ensuring privacy through federated learning (FL).
We design an incentive mechanism to incentivize metaverse users (MUs) to contribute high-value models under resource constraints.
We develop a fully distributed dynamic reward algorithm based on deep reinforcement learning, without accessing any private information about MUs and other MSPs.
arXiv Detail & Related papers (2024-10-25T16:20:46Z)
- Cross-border Commodity Pricing Strategy Optimization via Mixed Neural Network for Time Series Analysis [46.26988706979189]
Cross-border commodity pricing determines competitiveness and market share of businesses.
Time series data is of great significance in commodity pricing and can reveal market dynamics and trends.
We propose a new method based on the hybrid neural network model CNN-BiGRU-SSA.
arXiv Detail & Related papers (2024-08-22T03:59:52Z)
- INTAGS: Interactive Agent-Guided Simulation [4.04638613278729]
In many applications involving multi-agent systems (MAS), it is imperative to test an experimental (Exp) autonomous agent in a high-fidelity simulator prior to its deployment to production.
We propose a metric to distinguish between real and synthetic multi-agent systems, which is evaluated through the live interaction between the Exp and BG agents.
We show that using INTAGS to calibrate the simulator can generate more realistic market data compared to the state-of-the-art conditional Wasserstein Generative Adversarial Network approach.
arXiv Detail & Related papers (2023-09-04T19:56:18Z)
- Insurance pricing on price comparison websites via reinforcement learning [7.023335262537794]
This paper introduces a reinforcement learning framework that learns an optimal pricing policy by integrating model-based and model-free methods.
The paper also highlights the importance of evaluating pricing policies using an offline dataset in a consistent fashion.
arXiv Detail & Related papers (2023-08-14T04:44:56Z)
- Deep Policy Gradient Methods in Commodity Markets [0.0]
Traders play an important role in stabilizing markets by providing liquidity and reducing volatility.
This thesis investigates the effectiveness of deep reinforcement learning methods in commodities trading.
arXiv Detail & Related papers (2023-06-14T11:50:23Z)
- Joint Latent Topic Discovery and Expectation Modeling for Financial Markets [45.758436505779386]
We present a groundbreaking framework for financial market analysis.
This approach is the first to jointly model investor expectations and automatically mine latent stock relationships.
Our model consistently achieves an annual return exceeding 10%.
arXiv Detail & Related papers (2023-06-01T01:36:51Z)
- Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time.
We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance.
Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Bayesian Bilinear Neural Network for Predicting the Mid-price Dynamics in Limit-Order Book Markets [84.90242084523565]
Traditional time-series econometric methods often appear incapable of capturing the true complexity of the multi-level interactions driving the price dynamics.
By adopting a state-of-the-art second-order optimization algorithm, we train a Bayesian bilinear neural network with temporal attention.
By addressing the use of predictive distributions to analyze errors and uncertainties associated with the estimated parameters and model forecasts, we thoroughly compare our Bayesian model with traditional ML alternatives.
arXiv Detail & Related papers (2022-03-07T18:59:54Z)
- Multi-Asset Spot and Option Market Simulation [52.77024349608834]
We construct realistic spot and equity option market simulators for a single underlying on the basis of normalizing flows.
We leverage the conditional invertibility property of normalizing flows and introduce a scalable method to calibrate the joint distribution of a set of independent simulators.
arXiv Detail & Related papers (2021-12-13T17:34:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.