Related papers: Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions

URL: http://arxiv.org/abs/2507.02698v1
Date: Thu, 03 Jul 2025 15:07:37 GMT
Title: Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions
Authors: Thomas Hazenberg, Yao Ma, Seyed Sahand Mohammadi Ziabari, Marijn van Rijswijk
Abstract summary: This study investigates how Multi-Agent Reinforcement Learning (MARL) can improve dynamic pricing strategies in supply chains. MARL introduces emergent strategic behaviour not captured by static pricing rules and may inform future developments in dynamic pricing.
Score: 4.072683489517408
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This study investigates how Multi-Agent Reinforcement Learning (MARL) can improve dynamic pricing strategies in supply chains, particularly in contexts where traditional ERP systems rely on static, rule-based approaches that overlook strategic interactions among market actors. While recent research has applied reinforcement learning to pricing, most implementations remain single-agent and fail to model the interdependent nature of real-world supply chains. This study addresses that gap by evaluating the performance of three MARL algorithms: MADDPG, MADQN, and QMIX against static rule-based baselines, within a simulated environment informed by real e-commerce transaction data and a LightGBM demand prediction model. Results show that rule-based agents achieve near-perfect fairness (Jain's Index: 0.9896) and the highest price stability (volatility: 0.024), but they fully lack competitive dynamics. Among MARL agents, MADQN exhibits the most aggressive pricing behaviour, with the highest volatility and the lowest fairness (0.5844). MADDPG provides a more balanced approach, supporting market competition (share volatility: 9.5 pp) while maintaining relatively high fairness (0.8819) and stable pricing. These findings suggest that MARL introduces emergent strategic behaviour not captured by static pricing rules and may inform future developments in dynamic pricing.
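
The fairness figures quoted in the abstract are values of Jain's Index, which for n non-negative allocations x_1, ..., x_n is (sum of x_i)^2 / (n * sum of x_i^2). It ranges from 1/n (one agent takes everything) to 1 (all agents receive the same amount). The following is a minimal sketch of that standard formula only. The abstract does not say which per-agent quantity the authors feed into it, so the function name `jains_index` and the example allocations are illustrative assumptions, not the paper's code or data.

```python
def jains_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    values = [float(x) for x in allocations]
    if not values:
        raise ValueError("need at least one allocation")
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    if sum_sq == 0.0:
        # All allocations are zero; treating this as perfectly equal
        # is a convention assumed here, not something the paper states.
        return 1.0
    return total * total / (len(values) * sum_sq)


# Hypothetical per-agent outcomes (for example revenue or market share).
equal = jains_index([10, 10, 10, 10])   # every agent gets the same amount
skewed = jains_index([40, 0, 0, 0])     # one agent captures everything
print(equal, skewed)
```

With four agents the index is bounded below by 1/4, so the reported MADQN value of 0.5844 would sit between the fully concentrated and fully equal cases if the simulation also used four agents, which the abstract does not confirm.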

Related papers

VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets [3.498661956610689]
This paper introduces a model for coordinating prosumers with heterogeneous distributed energy resources (DERs) in a local energy market (LEM). The proposed LEM scheme utilizes a data-driven, model-free reinforcement learning approach based on the multi-agent deep deterministic policy gradient (MADDPG). We investigate a price manipulation strategy using a variational autoencoder-generative adversarial network (VAE-GAN) model, which allows utilities to adjust price signals in a way that induces financial losses for the prosumers.
arXiv Detail & Related papers (2025-07-26T07:38:27Z)
Dynamic Reinsurance Treaty Bidding via Multi-Agent Reinforcement Learning [0.0]
This paper develops a novel multi-agent reinforcement learning (MARL) framework for reinsurance treaty bidding. MARL agents achieve up to 15% higher underwriting profit, 20% lower tail risk, and over 25% improvement in Sharpe ratios. These findings suggest that MARL offers a viable path toward more transparent, adaptive, and risk-sensitive reinsurance markets.
arXiv Detail & Related papers (2025-06-16T05:43:22Z)
Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
Offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts. This work is the first to explicitly address the distributional gap between offline and online MARL.
arXiv Detail & Related papers (2025-05-09T11:42:31Z)
Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation [0.0]
We generate high-fidelity synthetic bond yield data for four major bond categories (AAA, BAA, US10Y,). We employ a finetuned Large Language Model (LLM), Qwen2.5-7B, that generates trading signals, risk assessments, and volatility projections. The reinforcement learning-enhanced synthetic data generation achieves the lowest Mean Absolute Error of 0.103, demonstrating its effectiveness in replicating real-world bond market dynamics.
arXiv Detail & Related papers (2025-02-24T09:46:37Z)
MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services [94.61039892220037]
We propose an immersion-aware model trading framework that facilitates data provision for services while ensuring privacy through federated learning (FL). We design an incentive mechanism that encourages metaverse users (MUs) to contribute high-value models under resource constraints. We develop a fully distributed dynamic reward algorithm based on deep reinforcement learning, without accessing any private information about MUs and other MSPs.
arXiv Detail & Related papers (2024-10-25T16:20:46Z)
Cross-border Commodity Pricing Strategy Optimization via Mixed Neural Network for Time Series Analysis [46.26988706979189]
Cross-border commodity pricing determines competitiveness and market share of businesses. Time series data is of great significance in commodity pricing and can reveal market dynamics and trends. We propose a new method based on the hybrid neural network model CNN-BiGRU-SSA.
arXiv Detail & Related papers (2024-08-22T03:59:52Z)
INTAGS: Interactive Agent-Guided Simulation [4.04638613278729]
In many applications involving multi-agent systems (MAS), it is imperative to test an experimental (Exp) autonomous agent in a high-fidelity simulator prior to its deployment to production. We propose a metric to distinguish between real and synthetic multi-agent systems, which is evaluated through the live interaction between the Exp and BG agents. We show that using INTAGS to calibrate the simulator can generate more realistic market data compared to the state-of-the-art conditional Wasserstein Generative Adversarial Network approach.
arXiv Detail & Related papers (2023-09-04T19:56:18Z)
Insurance pricing on price comparison websites via reinforcement learning [7.023335262537794]
This paper introduces a reinforcement learning framework that learns an optimal pricing policy by integrating model-based and model-free methods. The paper also highlights the importance of evaluating pricing policies using an offline dataset in a consistent fashion.
arXiv Detail & Related papers (2023-08-14T04:44:56Z)
Deep Policy Gradient Methods in Commodity Markets [0.0]
Traders play an important role in stabilizing markets by providing liquidity and reducing volatility. This thesis investigates the effectiveness of deep reinforcement learning methods in commodities trading.
arXiv Detail & Related papers (2023-06-14T11:50:23Z)
Joint Latent Topic Discovery and Expectation Modeling for Financial Markets [45.758436505779386]
We present a groundbreaking framework for financial market analysis. This approach is the first to jointly model investor expectations and automatically mine latent stock relationships. Our model consistently achieves an annual return exceeding 10%.
arXiv Detail & Related papers (2023-06-01T01:36:51Z)
Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time. We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z)
Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment. We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
Bayesian Bilinear Neural Network for Predicting the Mid-price Dynamics in Limit-Order Book Markets [84.90242084523565]
Traditional time-series econometric methods often appear incapable of capturing the true complexity of the multi-level interactions driving the price dynamics. By adopting a state-of-the-art second-order optimization algorithm, we train a Bayesian bilinear neural network with temporal attention. By addressing the use of predictive distributions to analyze errors and uncertainties associated with the estimated parameters and model forecasts, we thoroughly compare our Bayesian model with traditional ML alternatives.
arXiv Detail & Related papers (2022-03-07T18:59:54Z)
Multi-Asset Spot and Option Market Simulation [52.77024349608834]
We construct realistic spot and equity option market simulators for a single underlying on the basis of normalizing flows. We leverage the conditional invertibility property of normalizing flows and introduce a scalable method to calibrate the joint distribution of a set of independent simulators.
arXiv Detail & Related papers (2021-12-13T17:34:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.