Graph-Attentive MAPPO for Dynamic Retail Pricing
- URL: http://arxiv.org/abs/2511.00039v1
- Date: Tue, 28 Oct 2025 00:15:59 GMT
- Title: Graph-Attentive MAPPO for Dynamic Retail Pricing
- Authors: Krishna Kumar Neelakanta Pillai Santha Kumari Amma
- Abstract summary: We present a systematic empirical study of multi-agent reinforcement learning for retail price optimization. We compare a strong MAPPO baseline with a graph-attention-augmented variant (MAPPO+GAT). Results indicate that MAPPO provides a robust and reproducible foundation for portfolio-level price control.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic pricing in retail requires policies that adapt to shifting demand while coordinating decisions across related products. We present a systematic empirical study of multi-agent reinforcement learning for retail price optimization, comparing a strong MAPPO baseline with a graph-attention-augmented variant (MAPPO+GAT) that leverages learned interactions among products. Using a simulated pricing environment derived from real transaction data, we evaluate profit, stability across random seeds, fairness across products, and training efficiency under a standardized evaluation protocol. The results indicate that MAPPO provides a robust and reproducible foundation for portfolio-level price control, and that MAPPO+GAT further enhances performance by sharing information over the product graph without inducing excessive price volatility. These results indicate that graph-integrated MARL provides a more scalable and stable solution than independent learners for dynamic retail pricing, offering practical advantages in multi-product decision-making.
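The abstract does not spell out the architecture, so the following is only a minimal sketch of what "sharing information over the product graph" could look like: a generic single-head, GAT-style attention layer in NumPy. The function name `graph_attention`, the toy adjacency matrix, and all dimensions are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np


def graph_attention(features, adjacency, weight, attn_vec, negative_slope=0.2):
    """Generic single-head GAT-style layer (illustrative, not the paper's code).

    features:  (n, d_in) per-product observations
    adjacency: (n, n) nonzero where two products are related
    weight:    (d_in, d_out) shared linear projection
    attn_vec:  (2 * d_out,) attention parameters [a_src, a_dst]
    """
    h = features @ weight                          # projected features, (n, d_out)
    d_out = h.shape[1]

    # Raw score e[i, j] = LeakyReLU(a_src . h_i + a_dst . h_j)
    src = h @ attn_vec[:d_out]
    dst = h @ attn_vec[d_out:]
    scores = src[:, None] + dst[None, :]
    scores = np.where(scores > 0, scores, negative_slope * scores)

    # Each product attends only to its neighbours and to itself (self-loop).
    mask = (adjacency + np.eye(adjacency.shape[0])) > 0
    scores = np.where(mask, scores, -np.inf)

    # Row-wise softmax; the self-loop guarantees a finite maximum per row.
    scores = scores - scores.max(axis=1, keepdims=True)
    alpha = np.exp(scores)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)

    return alpha @ h, alpha


# Toy example: 4 products, product 3 has no neighbours (assumed graph).
rng = np.random.default_rng(0)
n_products, d_in, d_out = 4, 3, 2
features = rng.normal(size=(n_products, d_in))
adjacency = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]],
    dtype=float,
)
weight = rng.normal(size=(d_in, d_out))
attn_vec = rng.normal(size=2 * d_out)

embeddings, alpha = graph_attention(features, adjacency, weight, attn_vec)
```

In a MAPPO+GAT-style setup, each row of `embeddings` could be concatenated with the corresponding product agent's own observation before it is passed to the policy network. How the paper actually wires this together is not stated in the abstract.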
Related papers
- Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches [1.9116784879310027]
Existing works in financial sentiment analysis have not considered the impact of stock prices or market feedback on sentiment analysis. We propose an adaptive framework that integrates large language models (LLMs) with real-world stock market feedback to improve sentiment classification.
arXiv Detail & Related papers (2025-12-23T06:27:12Z)
- From Headlines to Holdings: Deep Learning for Smarter Portfolio Decisions [4.288926547930663]
We present an end-to-end framework that learns portfolio weights using deep learning. We evaluate the framework on nine U.S. stocks spanning six sectors, chosen to balance sector diversity and news coverage. Although the stock universe is limited, the results underscore the value of integrating price, relational, and sentiment signals for portfolio management.
arXiv Detail & Related papers (2025-09-29T00:42:24Z)
- Multi-Agent Reinforcement Learning for Dynamic Pricing in Supply Chains: Benchmarking Strategic Agent Behaviours under Realistically Simulated Market Conditions [4.072683489517408]
This study investigates how Multi-Agent Reinforcement Learning (MARL) can improve dynamic pricing strategies in supply chains. MARL introduces emergent strategic behaviour not captured by static pricing rules and may inform future developments in dynamic pricing.
arXiv Detail & Related papers (2025-07-03T15:07:37Z)
- Transfer Learning for Nonparametric Contextual Dynamic Pricing [17.420508136662257]
Dynamic pricing strategies are crucial for firms to maximize revenue by adjusting prices based on market conditions and customer characteristics. One promising approach to overcome this limitation is to leverage information from related products or markets to inform the focal pricing decisions. We propose a novel Transfer Learning for Dynamic Pricing (TLDP) algorithm that can effectively leverage pre-collected data from a source domain to enhance pricing decisions in the target domain.
arXiv Detail & Related papers (2025-01-31T01:05:04Z)
- Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization [75.1240295759264]
We propose an effective framework for Bridging and Modeling Correlations in pairwise data, named BMC. We increase the consistency and informativeness of the pairwise preference signals through targeted modifications. We identify that DPO alone is insufficient to model these correlations and capture nuanced variations.
arXiv Detail & Related papers (2024-08-14T11:29:47Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions.
Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
arXiv Detail & Related papers (2024-02-23T10:21:07Z)
- Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time.
We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance.
Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M datasets and define two practical instance-level retrieval tasks.
We train a more effective cross-modal model that can adaptively incorporate key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- KnowGraph-PM: a Knowledge Graph based Pricing Model for Semiconductors Supply Chains [0.0]
KnowGraph-PM is a knowledge graph-based dynamic pricing model.
Price changes can generate conflicts with customers.
We demonstrate that semantic data integration enables customer-tailored revenue management.
arXiv Detail & Related papers (2022-05-13T10:34:57Z)
- Model Distillation for Revenue Optimization: Interpretable Personalized Pricing [8.07517029746865]
We present a customized, prescriptive tree-based algorithm that distills knowledge from a complex black-box machine learning algorithm.
It segments customers with similar valuations and prescribes prices in such a way that maximizes revenue while maintaining interpretability.
arXiv Detail & Related papers (2020-07-03T18:33:23Z)
- Cost-Sensitive Portfolio Selection via Deep Reinforcement Learning [100.73223416589596]
We propose a cost-sensitive portfolio selection method with deep reinforcement learning.
Specifically, a novel two-stream portfolio policy network is devised to extract both price series patterns and asset correlations.
A new cost-sensitive reward function is developed to maximize the accumulated return and constrain both costs via reinforcement learning.
arXiv Detail & Related papers (2020-03-06T06:28:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.