Closing the Loop: Coordinating Inventory and Recommendation via Deep Reinforcement Learning on Multiple Timescales
- URL: http://arxiv.org/abs/2510.04272v1
- Date: Sun, 05 Oct 2025 16:28:06 GMT
- Title: Closing the Loop: Coordinating Inventory and Recommendation via Deep Reinforcement Learning on Multiple Timescales
- Authors: Jinyang Jiang, Jinhui Han, Yijie Peng, Ying Zhang,
- Abstract summary: This paper proposes a unified multi-agent RL framework tailored for joint optimization across distinct functional modules.<n>We first develop an integrated theoretical model to capture the interplay between these functions.<n>We then design a novel multi-timescale multi-agent RL architecture that decomposes policy components according to departmental functions.
- Score: 6.5434300333872875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective cross-functional coordination is essential for enhancing firm-wide profitability, particularly in the face of growing organizational complexity and scale. Recent advances in artificial intelligence, especially in reinforcement learning (RL), offer promising avenues to address this fundamental challenge. This paper proposes a unified multi-agent RL framework tailored for joint optimization across distinct functional modules, exemplified via coordinating inventory replenishment and personalized product recommendation. We first develop an integrated theoretical model to capture the intricate interplay between these functions and derive analytical benchmarks that characterize optimal coordination. The analysis reveals synchronized adjustment patterns across products and over time, highlighting the importance of coordinated decision-making. Leveraging these insights, we design a novel multi-timescale multi-agent RL architecture that decomposes policy components according to departmental functions and assigns distinct learning speeds based on task complexity and responsiveness. Our model-free multi-agent design improves scalability and deployment flexibility, while multi-timescale updates enhance convergence stability and adaptability across heterogeneous decisions. We further establish the asymptotic convergence of the proposed algorithm. Extensive simulation experiments demonstrate that the proposed approach significantly improves profitability relative to siloed decision-making frameworks, while the behaviors of the trained RL agents align closely with the managerial insights from our theoretical model. Taken together, this work provides a scalable, interpretable RL-based solution to enable effective cross-functional coordination in complex business settings.
Related papers
- Deep Unfolding: Recent Developments, Theory, and Design Guidelines [99.63555420898554]
This article provides a tutorial-style overview of deep unfolding, a framework that transforms optimization algorithms into structured, trainable ML architectures.<n>We review the foundations of optimization for inference and for learning, introduce four representative design paradigms for deep unfolding, and discuss the distinctive training schemes that arise from their iterative nature.
arXiv Detail & Related papers (2025-12-03T13:16:35Z) - An End-to-End Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drones [12.385878815004283]
This study proposes a hierarchical Actor-Critic deep reinforcement learning framework for solving the Traveling Salesman Problem with Drones (TSP-D)<n>The architecture consists of two primary computation: a Transformer-inspired encoder and an efficient Minimal Gated Unit decoder.<n>The entire framework operates within an asynchronous advantage actor-critic paradigm.
arXiv Detail & Related papers (2025-11-07T14:26:29Z) - Strategic Coordination for Evolving Multi-agent Systems: A Hierarchical Reinforcement and Collective Learning Approach [0.0]
Reinforcement learning offers a way to model sequential decision-making.<n>Agents take high-level strategies using MARL to group possible plans for action space reduction.<n>Low-level collective learning layer ensures efficient and decentralized coordinated decisions.
arXiv Detail & Related papers (2025-09-22T17:58:45Z) - AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning [129.44038804430542]
We introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL.<n>We propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization.<n>Our agents match or surpass commercial models on 27 tasks across diverse environments.
arXiv Detail & Related papers (2025-09-10T16:46:11Z) - The Landscape of Agentic Reinforcement Learning for LLMs: A Survey [103.32591749156416]
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL)<n>This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL.
arXiv Detail & Related papers (2025-09-02T17:46:26Z) - Large Language Models for Design Structure Matrix Optimization [4.513609458468522]
In complex engineering systems, the interdependencies among components or development activities are often modeled and analyzed using Design Structure Matrix (DSM)<n>Reorganizing elements within a DSM to minimize feedback loops and enhance modularity or process efficiency constitutes a challenging optimization problem in engineering design and operations.<n>In this study, we explore the potential of Large Language Models (LLMs) for helping solve such CO problems by leveraging their capabilities for advanced reasoning and contextual understanding.
arXiv Detail & Related papers (2025-06-11T13:53:35Z) - Multi-Agent Collaboration via Evolving Orchestration [61.93162413517026]
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving.<n>We propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a central orchestrator dynamically directs agents in response to evolving task states.<n> Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs.
arXiv Detail & Related papers (2025-05-26T07:02:17Z) - Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models.<n>Controlled Decoding provides a mechanism for aligning a model at inference time without retraining.<n>We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z) - Residual Learning Inspired Crossover Operator and Strategy Enhancements for Evolutionary Multitasking [0.3749861135832073]
In evolutionary multitasking, strategies such as crossover operators and skill factor assignment are critical for effective knowledge transfer.<n>This paper proposes the Multifactorial Evolutionary Algorithm-Residual Learning (MFEA-RL) method based on residual learning.<n>A ResNet-based mechanism dynamically assigns skill factors to improve task adaptability, while a random mapping mechanism efficiently performs crossover operations.
arXiv Detail & Related papers (2025-03-27T10:27:17Z) - A Cascading Cooperative Multi-agent Framework for On-ramp Merging Control Integrating Large Language Models [26.459779380808587]
We introduce the Cascading Cooperative Multi-agent ( CCMA) framework, integrating RL for individual interactions, a fine-tuned Large Language Model (LLM) for regional cooperation, a reward function for global optimization, and the Retrieval-augmented Generation mechanism to dynamically optimize decision-making across complex driving scenarios.<n>Our experiments demonstrate that the CCMA outperforms existing RL methods, demonstrating significant improvements in both micro and macro-level performance in complex driving environments.
arXiv Detail & Related papers (2025-03-11T09:08:04Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.