Product Segmentation Newsvendor Problems: A Robust Learning Approach
- URL: http://arxiv.org/abs/2207.03801v1
- Date: Fri, 8 Jul 2022 10:13:10 GMT
- Title: Product Segmentation Newsvendor Problems: A Robust Learning Approach
- Authors: Xiaoli Yan, Hui Yu, Jiawen Li, Frank Youhua Chen
- Abstract summary: The product segmentation newsvendor problem is a new variant of the newsvendor problem.
We propose a new paradigm termed robust learning to increase the attractiveness of robust policies.
- Score: 6.346881818701668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose and analyze a product segmentation newsvendor problem, which
generalizes the phenomenon of segmentation sales of a class of perishable
items. The product segmentation newsvendor problem is a new variant of the
newsvendor problem, reflecting that sellers maximize profits by determining the
inventory of the whole item in the context of uncertain demand for sub-items.
We derive the closed-form robust ordering decision by assuming that the means
and covariance matrix of stochastic demand are available but not the
distributions. However, robust approaches that always hedge against the
worst-case demand scenario raise concerns about solution conservatism; thus,
traditional robust schemes can be unsatisfactory. In this paper, we integrate
robust and deep reinforcement learning (DRL) techniques and propose a new
paradigm termed robust learning to increase the attractiveness of robust
policies. Notably, we treat the robust decision as human domain knowledge and
embed it in the DRL training process by designing a full-process human-machine
collaborative mechanism of teaching experience, normative
decision, and regularization return. Simulation results confirm that our
approach effectively improves robust performance and can generalize to various
problems that require robust but less conservative solutions. At the same time,
fewer training episodes, greater training stability, and more interpretable
behavior may facilitate the deployment of DRL algorithms in operational
practice. Furthermore, RLDQN's success in solving 1000-dimensional demand
scenarios shows that the algorithm provides a path to solving complex
operational problems through human-machine collaboration and may prove
valuable for other complex operational management problems.
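The abstract references a closed-form robust ordering decision under known mean and covariance, and a "regularization return" that pulls the DRL agent toward that decision. As a hedged illustration only (the paper's exact formulas are not reproduced here), the sketch below pairs Scarf's classical distribution-free order quantity, a standard stand-in for such moment-based robust rules, with an assumed penalty-augmented TD loss; `lam` and the quadratic penalty form are illustrative assumptions.

```python
import numpy as np

def scarf_robust_order(mu: float, sigma: float, price: float, cost: float) -> float:
    """Scarf's distribution-free order quantity: the max-min expected-profit
    order when only the demand mean and standard deviation are known.
    (A classical stand-in for the paper's closed-form robust decision;
    assumes the margin is large enough that ordering is worthwhile.)"""
    r = (price - cost) / cost                    # profit-to-cost ratio
    return mu + (sigma / 2.0) * (np.sqrt(r) - 1.0 / np.sqrt(r))

def regularized_td_loss(q_pred: float, q_target: float,
                        action_qty: float, robust_qty: float,
                        lam: float = 0.1) -> float:
    """Assumed form of a 'regularization return': the usual squared TD error
    plus a penalty for deviating from the robust order quantity."""
    return (q_pred - q_target) ** 2 + lam * (action_qty - robust_qty) ** 2

# Toy numbers: unit price 10, unit cost 6, demand mean 100, std dev 30.
q_robust = scarf_robust_order(100.0, 30.0, price=10.0, cost=6.0)
print(f"robust order quantity: {q_robust:.1f}")  # ~93.9
```

Early in training, a penalty of this kind keeps exploration near the conservative robust order; annealing `lam` toward zero would let the learned policy depart from it as evidence accumulates.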
Related papers
- Towards Sample-Efficiency and Generalization of Transfer and Inverse Reinforcement Learning: A Comprehensive Literature Review [50.67937325077047]
This paper provides a comprehensive review of achieving sample efficiency and generalization in RL algorithms through transfer and inverse reinforcement learning (T-IRL).
Our findings indicate that a majority of recent research works have dealt with the aforementioned challenges by utilizing human-in-the-loop and sim-to-real strategies.
Under the IRL structure, training schemes that require a low number of experience transitions and extension of such frameworks to multi-agent and multi-intention problems have been the priority of researchers in recent years.
arXiv Detail & Related papers (2024-11-15T15:18:57Z) - Deep Generative Demand Learning for Newsvendor and Pricing [7.594251468240168]
We consider data-driven inventory and pricing decisions in the feature-based newsvendor problem.
We propose a novel approach leveraging conditional deep generative models (cDGMs) to address these challenges.
We provide theoretical guarantees for our approach, including the consistency of profit estimation and convergence of our decisions to the optimal solution.
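The snippet below is a minimal sketch of the generic sample-then-optimize idea behind such generative approaches: draw demand scenarios conditioned on features from a fitted conditional sampler (here a hypothetical `sample_demand` stub, not the paper's cDGM architecture), then choose the order quantity that maximizes sample-average newsvendor profit.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_demand(features: np.ndarray, n_samples: int = 2000) -> np.ndarray:
    """Hypothetical stand-in for a trained conditional generative model:
    demand is drawn lognormal with a feature-dependent location."""
    mean = 50.0 + 10.0 * float(features.sum())
    return rng.lognormal(np.log(mean), 0.3, size=n_samples)

def best_order(features: np.ndarray, price: float = 10.0, cost: float = 6.0) -> float:
    demands = sample_demand(features)
    candidates = np.linspace(0.0, demands.max(), 200)
    # Newsvendor profit: sell min(q, D) at `price`, pay `cost` per unit ordered.
    profits = [np.mean(price * np.minimum(q, demands) - cost * q)
               for q in candidates]
    return float(candidates[int(np.argmax(profits))])

print(f"order for features [1, 2]: {best_order(np.array([1.0, 2.0])):.1f}")
```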
arXiv Detail & Related papers (2024-11-13T14:17:26Z) - Dual-Agent Deep Reinforcement Learning for Dynamic Pricing and Replenishment [15.273192037219077]
We study the dynamic pricing and replenishment problems under inconsistent decision frequencies.
We integrate a decision tree-based machine learning approach, trained on comprehensive market data.
In this approach, two agents handle pricing and inventory, respectively, and are updated on different time scales.
arXiv Detail & Related papers (2024-10-28T15:12:04Z) - Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment [58.049113055986375]
We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - Combinatorial Optimization with Policy Adaptation using Latent Space Search [44.12073954093942]
We present a novel approach for designing performant algorithms to solve complex, typically NP-hard, problems.
We show that our search strategy outperforms state-of-the-art approaches on 11 standard benchmarking tasks.
arXiv Detail & Related papers (2023-11-13T12:24:54Z) - Accelerate Presolve in Large-Scale Linear Programming via Reinforcement
Learning [92.31528918811007]
We propose a simple and efficient reinforcement learning framework -- namely, reinforcement learning for presolve (RL4Presolve) -- to tackle (P1)-(P3) simultaneously.
Experiments on two solvers and eight benchmarks (real-world and synthetic) demonstrate that RL4Presolve significantly and consistently improves the efficiency of solving large-scale LPs.
arXiv Detail & Related papers (2023-10-18T09:51:59Z) - Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition, and the deviation of real-world dynamics at deployment from the training environment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
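As a rough sketch of active dynamics learning by maximum variance reduction, assuming scikit-learn is available and using a toy one-dimensional system in place of the paper's multi-output transition dynamics: at each round the next query is the input where the GP posterior standard deviation is largest.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(2)

def true_dynamics(x: np.ndarray) -> np.ndarray:
    # Unknown 1-D nominal dynamics (toy stand-in for multi-output transitions).
    return np.sin(3.0 * x).ravel()

X = rng.uniform(-1.0, 1.0, size=(3, 1))        # a few initial transitions
y = true_dynamics(X)
pool = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)

for _ in range(10):
    gp = GaussianProcessRegressor().fit(X, y)
    _, std = gp.predict(pool, return_std=True)
    x_next = pool[[int(np.argmax(std))]]       # query where the model is least sure
    X = np.vstack([X, x_next])
    y = np.append(y, true_dynamics(x_next))
```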
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Deep Policy Iteration with Integer Programming for Inventory Management [8.27175065641495]
We present a framework for optimizing long-term discounted reward problems with large accessible action space and state dependent constraints.
Our proposed Programmable Actor Reinforcement Learning (PARL) uses a deep-policy method that leverages neural networks (NNs) to approximate the value function.
We benchmark the proposed algorithm against state-of-the-art RL algorithms and commonly used replenishment policies and find it considerably outperforms existing methods, by as much as 14.7% on average.
arXiv Detail & Related papers (2021-12-04T01:40:34Z) - Assured RL: Reinforcement Learning with Almost Sure Constraints [0.0]
We consider the problem of finding optimal policies for a Markov Decision Process with almost sure constraints on state transitions and action triplets.
We define value and action-value functions that satisfy a barrier-based decomposition.
We develop a Barrier-learning algorithm, based on Q-Learning, that identifies such unsafe state-action pairs.
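A minimal tabular sketch of the barrier idea as described, under assumptions (toy sizes, a boolean `unsafe` table, and a simple masking rule; not the paper's exact decomposition): pairs that ever violate an almost-sure constraint are flagged and excluded from the greedy policy.

```python
import numpy as np

n_states, n_actions = 5, 3
Q = np.zeros((n_states, n_actions))
unsafe = np.zeros((n_states, n_actions), dtype=bool)  # learned barrier
alpha, gamma = 0.1, 0.95

def greedy_safe_action(s: int) -> int:
    # Mask state-action pairs flagged as unsafe before taking the argmax.
    q = np.where(unsafe[s], -np.inf, Q[s])
    return int(np.argmax(q))

def q_update(s: int, a: int, r: float, s_next: int, violated: bool) -> None:
    if violated:                      # an almost-sure constraint was broken
        unsafe[s, a] = True           # flag the pair; never choose it again
    target = r + gamma * Q[s_next, greedy_safe_action(s_next)]
    Q[s, a] += alpha * (target - Q[s, a])
```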
arXiv Detail & Related papers (2020-12-24T00:29:28Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
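A hedged sketch of the setting described, under assumptions: each agent takes one local gradient step on a static quadratic objective (standing in for the paper's random-walk minimizer), and the server averages the models of the randomly sampled subset, federated-averaging style.

```python
import numpy as np

rng = np.random.default_rng(1)
n_agents, dim, lr = 20, 4, 0.1
targets = rng.normal(size=(n_agents, dim))   # each agent's local minimizer
w = np.zeros(dim)                            # shared global model

for it in range(200):
    # A random subset of available agents participates this iteration.
    active = rng.choice(n_agents, size=5, replace=False)
    local_models = []
    for a in active:
        grad = w - targets[a]                # gradient of 0.5*||w - targets[a]||^2
        local_models.append(w - lr * grad)   # one local update step
    w = np.mean(local_models, axis=0)        # server aggregates

print("distance to aggregate minimizer:", np.linalg.norm(w - targets.mean(axis=0)))
```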