Product Segmentation Newsvendor Problems: A Robust Learning Approach
- URL: http://arxiv.org/abs/2207.03801v1
- Date: Fri, 8 Jul 2022 10:13:10 GMT
- Title: Product Segmentation Newsvendor Problems: A Robust Learning Approach
- Authors: Xiaoli Yan, Hui Yu, Jiawen Li, Frank Youhua Chen
- Abstract summary: The product segmentation newsvendor problem is a new variant of the newsvendor problem.
We propose a new paradigm termed robust learning to increase the attractiveness of robust policies.
- Score: 6.346881818701668
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose and analyze a product segmentation newsvendor problem, which
generalizes the phenomenon of segmentation sales of a class of perishable
items. The product segmentation newsvendor problem is a new variant of the
newsvendor problem, reflecting that sellers maximize profits by determining the
inventory of the whole item in the context of uncertain demand for sub-items.
We derive the closed-form robust ordering decision by assuming that the means
and covariance matrix of stochastic demand are available but not the
distributions. However, robust approaches that always hedge against the
worst-case demand scenario raise concerns about solution conservatism; thus,
traditional robust schemes can perform unsatisfactorily. In this paper, we integrate
robust and deep reinforcement learning (DRL) techniques and propose a new
paradigm termed robust learning to increase the attractiveness of robust
policies. Notably, we treat the robust decision as human domain knowledge and
embed it in the DRL training process by designing a full-process
human-machine collaborative mechanism of teaching experience, normative
decision, and regularization return. Simulation results confirm that our
approach effectively improves robust performance and can generalize to various
problems that require robust but less conservative solutions. Moreover, fewer
training episodes, greater training stability, and interpretable behavior may
facilitate the deployment of DRL algorithms in operational practice.
Furthermore, RLDQN's success in solving 1000-dimensional demand scenarios shows
that the algorithm offers a path to solving complex operational problems
through human-machine collaboration and may be significant for other complex
operations management problems.
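The abstract's closed-form robust ordering decision is in the spirit of Scarf's classic distribution-free newsvendor rule; the segmentation variant's exact formula is not given here, so the sketch below shows the well-known single-item Scarf rule together with a hypothetical "regularization return" of the kind the robust-learning paradigm describes. Function names and the penalty form are assumptions, not the paper's notation.

```python
import math

def scarf_robust_order(mu, sigma, price, cost):
    """Scarf's (1958) distribution-free order quantity for a single item.

    Maximizes worst-case expected profit over all demand distributions with
    mean `mu` and standard deviation `sigma` (zero salvage value). The paper's
    product-segmentation variant generalizes this single-item setting.
    """
    if price <= cost:
        return 0.0
    # It is robust-optimal to order nothing when the margin is too thin
    # relative to demand variability.
    if cost / price >= mu**2 / (mu**2 + sigma**2):
        return 0.0
    r = (price - cost) / cost
    return mu + (sigma / 2.0) * (math.sqrt(r) - 1.0 / math.sqrt(r))

def regularized_return(reward, action, robust_action, lam=0.1):
    # Hypothetical "regularization return": penalize deviation from the robust
    # decision so the learned policy stays anchored near the robust one.
    return reward - lam * abs(action - robust_action)

q_robust = scarf_robust_order(mu=100.0, sigma=30.0, price=10.0, cost=6.0)
```

The order-nothing condition and the closed form above are the standard Scarf results; how the paper couples the robust decision to the DRL reward is sketched only schematically by `regularized_return`.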
Related papers
- Dual-Agent Deep Reinforcement Learning for Dynamic Pricing and Replenishment [15.273192037219077]
We study the dynamic pricing and replenishment problems under inconsistent decision frequencies.
We integrate a decision tree-based machine learning approach, trained on comprehensive market data.
In this approach, two agents handle pricing and inventory decisions and are updated on different time scales.
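A minimal runnable sketch of the two-time-scale idea, with stub agents and a toy demand model standing in for the paper's DRL agents and market simulator; the update frequencies, ranges, and reward signals below are assumptions.

```python
import random

class StubAgent:
    """Placeholder for a DRL agent; acts uniformly at random for illustration."""
    def __init__(self, low, high):
        self.low, self.high = low, high
    def act(self, state):
        return random.uniform(self.low, self.high)
    def observe(self, reward):
        pass  # a real agent would store the transition and update its network

pricing_agent = StubAgent(5.0, 15.0)      # hypothetical price range
inventory_agent = StubAgent(0.0, 100.0)   # hypothetical order-quantity range

PRICE_EVERY, REPLENISH_EVERY = 7, 1       # assumed: weekly pricing, daily replenishment
price, order, inventory = 10.0, 0.0, 50.0
for t in range(28):
    if t % PRICE_EVERY == 0:
        price = pricing_agent.act(inventory)
    if t % REPLENISH_EVERY == 0:
        order = inventory_agent.act(inventory)
    demand = max(random.gauss(400.0, 100.0) / price, 0.0)  # toy price-sensitive demand
    sales = min(inventory + order, demand)
    inventory = inventory + order - sales
    pricing_agent.observe(price * sales)                   # revenue signal
    inventory_agent.observe(-2.0 * inventory)              # toy holding-cost signal
```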
arXiv Detail & Related papers (2024-10-28T15:12:04Z) - Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
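A toy illustration of the soft Bellman backup that SAC-style Q-learning builds on; in DQO the Q-function is parameterized by the language model and states are partial responses, whereas here plain arrays stand in, so names and constants are illustrative only.

```python
import numpy as np

def soft_value(q_values, tau=1.0):
    """Soft state value V(s) = tau * logsumexp(Q(s, .) / tau), computed stably."""
    z = np.asarray(q_values, dtype=float) / tau
    m = z.max()
    return tau * (m + np.log(np.exp(z - m).sum()))

def soft_bellman_target(reward, next_q_values, gamma=0.99, tau=1.0):
    # Regression target for Q(s, a): r + gamma * V(s'). In DQO, s' is the
    # partial response extended by one token and Q is derived from the LM.
    return reward + gamma * soft_value(next_q_values, tau)

target = soft_bellman_target(reward=0.0, next_q_values=[1.2, -0.3, 0.5])
```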
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - Beyond Training: Optimizing Reinforcement Learning Based Job Shop Scheduling Through Adaptive Action Sampling [10.931466852026663]
We investigate the optimal use of trained deep reinforcement learning (DRL) agents during inference.
Our work is based on the hypothesis that, similar to search algorithms, the utilization of trained DRL agents should be dependent on the acceptable computational budget.
We propose an algorithm for obtaining the optimal parameterization for such a given number of solutions and any given trained agent.
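In its simplest form, the idea reduces to spending the inference budget on multiple stochastic rollouts and keeping the best; the sketch below uses a random stand-in policy and a crude objective (the paper additionally tunes sampling parameters per budget, which this toy omits).

```python
import random

def best_of_n(policy, evaluate, n):
    """Spend the inference budget on n stochastic rollouts and keep the best."""
    best, best_score = None, float("-inf")
    for _ in range(n):
        solution = policy()          # one sampled solution from the trained agent
        score = evaluate(solution)
        if score > best_score:
            best, best_score = solution, score
    return best, best_score

# Toy stand-ins: a random permutation "policy" and a crude makespan proxy.
jobs = list(range(8))
policy = lambda: random.sample(jobs, len(jobs))
evaluate = lambda order: -sum(i * j for i, j in enumerate(order))
schedule, score = best_of_n(policy, evaluate, n=64)
```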
arXiv Detail & Related papers (2024-06-11T14:59:18Z) - Joint Demonstration and Preference Learning Improves Policy Alignment with Human Feedback [58.049113055986375]
We develop a single stage approach named Alignment with Integrated Human Feedback (AIHF) to train reward models and the policy.
The proposed approach admits a suite of efficient algorithms, which can easily reduce to, and leverage, popular alignment algorithms.
We demonstrate the efficiency of the proposed solutions with extensive experiments involving alignment problems in LLMs and robotic control problems in MuJoCo.
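One plausible reading of a single-stage objective of this kind: a behaviour-cloning term on demonstrations plus a Bradley-Terry term on preference pairs, traded off by a weight. This combination and the symbol `alpha` are assumptions, not AIHF's exact formulation.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -np.asarray(x, dtype=float))

def joint_alignment_loss(logp_demo, r_chosen, r_rejected, alpha=1.0):
    """Single-stage loss mixing demonstrations and preferences.

    `logp_demo`: policy log-likelihoods of demonstrated actions.
    `r_chosen`, `r_rejected`: reward scores of preferred / rejected responses.
    """
    bc_loss = -np.mean(logp_demo)  # imitate demonstrations
    pref_loss = -np.mean(log_sigmoid(np.asarray(r_chosen) - np.asarray(r_rejected)))
    return bc_loss + alpha * pref_loss

loss = joint_alignment_loss([-1.2, -0.7], [2.0, 1.5], [0.5, 1.0])
```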
arXiv Detail & Related papers (2024-06-11T01:20:53Z) - Combinatorial Optimization with Policy Adaptation using Latent Space Search [44.12073954093942]
We present a novel approach for designing performant algorithms to solve complex, typically NP-hard, problems.
We show that our search strategy outperforms state-of-the-art approaches on 11 standard benchmarking tasks.
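The search-over-latents idea can be instantiated with a simple cross-entropy method over the latent vector that conditions the trained policy; here a toy quadratic stands in for "roll out the z-conditioned policy and return its reward", and the CEM instantiation is an illustration, not necessarily the paper's exact strategy.

```python
import numpy as np

rng = np.random.default_rng(0)

def latent_search(evaluate, dim=8, pop=32, elites=8, iters=20):
    """Cross-entropy search over the latent space of a latent-conditioned policy."""
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(iters):
        z = rng.normal(mu, sigma, size=(pop, dim))        # sample candidate latents
        scores = np.array([evaluate(v) for v in z])
        top = z[np.argsort(scores)[-elites:]]             # keep the elite fraction
        mu, sigma = top.mean(axis=0), top.std(axis=0) + 1e-3
    return mu

# Toy objective standing in for evaluating the z-conditioned policy.
best_z = latent_search(lambda z: -np.sum((z - 0.5) ** 2))
```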
arXiv Detail & Related papers (2023-11-13T12:24:54Z) - Accelerate Presolve in Large-Scale Linear Programming via Reinforcement Learning [92.31528918811007]
We propose a simple and efficient reinforcement learning framework -- namely, reinforcement learning for presolve (RL4Presolve) -- to tackle (P1)-(P3) simultaneously.
Experiments on two solvers and eight benchmarks (real-world and synthetic) demonstrate that RL4Presolve significantly and consistently improves the efficiency of solving large-scale LPs.
arXiv Detail & Related papers (2023-10-18T09:51:59Z) - Distributionally Robust Model-based Reinforcement Learning with Large State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
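For the KL uncertainty set, the worst-case expected value has a well-known convex dual that a sketch can make concrete; the grid search over the dual variable below is purely illustrative (the paper's algorithm builds on Gaussian-process models of the dynamics rather than this toy).

```python
import numpy as np

def kl_robust_value(values, probs, delta):
    """Worst-case mean of `values` over all Q with KL(Q || P) <= delta.

    Uses the standard dual form
        inf_Q E_Q[V] = sup_{beta > 0} -beta * log E_P[exp(-V / beta)] - beta * delta,
    evaluated here by a coarse grid search over beta.
    """
    v, p = np.asarray(values, float), np.asarray(probs, float)
    best = -np.inf
    for beta in np.geomspace(1e-3, 1e3, 400):
        z = -v / beta
        m = z.max()
        log_mgf = m + np.log((p * np.exp(z - m)).sum())   # log E_P[exp(-V/beta)]
        best = max(best, -beta * log_mgf - beta * delta)
    return best

worst_case = kl_robust_value([1.0, 2.0, 4.0], [0.5, 0.3, 0.2], delta=0.1)
```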
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Math Programming based Reinforcement Learning for Multi-Echelon Inventory Management [1.9161790404101895]
Reinforcement learning has led to considerable breakthroughs in diverse areas such as robotics, games and many others.
But the application of RL to complex real-world decision-making problems remains limited.
The characteristics of such problems make them considerably harder to solve for existing RL methods that rely on enumeration techniques to solve per-step action problems.
We show that a properly selected discretization of the underlying uncertain distribution can yield near optimal actor policy even with very few samples from the underlying uncertainty.
We find that PARL outperforms commonly used base stock policies by 44.7% and the best-performing RL method by up to 12.1% on average.
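The discretization idea can be illustrated with a sample-average approximation on a single-item toy: draw a handful of demand scenarios and pick the order quantity that maximizes average profit across them. PARL itself solves a math program per step for multi-echelon networks; the names and demand distribution below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def saa_order(demand_sampler, candidates, price, cost, n_scenarios=10):
    """Choose the order quantity maximizing mean profit over a few sampled
    demand scenarios (a single-echelon simplification of the discretization)."""
    scenarios = np.array([demand_sampler() for _ in range(n_scenarios)])
    best_q, best_profit = None, -np.inf
    for q in candidates:
        profit = (price * np.minimum(q, scenarios) - cost * q).mean()
        if profit > best_profit:
            best_q, best_profit = q, profit
    return best_q

q = saa_order(lambda: rng.gamma(shape=4.0, scale=25.0),
              candidates=range(0, 201, 10), price=3.0, cost=1.0)
```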
arXiv Detail & Related papers (2021-12-04T01:40:34Z) - Assured RL: Reinforcement Learning with Almost Sure Constraints [0.0]
We consider the problem of finding optimal policies for a Markov Decision Process with almost sure constraints on state transitions and action triplets.
We define value and action-value functions that satisfy a barrier-based decomposition.
We develop a Barrier-learning algorithm, based on Q-Learning, that identifies such unsafe state-action pairs.
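A rough sketch of the barrier idea under tabular Q-learning: once a state-action pair is observed to violate an almost-sure constraint, it is barred with a large negative value so the greedy policy avoids it thereafter. The tabular form and constants here are assumptions, not the paper's exact algorithm.

```python
import collections

Q = collections.defaultdict(float)   # tabular action-value estimates
unsafe = set()                       # (state, action) pairs found to violate constraints
ALPHA, GAMMA, PENALTY = 0.1, 0.95, 1e6

def barrier_q_update(s, a, r, s_next, actions, violated):
    """One Q-learning step with a barrier on almost-sure constraint violations."""
    if violated:
        # Bar the pair permanently: the greedy policy will never pick it again.
        unsafe.add((s, a))
        Q[(s, a)] = -PENALTY
        return
    safe_next = [b for b in actions if (s_next, b) not in unsafe]
    target = r + GAMMA * max((Q[(s_next, b)] for b in safe_next), default=-PENALTY)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```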
arXiv Detail & Related papers (2020-12-24T00:29:28Z) - Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
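A minimal simulation of that setting: at each iteration a random subset of agents runs a few noisy local gradient steps toward a drifting minimizer and the server averages the results. The drift magnitude, step sizes, and local objective are toy assumptions made only so the sketch runs; per the paper, shrinking the learning rate enlarges the tracking error.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim, subset = 20, 5, 5
minimizer = rng.normal(size=dim)     # true aggregate minimizer
w = np.zeros(dim)                    # shared model held by the server
mu = 0.1                             # learning rate

for it in range(500):
    minimizer += 0.01 * rng.normal(size=dim)                   # random-walk drift
    active = rng.choice(n_agents, size=subset, replace=False)  # random agent subset
    updates = []
    for k in active:
        w_k = w.copy()
        for _ in range(3):                                     # local SGD steps
            grad = (w_k - minimizer) + 0.2 * rng.normal(size=dim)  # noisy gradient
            w_k -= mu * grad
        updates.append(w_k)
    w = np.mean(updates, axis=0)                               # server aggregation
```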
arXiv Detail & Related papers (2020-02-20T15:00:54Z)