Learning to Price Supply Chain Contracts against a Learning Retailer
- URL: http://arxiv.org/abs/2211.04586v1
- Date: Wed, 2 Nov 2022 04:00:47 GMT
- Title: Learning to Price Supply Chain Contracts against a Learning Retailer
- Authors: Xuejun Zhao, Ruihao Zhu, William B. Haskell
- Abstract summary: We study the supply chain contract design problem faced by a data-driven supplier.
Both the supplier and the retailer are uncertain about the market demand.
We show that our pricing policies lead to sublinear regret bounds in all these cases.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of big data analytics has automated the decision-making of companies
and increased supply chain agility. In this paper, we study the supply chain
contract design problem faced by a data-driven supplier who needs to respond to
the inventory decisions of the downstream retailer. Both the supplier and the
retailer are uncertain about the market demand and need to learn about it
sequentially. The goal for the supplier is to develop data-driven pricing
policies with sublinear regret bounds under a wide range of possible retailer
inventory policies for a fixed time horizon.
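For concreteness, the supplier's objective can be written as cumulative regret minimization; the following is a standard formulation (the notation here is ours, since this summary does not reproduce the paper's exact definitions):

```latex
% Supplier's cumulative regret over horizon T: p_t is the posted
% wholesale price, p_t^* the benchmark price against the retailer's
% inventory response, and r_t the supplier's period-t revenue.
% "Sublinear regret" means R_T grows slower than T, i.e. R_T = o(T).
\[
  R_T \;=\; \sum_{t=1}^{T} \bigl( r_t(p_t^{*}) - r_t(p_t) \bigr),
  \qquad R_T = o(T).
\]
```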
To capture the dynamics induced by the retailer's learning policy, we first
make a connection to non-stationary online learning via the notion of a
variation budget. The variation budget quantifies the impact of the retailer's
learning strategy on the supplier's decision-making. We then propose dynamic
pricing policies for the supplier for both discrete and continuous demand. We
also note that our proposed pricing policy only requires access to the support
of the demand distribution, but critically, does not require the supplier to
have any prior knowledge about the retailer's learning policy or the demand
realizations. We examine several well-known data-driven policies for the
retailer, including sample average approximation, distributionally robust
optimization, and parametric approaches, and show that our pricing policies
lead to sublinear regret bounds in all these cases.
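To make the variation budget concrete: in non-stationary online learning it bounds the total drift of the environment, here induced by the retailer's evolving inventory policy. A common form (our notation, not necessarily the paper's) is:

```latex
% Variation budget V_T: total temporal variation of the supplier's
% period revenue functions r_1, ..., r_T caused by the retailer's
% changing inventory decisions. Regret bounds in this literature
% typically scale with V_T, e.g. O(V_T^{1/3} T^{2/3}).
\[
  V_T \;=\; \sum_{t=2}^{T} \sup_{p} \bigl| r_t(p) - r_{t-1}(p) \bigr|.
\]
```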
At the managerial level, we answer affirmatively that there is a pricing
policy with a sublinear regret bound under a wide range of retailer learning
policies, even though the supplier faces a learning retailer and an unknown
demand distribution. Our work also provides a novel perspective on data-driven
operations management where the principal has to learn to react to the learning
policies employed by other agents in the system.
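As a concrete instance of one retailer policy examined above, sample average approximation (SAA) for a newsvendor-style retailer orders the empirical demand quantile at the critical ratio. A minimal sketch, with the interface and prices as our own assumptions:

```python
import numpy as np

def saa_order_quantity(demand_samples, wholesale_price, retail_price):
    """Sample average approximation for the newsvendor problem.

    The SAA order quantity is the empirical quantile of observed
    demand at the critical ratio (retail - wholesale) / retail.
    """
    critical_ratio = (retail_price - wholesale_price) / retail_price
    return np.quantile(demand_samples, critical_ratio)

# Example: retailer reorders each period from accumulated demand data.
rng = np.random.default_rng(0)
true_demand = rng.poisson(lam=40, size=100)  # unknown to both parties
q = saa_order_quantity(true_demand[:20], wholesale_price=6.0, retail_price=10.0)
print(f"SAA order quantity after 20 observations: {q:.1f}")
```

With the critical ratio of 0.4 here, the SAA order converges to the 40th percentile of the true demand distribution as samples accumulate, which is what makes the retailer's response to a posted price predictable enough for the supplier to learn against.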
Related papers
- Enhancing Supply Chain Visibility with Knowledge Graphs and Large Language Models [49.898152180805454]
This paper presents a novel framework leveraging Knowledge Graphs (KGs) and Large Language Models (LLMs) to enhance supply chain visibility.
Our zero-shot, LLM-driven approach automates the extraction of supply chain information from diverse public sources.
Achieving high accuracy on named entity recognition (NER) and relation extraction (RE) tasks, it provides an effective tool for understanding complex, multi-tiered supply networks.
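A rough sketch of what such zero-shot extraction can look like; `call_llm` is a hypothetical stand-in for any chat-completion client, and the prompt and output schema are our illustration rather than the paper's:

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical LLM client; replace with your provider's API."""
    raise NotImplementedError

def extract_supply_chain_triples(text: str) -> list[dict]:
    """Zero-shot NER + relation extraction for supplier relationships."""
    prompt = (
        "Extract supply chain relations from the text below as JSON, "
        'a list of {"supplier": ..., "buyer": ..., "product": ...}.\n\n'
        f"Text: {text}"
    )
    return json.loads(call_llm(prompt))
```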
arXiv Detail & Related papers (2024-08-05T17:11:29Z)
- A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
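The primal-dual pattern behind such approaches can be sketched as follows: a dual variable prices the sale constraint, each round's price maximizes the Lagrangian-adjusted revenue, and the dual variable ascends on the observed violation. This is a generic illustration under our own assumptions, not the paper's exact algorithm:

```python
import numpy as np

def primal_dual_pricing(prices, demand_fn, sales_cap, T, eta=0.05):
    """Generic primal-dual dynamic pricing under a total-sales constraint.

    Each round picks the price maximizing the Lagrangian reward
    (p - lam) * demand(p), then takes a projected dual ascent step
    on the observed constraint violation.
    """
    lam = 0.0                      # dual variable for the sales constraint
    per_round_cap = sales_cap / T  # budget spread evenly over the horizon
    for _ in range(T):
        rewards = [(p - lam) * demand_fn(p) for p in prices]  # primal step
        p_t = prices[int(np.argmax(rewards))]
        sold = demand_fn(p_t)                                 # realized sales
        lam = max(0.0, lam + eta * (sold - per_round_cap))    # dual step
        yield p_t, lam
```

Projecting `lam` at zero keeps the dual variable feasible; when sales run ahead of the per-round budget, the growing penalty pushes the chosen price upward to slow demand.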
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
- Decentralized Learning Strategies for Estimation Error Minimization with Graph Neural Networks [94.2860766709971]
We address the challenge of sampling and remote estimation for autoregressive Markovian processes in a wireless network with statistically identical agents.
Our goal is to minimize time-average estimation error and/or age of information with decentralized scalable sampling and transmission policies.
arXiv Detail & Related papers (2024-04-04T06:24:11Z)
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z)
- A Knowledge Graph Perspective on Supply Chain Resilience [15.028130016717773]
Global crises and regulatory developments require increased supply chain transparency and resilience.
Information about supply chains, especially at the deeper tiers, is often opaque and incomplete.
By connecting different data sources, we model the supply network as a knowledge graph and achieve transparency up to tier-3 suppliers.
arXiv Detail & Related papers (2023-05-15T10:14:30Z)
- Interpretable Reinforcement Learning via Neural Additive Models for Inventory Management [3.714118205123092]
We focus on developing dynamic inventory ordering policies for a multi-echelon, i.e. multi-stage, supply chain.
Traditional inventory optimization methods aim to determine a static reordering policy.
We propose an interpretable reinforcement learning approach that aims to be as interpretable as the traditional static policies.
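A neural additive model keeps interpretability by scoring each input feature with its own small network and summing the contributions; here is a minimal NumPy sketch (shapes, sizes, and names are our assumptions):

```python
import numpy as np

def feature_net(x, w1, b1, w2, b2):
    """One small MLP applied to a single scalar feature."""
    h = np.maximum(0.0, x * w1 + b1)   # hidden layer, ReLU
    return h @ w2 + b2                 # scalar contribution

def nam_forward(features, params):
    """Neural additive model: output = sum of per-feature contributions.

    Each feature's effect is a 1-D function that can be plotted
    directly -- this is what makes the learned policy interpretable.
    """
    return sum(feature_net(x, *p) for x, p in zip(features, params))

# Example with two features (e.g., inventory level, recent demand).
rng = np.random.default_rng(1)
params = [(rng.normal(size=8), rng.normal(size=8),
           rng.normal(size=8), rng.normal()) for _ in range(2)]
order_score = nam_forward([12.0, 7.5], params)
```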
arXiv Detail & Related papers (2023-03-18T10:13:32Z)
- Playing hide and seek: tackling in-store picking operations while improving customer experience [0.0]
We formalize a new problem called the dynamic in-store picker routing problem (diPRP).
In this problem, a picker tries to fulfill online orders while minimizing encounters with in-store customers.
Our work suggests that retailers should be able to scale the in-store picking of online orders without jeopardizing the experience of offline customers.
arXiv Detail & Related papers (2023-01-05T16:35:17Z)
- Regularizing a Model-based Policy Stationary Distribution to Stabilize Offline Reinforcement Learning [62.19209005400561]
Offline reinforcement learning (RL) extends the paradigm of classical RL algorithms to learning purely from static datasets.
A key challenge of offline RL is the instability of policy training, caused by the mismatch between the distribution of the offline data and the undiscounted stationary state-action distribution of the learned policy.
We regularize the undiscounted stationary distribution of the current policy towards the offline data during the policy optimization process.
arXiv Detail & Related papers (2022-06-14T20:56:16Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without querying the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Self-adapting Robustness in Demand Learning [1.949912057689623]
We study dynamic pricing over a finite number of periods in the presence of demand model ambiguity.
We develop an adaptively-robust-learning (ARL) pricing policy that learns the true model parameters from the data.
We characterize the behavior of ARL's self-adapting ambiguity sets and derive a regret bound that highlights the link between the scale of revenue loss and the customer arrival pattern.
arXiv Detail & Related papers (2020-11-21T01:15:54Z)
- Interpretable Personalization via Policy Learning with Linear Decision Boundaries [14.817218449140338]
Effective personalization of goods and services has become a core business capability for companies seeking to improve revenue and maintain a competitive edge.
This paper studies the personalization problem through the lens of policy learning.
We propose a class of policies with linear decision boundaries, together with learning algorithms that use tools from causal inference.
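A policy with a linear decision boundary assigns treatment when the score theta . x crosses a threshold, and its value can be estimated offline with inverse propensity weighting, a standard causal-inference tool. A minimal sketch under our own assumptions:

```python
import numpy as np

def policy(theta, x):
    """Linear decision boundary: treat iff theta . x >= 0."""
    return (x @ theta >= 0).astype(int)

def ipw_value(theta, X, actions, rewards, propensities):
    """Inverse-propensity-weighted estimate of a policy's value."""
    match = (policy(theta, X) == actions)
    return np.mean(match * rewards / propensities)

# Pick the best of a few random candidate boundaries (illustrative).
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
actions = rng.integers(0, 2, size=500)     # logged binary actions
propensities = np.full(500, 0.5)           # known logging probabilities
rewards = rng.normal(loc=actions * (X[:, 0] > 0), scale=0.1)
candidates = rng.normal(size=(50, 3))
best = max(candidates,
           key=lambda th: ipw_value(th, X, actions, rewards, propensities))
```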
arXiv Detail & Related papers (2020-03-17T05:48:27Z)