Learning to Price with Resource Constraints: From Full Information to Machine-Learned Prices
- URL: http://arxiv.org/abs/2501.14155v1
- Date: Fri, 24 Jan 2025 00:46:52 GMT
- Title: Learning to Price with Resource Constraints: From Full Information to Machine-Learned Prices
- Authors: Ruicheng Ao, Jiashuo Jiang, David Simchi-Levi
- Abstract summary: We study the dynamic pricing problem with knapsack constraints, addressing the challenge of balancing exploration and exploitation under limited resources.
We introduce three algorithms tailored to different informational settings: a Boundary Attracted Re-solve Method for full information, an online learning algorithm for scenarios with no prior information, and an estimate-then-select re-solve algorithm that leverages machine-learned informed prices with a known upper bound on the estimation error.
- Score: 13.68761797358598
- Abstract: We study the dynamic pricing problem with knapsack constraints, addressing the challenge of balancing exploration and exploitation under limited resources. We introduce three algorithms tailored to different informational settings: a Boundary Attracted Re-solve Method for full information, an online learning algorithm for scenarios with no prior information, and an estimate-then-select re-solve algorithm that leverages machine-learned informed prices with a known upper bound on the estimation error. The Boundary Attracted Re-solve Method achieves logarithmic regret without requiring the non-degeneracy condition, while the online learning algorithm attains an optimal $O(\sqrt{T})$ regret. Our estimate-then-select approach bridges the gap between these settings, providing improved regret bounds when reliable offline data is available. Numerical experiments validate the effectiveness and robustness of our algorithms across various scenarios. This work advances the understanding of online resource allocation and dynamic pricing, offering practical solutions adaptable to different informational structures.
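To make the re-solving idea concrete, here is a minimal sketch of a fluid re-solve heuristic for single-resource dynamic pricing: at each period the seller re-solves a deterministic LP against the remaining inventory and randomizes over candidate prices accordingly. This is a generic illustration of the re-solve principle, not the paper's Boundary Attracted Re-solve Method; the candidate prices, demand rates, and unit consumption below are assumed for illustration.

```python
# Generic fluid re-solving sketch for pricing with one resource (knapsack).
# Not the paper's Boundary Attracted Re-solve Method; all parameters are
# hypothetical.
import numpy as np
from scipy.optimize import linprog

prices = np.array([4.0, 6.0, 8.0])   # candidate price levels (assumed)
demand = np.array([0.9, 0.5, 0.2])   # purchase probability at each price
cost   = np.array([1.0, 1.0, 1.0])   # resource units consumed per sale
T, budget = 1000, 400                # horizon and initial inventory
rng = np.random.default_rng(0)

revenue, remaining = 0.0, float(budget)
for t in range(T):
    if remaining < cost.min():
        break                        # resource exhausted: stop selling
    # Fluid LP on the remaining horizon: pick a randomization x over prices
    # maximizing expected revenue, with expected consumption per period
    # at most the remaining budget rate; x may sum to less than one,
    # leaving probability mass on a "post no price" action.
    rate = remaining / (T - t)
    res = linprog(c=-(prices * demand),                   # maximize revenue
                  A_ub=[cost * demand, np.ones_like(prices)],
                  b_ub=[rate, 1.0],
                  bounds=[(0.0, 1.0)] * len(prices))
    x = np.clip(res.x, 0.0, None)
    probs = np.append(x, max(0.0, 1.0 - x.sum()))         # null-price mass
    probs /= probs.sum()
    k = rng.choice(len(prices) + 1, p=probs)              # sample a price
    if k < len(prices) and rng.random() < demand[k]:      # customer buys
        revenue += prices[k]
        remaining -= cost[k]
print(f"revenue={revenue:.1f}, leftover inventory={remaining:.1f}")
```

Re-solving every period keeps the posted-price distribution aligned with the realized inventory trajectory, which is the intuition behind re-solve-style regret guarantees.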
Related papers
- Joint Pricing and Resource Allocation: An Optimal Online-Learning Approach [20.70943884841438]
We study an online learning problem in which we make joint pricing and inventory decisions over a finite horizon to maximize the overall net profit.
We develop an efficient algorithm that utilizes a "Confidence Bound" strategy over multiple online convex optimization (OCO) problems.
arXiv Detail & Related papers (2025-01-29T23:23:54Z)
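As a toy illustration of the confidence-bound idea in pricing (not the joint pricing-and-inventory algorithm above), the sketch below keeps an optimistic estimate of the unknown purchase probability at each candidate price and posts the price with the highest optimistic revenue; the price grid and demand model are hypothetical.

```python
# Toy UCB-style pricing sketch: optimistic demand estimates drive the
# posted price. Generic illustration only; parameters are synthetic.
import numpy as np

prices = np.array([4.0, 6.0, 8.0])
true_demand = np.array([0.9, 0.5, 0.2])   # unknown to the seller
T = 5000
rng = np.random.default_rng(1)

pulls = np.zeros(len(prices))             # times each price was posted
sales = np.zeros(len(prices))             # purchases observed at each price
revenue = 0.0
for t in range(1, T + 1):
    mean = sales / np.maximum(pulls, 1)
    bonus = np.sqrt(2 * np.log(t) / np.maximum(pulls, 1))
    ucb = np.where(pulls == 0, np.inf, mean + bonus)   # force exploration
    k = int(np.argmax(prices * np.minimum(ucb, 1.0)))  # optimistic revenue
    buy = rng.random() < true_demand[k]
    pulls[k] += 1
    sales[k] += buy
    revenue += prices[k] * buy
print(f"average revenue per round: {revenue / T:.3f}")
```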
- Cost-Aware Query Policies in Active Learning for Efficient Autonomous Robotic Exploration [0.0]
This paper analyzes an active learning (AL) algorithm for Gaussian Process regression while incorporating action cost.
A traditional uncertainty metric combined with a distance constraint best minimizes root-mean-square error over trajectory distance.
arXiv Detail & Related papers (2024-10-31T18:35:03Z)
- Exponentially Weighted Algorithm for Online Network Resource Allocation with Long-Term Constraints [0.6466206145151128]
This paper studies an online optimal resource reservation problem in communication networks with job transfers.
We propose a novel algorithm based on a randomized exponentially weighted method that encompasses long-term constraints.
arXiv Detail & Related papers (2024-05-03T10:12:40Z)
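The core of the randomized exponentially weighted method referenced above is the classic Hedge (multiplicative-weights) update. A minimal sketch follows, with synthetic losses and without the paper's long-term-constraint handling:

```python
# Minimal Hedge sketch: randomized allocation among K options under
# adversarial losses. Losses here are synthetic stand-ins.
import numpy as np

K, T, eta = 5, 1000, 0.1
rng = np.random.default_rng(2)
weights = np.ones(K)
total_loss = 0.0
for t in range(T):
    probs = weights / weights.sum()          # randomized allocation
    k = rng.choice(K, p=probs)
    losses = rng.uniform(0, 1, size=K)       # stand-in for observed costs
    total_loss += losses[k]
    weights *= np.exp(-eta * losses)         # multiplicative-weights update
print(f"average loss: {total_loss / T:.3f}")
```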
- Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions.
We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z)
- STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal one.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: **STE**in information dir**E**cted exploration for model-based **R**einforcement Learn**ING**.
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
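For reference, the kernelized Stein discrepancy that STEERING's exploration bonus builds on can be estimated from samples via a V-statistic over the Stein kernel. Below is a minimal sketch with an RBF kernel; the score function and data are stand-ins (a standard Gaussian target), not the paper's implementation.

```python
# Minimal V-statistic estimator of the squared kernelized Stein
# discrepancy (KSD) with an RBF kernel. Generic sketch; the target here
# is a hypothetical standard Gaussian, whose score is -x.
import numpy as np

def ksd_vstat(x, score, h=1.0):
    """x: (n, d) samples; score: (n, d) grad-log-density at x; h: bandwidth."""
    n, d = x.shape
    diff = x[:, None, :] - x[None, :, :]            # pairwise x_i - x_j
    sq = (diff ** 2).sum(-1)                        # squared distances
    k = np.exp(-sq / (2 * h ** 2))                  # RBF kernel matrix
    gx = -diff / h ** 2 * k[..., None]              # grad_x k(x_i, x_j)
    gy = diff / h ** 2 * k[..., None]               # grad_y k(x_i, x_j)
    trace = (d / h ** 2 - sq / h ** 4) * k          # trace of grad_x grad_y k
    u = (np.einsum('id,jd,ij->ij', score, score, k)
         + np.einsum('id,ijd->ij', score, gy)
         + np.einsum('jd,ijd->ij', score, gx)
         + trace)                                   # Stein kernel u_p(x_i, x_j)
    return u.mean()                                 # KSD^2 estimate

rng = np.random.default_rng(3)
x = rng.normal(size=(200, 2))
print(f"KSD^2 vs N(0, I): {ksd_vstat(x, -x):.4f}")  # near zero in-distribution
```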
- Explainable Data-Driven Optimization: From Context to Decision and Back Again [76.84947521482631]
Data-driven optimization uses contextual information and machine learning algorithms to find solutions to decision problems with uncertain parameters.
We introduce a counterfactual explanation methodology tailored to explain solutions to data-driven problems.
We demonstrate our approach by explaining key problems in operations management such as inventory management and routing.
arXiv Detail & Related papers (2023-01-24T15:25:16Z)
- Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds on the agent's regret in both the full-information and bandit-feedback settings, even when the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z)
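To illustrate the delayed-cost setting in the entry above, here is a generic projected online gradient descent loop in which the gradient for round t becomes available only after a delay and is applied on arrival. The quadratic losses and random delays are synthetic, and the nonsubmodular structure the paper exploits is not modeled.

```python
# Generic projected OGD with delayed feedback: gradients arrive d_t rounds
# after the decision and are applied on arrival. Losses and delays are
# synthetic stand-ins.
import numpy as np

T, eta = 500, 0.05
rng = np.random.default_rng(4)
targets = rng.uniform(-1, 1, size=T)       # loss_t(x) = (x - target_t)^2
delays = rng.integers(1, 20, size=T)       # d_t: rounds until feedback arrives

x, total_loss = 0.0, 0.0
pending = {}                               # arrival round -> list of gradients
for t in range(T):
    total_loss += (x - targets[t]) ** 2    # incur (as-yet-unseen) loss
    pending.setdefault(t + int(delays[t]), []).append(2 * (x - targets[t]))
    for g in pending.pop(t, []):           # apply gradients arriving now
        x = np.clip(x - eta * g, -1.0, 1.0)  # projected OGD step on [-1, 1]
print(f"average loss: {total_loss / T:.3f}")
```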
- Contextual Inverse Optimization: Offline and Online Learning [3.6739949215165164]
We study the problems of offline and online contextual optimization with feedback information.
We aim to minimize regret, which is defined as the difference between our losses and the ones incurred by an all-knowing oracle.
arXiv Detail & Related papers (2021-06-26T13:09:52Z)
- Offline Inverse Reinforcement Learning [24.316047317028147]
The goal of offline RL is to learn optimal policies when a fixed dataset of exploratory demonstrations is available.
Inspired by the success of inverse reinforcement learning (IRL) techniques in achieving state-of-the-art imitation performance in online settings, we exploit GAN-based data augmentation procedures to construct the first offline IRL algorithm.
arXiv Detail & Related papers (2021-06-09T13:44:06Z)
- Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror-descent-based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
- Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation [49.69139684065241]
Contextual multi-armed bandits (MAB) achieve cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
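As a rough illustration of budget-constrained contextual bandits (not the hierarchical HATCH policy itself), the sketch below runs a LinUCB-style learner that pays one unit of resource per pull and falls back to a zero-cost skip action once the budget is exhausted; the contexts, rewards, and costs are synthetic.

```python
# LinUCB-style sketch with a hard budget: each pull costs one unit; after
# the budget is spent, the policy takes a zero-cost "skip" action. This
# illustrates the resource-constrained setting only, not HATCH itself.
import numpy as np

d, K, T, budget, alpha = 5, 4, 2000, 500, 1.0
rng = np.random.default_rng(5)
theta = rng.normal(size=(K, d))            # unknown per-arm reward vectors
A = np.stack([np.eye(d)] * K)              # per-arm ridge Gram matrices
b = np.zeros((K, d))

spent, reward = 0, 0.0
for t in range(T):
    ctx = rng.normal(size=d)
    if spent >= budget:
        continue                           # budget gone: skip action
    ucb = np.empty(K)
    for k in range(K):                     # LinUCB score per arm
        Ainv = np.linalg.inv(A[k])
        mu = Ainv @ b[k]
        ucb[k] = ctx @ mu + alpha * np.sqrt(ctx @ Ainv @ ctx)
    k = int(np.argmax(ucb))
    r = theta[k] @ ctx + 0.1 * rng.normal()
    A[k] += np.outer(ctx, ctx)             # update the chosen arm's model
    b[k] += r * ctx
    reward += r
    spent += 1                             # unit resource cost per pull
print(f"total reward: {reward:.1f}, pulls: {spent}")
```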