Related papers: Contextual Dynamic Pricing with Strategic Buyers

URL: http://arxiv.org/abs/2307.04055v2
Date: Tue, 25 Jun 2024 18:25:54 GMT
Title: Contextual Dynamic Pricing with Strategic Buyers
Authors: Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun
Abstract summary: We study the contextual dynamic pricing problem with strategic buyers. The seller does not observe the buyer's true feature, only a manipulated feature resulting from the buyer's strategic behavior. We propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into online learning to maximize the seller's cumulative revenue.
Score: 93.97401997137564
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Personalized pricing, which involves tailoring prices based on individual characteristics, is commonly used by firms to implement a consumer-specific pricing policy. In this process, buyers can also strategically manipulate their feature data to obtain a lower price, incurring certain manipulation costs. Such strategic behavior can hinder firms from maximizing their profits. In this paper, we study the contextual dynamic pricing problem with strategic buyers. The seller does not observe the buyer's true feature, but a manipulated feature according to buyers' strategic behavior. In addition, the seller does not observe the buyers' valuation of the product, but only a binary response indicating whether a sale happens or not. Recognizing these challenges, we propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue. We first prove that existing non-strategic pricing policies that neglect the buyers' strategic behavior result in a linear $\Omega(T)$ regret with $T$ the total time horizon, indicating that these policies are not better than a random pricing policy. We then establish that our proposed policy achieves a sublinear regret upper bound of $O(\sqrt{T})$. Importantly, our policy is not a mere amalgamation of existing dynamic pricing policies and strategic behavior handling algorithms. Our policy can also accommodate the scenario when the marginal cost of manipulation is unknown in advance. To account for it, we simultaneously estimate the valuation parameter and the cost parameter in the online pricing policy, which is shown to also achieve an $O(\sqrt{T})$ regret bound. Extensive experiments support our theoretical developments and demonstrate the superior performance of our policy compared to other pricing policies that are unaware of the strategic behaviors.

Related papers

Learning to Lead: Incentivizing Strategic Agents in the Dark [50.93875404941184]
We study an online learning version of the generalized principal-agent model. We develop the first provably sample-efficient algorithm for this challenging setting. We establish a near-optimal $\tilde{O}(\sqrt{T})$ regret bound for learning the principal's optimal policy.
arXiv Detail & Related papers (2025-06-10T04:25:04Z)
Optimal Nonlinear Online Learning under Sequential Price Competition via s-Concavity [24.586053819490985]
We consider price competition among multiple sellers over a selling horizon of $T$ periods. Sellers simultaneously offer their prices and observe their respective demand that is unobservable to competitors. We show that when all sellers employ our policy, their prices converge at a rate of $O(T^{-1/7})$ to the Nash equilibrium prices that sellers would reach if they were fully informed.
arXiv Detail & Related papers (2025-03-20T22:51:03Z)
Fairness-aware Contextual Dynamic Pricing with Strategic Buyers [4.883313216485195]
We propose a dynamic pricing policy that simultaneously achieves price fairness and discourages strategic behaviors. Our policy achieves an upper bound of $O(\sqrt{T}+H(T))$ regret over $T$ time horizons. We also prove an $\Omega(\sqrt{T})$ regret lower bound of any pricing policy under our problem setting.
arXiv Detail & Related papers (2025-01-25T22:30:37Z)
A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing [20.06425698412548]
This paper studies offline dynamic pricing without data coverage assumption. We establish a partial identification bound for the demand parameter whose associated price is unobserved. We incorporate pessimistic and opportunistic strategies within the proposed partial identification framework to derive the estimated policy.
arXiv Detail & Related papers (2024-11-12T19:09:41Z)
A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers. Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective. We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions [4.089889918897877]
We study the bidding problem in repeated uniform price multi-unit auctions from the perspective of a value-maximizing buyer. We introduce the notion of safe bidding strategies as those that satisfy the RoI constraints irrespective of competing bids. We show that these strategies satisfy a mild no-overbidding condition, depend only on the valuation curve of the bidder, and the bidder can focus on a finite subset without loss of generality.
arXiv Detail & Related papers (2024-06-06T01:29:47Z)
Dynamic Pricing and Learning with Long-term Reference Effects [16.07344044662994]
We study a simple and novel reference price mechanism where the reference price is the average of the past prices offered by the seller. We show that under this mechanism, a markdown policy is near-optimal irrespective of the parameters of the model. We then consider a more challenging dynamic pricing and learning problem, where the demand model parameters are a priori unknown.
arXiv Detail & Related papers (2024-02-19T21:36:54Z)
Pricing with Contextual Elasticity and Heteroscedastic Valuation [23.96777734246062]
We study an online contextual dynamic pricing problem, where customers decide whether to purchase a product based on its features and price. We introduce a novel approach to modeling a customer's expected demand by incorporating feature-based price elasticity. Our results shed light on the relationship between contextual elasticity and heteroscedastic valuation, providing insights for effective and practical pricing strategies.
arXiv Detail & Related papers (2023-12-26T11:07:37Z)
Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies. Experiments on synthetic and benchmark datasets demonstrate remarkable mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z)
Strategic Apple Tasting [35.25249063553063]
Algorithmic decision-making in high-stakes domains often involves assigning decisions to agents with incentives to strategically modify their input to the algorithm. We formalize this setting as an online learning problem with apple-tasting feedback. Our goal is to achieve sublinear strategic regret, which compares the performance of the principal to that of the best fixed policy in hindsight.
arXiv Detail & Related papers (2023-06-09T20:46:31Z)
Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time. We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z)
Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$. We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed. We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-G\ldots}\right)$
arXiv Detail & Related papers (2022-12-12T21:37:36Z)
Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments [55.41685740015095]
We study offline reinforcement learning under a novel model called strategic MDP. We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN).
arXiv Detail & Related papers (2022-08-23T15:32:44Z)
Dynamic Incentive-aware Learning: Robust Pricing in Contextual Auctions [13.234975857626752]
We consider the problem of robust learning of reserve prices against strategic buyers in contextual second-price auctions. We propose learning policies that are robust to such strategic behavior.
arXiv Detail & Related papers (2020-02-25T19:00:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.