From Confounding to Learning: Dynamic Service Fee Pricing on Third-Party Platforms
- URL: http://arxiv.org/abs/2512.22749v1
- Date: Sun, 28 Dec 2025 02:41:36 GMT
- Title: From Confounding to Learning: Dynamic Service Fee Pricing on Third-Party Platforms
- Authors: Rui Ai, David Simchi-Levi, Feng Zhu
- Abstract summary: We study the pricing behavior of third-party platforms facing strategic agents. We develop an algorithm with optimal regret of $\tilde{\mathcal{O}}(\sqrt{T} \wedge \sigma_S^{-2})$.
- Score: 16.56794300689239
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the pricing behavior of third-party platforms facing strategic agents. Assuming the platform is a revenue maximizer, it observes market features that generally affect demand. Since only the equilibrium price and quantity are observable, this presents a general demand learning problem under confounding. Mathematically, we develop an algorithm with optimal regret of $\tilde{\mathcal{O}}(\sqrt{T} \wedge \sigma_S^{-2})$. Our results reveal that supply-side noise fundamentally affects the learnability of demand, leading to a phase transition in regret. Technically, we show that non-i.i.d. actions can serve as instrumental variables for learning demand. We also propose a novel homeomorphic construction that allows us to establish estimation bounds without assuming star-shapedness, providing the first efficiency guarantee for learning demand with deep neural networks. Finally, we demonstrate the practical applicability of our approach through simulations and real-world data from Zomato and Lyft.
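The identification idea summarized above can be illustrated with a small simulation, shown below. This is not the paper's algorithm (which uses non-i.i.d. platform actions as instruments and deep neural networks to learn demand); it is a minimal linear two-stage least squares sketch under assumed dynamics, in which an unobserved market feature shifts both demand and the equilibrium price, and an exogenous supply-side shock of scale sigma_S plays the role of the instrument. All variable names, the linear demand curve, and the price equation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5000

# Unobserved market feature (confounder): shifts both demand and the observed price.
u = rng.normal(size=T)

# Supply-side shock with scale sigma_S; it provides exogenous price variation
# and plays the role of the instrument in this toy model.
sigma_S = 1.0
z = rng.normal(scale=sigma_S, size=T)

# Stylized equilibrium price: moves with the confounder and the supply shock.
p = 1.0 + 0.8 * u + z

# Assumed linear demand: d = a - b * p + confounder + idiosyncratic noise.
a, b = 5.0, 1.5
d = a - b * p + u + 0.1 * rng.normal(size=T)

# Naive OLS of demand on price is biased: price is correlated with the confounder u.
X = np.column_stack([np.ones(T), p])
ols_slope = np.linalg.lstsq(X, d, rcond=None)[0][1]

# Two-stage least squares: instrument the price with the exogenous supply shock z.
Z = np.column_stack([np.ones(T), z])
p_hat = Z @ np.linalg.lstsq(Z, p, rcond=None)[0]        # first stage: project p on z
X_iv = np.column_stack([np.ones(T), p_hat])
iv_slope = np.linalg.lstsq(X_iv, d, rcond=None)[0][1]   # second stage

print(f"true slope: {-b:.2f}, OLS slope: {ols_slope:.2f}, IV slope: {iv_slope:.2f}")
```

In this toy setup the OLS slope is biased while the IV slope recovers the true price sensitivity; shrinking sigma_S weakens the exogenous price variation and degrades the IV estimate, which is consistent with the abstract's claim that supply-side noise governs the learnability of demand.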
Related papers
- How to Set the Learning Rate for Large-Scale Pre-training? [73.03133634525635]
We formalize this investigation into two distinct research paradigms: Fitting and Transfer. Within the Fitting Paradigm, we introduce a Scaling Law for the search factor, effectively reducing the search complexity from O(n^3) to O(n*C_D*C_) via predictive modeling. We extend the principles of $\mu$Transfer to the Mixture of Experts (MoE) architecture, broadening its applicability to encompass model depth, weight decay, and token horizons.
arXiv Detail & Related papers (2026-01-08T15:55:13Z) - Intention-Conditioned Flow Occupancy Models [80.42634994902858]
Large-scale pre-training has fundamentally changed how machine learning research is done today. Applying this same framework to reinforcement learning is appealing because it offers compelling avenues for addressing core challenges in RL. Recent advances in generative AI have provided new tools for modeling highly complex distributions.
arXiv Detail & Related papers (2025-06-10T15:27:46Z) - Feature Learning Beyond the Edge of Stability [8.430481660019451]
We propose a homogeneous multilayer perceptron parameterization with a hidden-layer width pattern and analyze its training dynamics under gradient descent. We obtain formulas for the first three Taylor coefficients of the minibatch loss during training that illuminate the connection between sharpness and feature learning.
arXiv Detail & Related papers (2025-02-18T18:23:33Z) - Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
We propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model.
We show that the proposed algorithms converge to the stationary solutions of the IRL problem.
Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process.
arXiv Detail & Related papers (2024-05-28T07:11:05Z) - An Empirical Study of $μ$P Learning Rate Transfer [0.0]
We show that the $\mu$-Transfer method can yield near-optimal learning rates in practice. Despite its evident promise, the $\mu$P method is not yet widely adopted.
arXiv Detail & Related papers (2024-04-08T17:59:44Z) - Dynamic Pricing and Learning with Long-term Reference Effects [16.07344044662994]
We study a simple and novel reference price mechanism where the reference price is the average of the past prices offered by the seller.
We show that under this mechanism, a markdown policy is near-optimal irrespective of the parameters of the model.
We then consider a more challenging dynamic pricing and learning problem, where the demand model parameters are a priori unknown.
arXiv Detail & Related papers (2024-02-19T21:36:54Z) - Does Machine Learning Amplify Pricing Errors in the Housing Market? -- The Economics of Machine Learning Feedback Loops [2.5699371511994777]
We develop an analytical model of machine learning feedback loops in the context of the housing market.
We show that feedback loops lead machine learning algorithms to become overconfident in their own accuracy.
We then identify conditions under which the economic payoff for home sellers at the feedback-loop equilibrium is worse than under no machine learning.
arXiv Detail & Related papers (2023-02-18T23:20:57Z) - Scaling Laws Beyond Backpropagation [64.0476282000118]
We study the ability of Direct Feedback Alignment to train causal decoder-only Transformers efficiently.
We find that DFA fails to offer more efficient scaling than backpropagation.
arXiv Detail & Related papers (2022-10-26T10:09:14Z) - Dynamic Pricing and Learning under the Bass Model [16.823029377470366]
We develop an algorithm that satisfies a high-probability regret guarantee of order $\tilde{O}(m^{2/3})$, where the market size $m$ is known a priori.
Unlike most regret analysis results, in the present problem the market size $m$ is the fundamental driver of the complexity.
arXiv Detail & Related papers (2021-03-09T03:27:33Z) - DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis [22.035287788330663]
We propose DoubleEnsemble, an ensemble framework leveraging learning-trajectory-based sample reweighting and shuffling-based feature selection.
Our model is applicable to a wide range of base models, capable of extracting complex patterns, while mitigating the overfitting and instability issues for financial market prediction.
arXiv Detail & Related papers (2020-10-03T02:57:10Z) - Budget Learning via Bracketing [50.085728094234476]
The budget learning problem poses the learner's goal as minimising use of the cloud while suffering no discernible loss in accuracy.
We propose a new formulation for the BL problem via the concept of bracketings.
We empirically validate our theory on real-world datasets, demonstrating improved performance over prior gating based methods.
arXiv Detail & Related papers (2020-04-14T04:38:14Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)