Nonparametric Contextual Online Bilateral Trade
- URL: http://arxiv.org/abs/2602.12904v1
- Date: Fri, 13 Feb 2026 13:03:30 GMT
- Title: Nonparametric Contextual Online Bilateral Trade
- Authors: Emanuele Coccia, Martino Bernasconi, Andrea Celli,
- Abstract summary: We study the problem of contextual online bilateral trade.<n>The goal of the learner is to post prices to facilitate trades between the two parties.<n>We design an algorithm that leverages contextual information through a hierarchical tree construction.
- Score: 15.586783656868706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of contextual online bilateral trade. At each round, the learner faces a seller-buyer pair and must propose a trade price without observing their private valuations for the item being sold. The goal of the learner is to post prices to facilitate trades between the two parties. Before posting a price, the learner observes a $d$-dimensional context vector that influences the agent's valuations. Prior work in the contextual setting has focused on linear models. In this work, we tackle a general nonparametric setting in which the buyer's and seller's valuations behave according to arbitrary Lipschitz functions of the context. We design an algorithm that leverages contextual information through a hierarchical tree construction and guarantees regret $\widetilde{O}(T^{{(d-1)}/d})$. Remarkably, our algorithm operates under two stringent features of the setting: (1) one-bit feedback, where the learner only observes whether a trade occurred or not, and (2) strong budget balance, where the learner cannot subsidize or profit from the market participants. We further provide a matching lower bound in the full-feedback setting, demonstrating the tightness of our regret bound.
Related papers
- Contextual Online Bilateral Trade [18.8734045754182]
We study two objectives for this problem, gain from trade and profit.<n>We design algorithms that achieve $O(dlog d)$ regret for gain from trade, and $O(d loglog T + dlog d)$ regret for profit.
arXiv Detail & Related papers (2026-02-13T13:03:10Z) - A Tight Regret Analysis of Non-Parametric Repeated Contextual Brokerage [8.049531918823758]
We study a contextual version of the repeated brokerage problem.<n>In each interaction, two traders with private valuations for an item seek to buy or sell based on the learner's-a broker-proposed price, which is informed by some contextual information.<n>The broker's goal is to maximize the traders' net utility-also known as the gain from trade-by minimizing regret compared to an oracle with perfect knowledge of traders' valuation distributions.
arXiv Detail & Related papers (2025-03-03T08:42:55Z) - Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information [57.287431079644705]
We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers.<n>We provide learning algorithms for the leader which achieve $O(T1/2)$ regret under bandit feedback.
arXiv Detail & Related papers (2025-01-31T22:40:57Z) - A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z) - A Parametric Contextual Online Learning Theory of Brokerage [8.049531918823758]
We study the role of contextual information in the online learning problem of brokerage between traders.<n>In this sequential problem, at each time step, two traders arrive with secret valuations about an asset they wish to trade.<n>The learner (a broker) suggests a trading (or brokerage) price based on contextual data about the asset and the market conditions.<n>Then, the traders reveal their willingness to buy or sell based on whether their valuations are higher or lower than the brokerage price.
arXiv Detail & Related papers (2024-05-22T18:38:05Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM)<n>We study a model within this domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.<n>We propose an algorithm namely robust contextual dueling bandits (RCDB), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Salespeople vs SalesBot: Exploring the Role of Educational Value in
Conversational Recommender Systems [78.84530426424838]
Existing conversational recommender systems often overlook users' lack of background knowledge, focusing solely on gathering preferences.
We introduce SalesOps, a framework that facilitates the simulation and evaluation of such systems.
We build SalesBot and ShopperBot, a pair of LLM-powered agents that can simulate either side of the framework.
arXiv Detail & Related papers (2023-10-26T19:44:06Z) - No-Regret Learning in Bilateral Trade via Global Budget Balance [29.514323697659613]
We provide the first no-regret algorithms for adversarial bilateral trade under various feedback models.
We show that in the full-feedback model, the learner can guarantee $tilde O(sqrtT)$ regret against the best fixed prices in hindsight.
We also provide a learning algorithm guaranteeing a $tilde O(T3/4)$ regret upper bound with one-bit feedback.
arXiv Detail & Related papers (2023-10-18T22:34:32Z) - Online Learning in Contextual Second-Price Pay-Per-Click Auctions [47.06746975822902]
We study online learning in contextual pay-per-click auctions where at each of the $T$ rounds, the learner receives some context along with a set of ads.
The learner's goal is to minimize her regret, defined as the gap between her total revenue and that of an oracle strategy.
arXiv Detail & Related papers (2023-10-08T07:04:22Z) - Language of Bargaining [60.218128617765046]
We build a novel dataset for studying how the use of language shapes bilateral bargaining.
Our work also reveals linguistic signals that are predictive of negotiation outcomes.
arXiv Detail & Related papers (2023-06-12T13:52:01Z) - Repeated Bilateral Trade Against a Smoothed Adversary [5.939280057673226]
We study repeated bilateral trade where an adaptive $sigma$-smooth adversary generates the valuations of sellers and buyers.
We provide a complete characterization of the regret regimes for fixed-price mechanisms under different feedback models.
arXiv Detail & Related papers (2023-02-21T16:30:10Z) - A Reinforcement Learning Approach in Multi-Phase Second-Price Auction
Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z) - Optimal No-regret Learning in Repeated First-price Auctions [38.908235632001116]
We study online learning in repeated first-price auctions.
We develop the first learning algorithm that achieves a near-optimal $widetildeO(sqrtT)$ regret bound.
arXiv Detail & Related papers (2020-03-22T03:32:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.