Related papers: A Tight Regret Analysis of Non-Parametric Repeated Contextual Brokerage

A Tight Regret Analysis of Non-Parametric Repeated Contextual Brokerage

URL: http://arxiv.org/abs/2503.02646v2
Date: Mon, 10 Mar 2025 07:17:55 GMT
Title: A Tight Regret Analysis of Non-Parametric Repeated Contextual Brokerage
Authors: François Bachoc, Tommaso Cesari, Roberto Colomboni,
Abstract summary: We study a contextual version of the repeated brokerage problem.<n>In each interaction, two traders with private valuations for an item seek to buy or sell based on the learner's-a broker-proposed price, which is informed by some contextual information.<n>The broker's goal is to maximize the traders' net utility-also known as the gain from trade-by minimizing regret compared to an oracle with perfect knowledge of traders' valuation distributions.
Score: 8.049531918823758
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study a contextual version of the repeated brokerage problem. In each interaction, two traders with private valuations for an item seek to buy or sell based on the learner's-a broker-proposed price, which is informed by some contextual information. The broker's goal is to maximize the traders' net utility-also known as the gain from trade-by minimizing regret compared to an oracle with perfect knowledge of traders' valuation distributions. We assume that traders' valuations are zero-mean perturbations of the unknown item's current market value-which can change arbitrarily from one interaction to the next-and that similar contexts will correspond to similar market prices. We analyze two feedback settings: full-feedback, where after each interaction the traders' valuations are revealed to the broker, and limited-feedback, where only transaction attempts are revealed. For both feedback types, we propose algorithms achieving tight regret bounds. We further strengthen our performance guarantees by providing a tight 1/2-approximation result showing that the oracle that knows the traders' valuation distributions achieves at least 1/2 of the gain from trade of the omniscient oracle that knows in advance the actual realized traders' valuations.

Related papers

Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning [50.93804891554481]
We introduce a novel estimator based on the log-sum-exponential (LSE) operator, which outperforms traditional inverse propensity score estimators.<n>Our LSE estimator demonstrates variance reduction and robustness under heavy-tailed conditions.<n>In the off-policy learning scenario, we establish bounds on the regret -- the performance gap between our LSE estimator and the optimal policy.
arXiv Detail & Related papers (2025-06-07T17:37:10Z)
Collaborative Value Function Estimation Under Model Mismatch: A Federated Temporal Difference Analysis [55.13545823385091]
Federated reinforcement learning (FedRL) enables collaborative learning while preserving data privacy by preventing direct data exchange between agents. In real-world applications, each agent may experience slightly different transition dynamics, leading to inherent model mismatches. We show that even moderate levels of information sharing can significantly mitigate environment-specific errors.
arXiv Detail & Related papers (2025-03-21T18:06:28Z)
When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments [55.19252983108372]
We have developed a multi-agent AI system called StockAgent, driven by LLMs. The StockAgent allows users to evaluate the impact of different external factors on investor trading. It avoids the test set leakage issue present in existing trading simulation systems based on AI Agents.
arXiv Detail & Related papers (2024-07-15T06:49:30Z)
Reinforcement Learning for Corporate Bond Trading: A Sell Side Perspective [0.0]
A corporate bond trader provides a quote by adding a spread over a textitprevalent market price For illiquid bonds, the market price is harder to observe, and traders often resort to available benchmark bond prices. In this paper, we approach the estimation of an optimal bid-ask spread quoting strategy in a data driven manner and show that it can be learned using Reinforcement Learning.
arXiv Detail & Related papers (2024-06-18T18:02:35Z)
Multi-Agent Imitation Learning: Value is Easy, Regret is Hard [52.31989962031179]
We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents. Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations. While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee to deviations by strategic agents.
arXiv Detail & Related papers (2024-06-06T16:18:20Z)
On Bits and Bandits: Quantifying the Regret-Information Trade-off [62.64904903955711]
We study the trade-off between the information an agent accumulates and the regret it suffers.<n>We introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates.<n>We also prove regret upper bounds using the amount of information the agent accumulates.
arXiv Detail & Related papers (2024-05-26T14:18:38Z)
Fair Online Bilateral Trade [20.243000364933472]
We present a complete characterization of the regret for fair gain from trade when, after each interaction, the platform only learns whether each trader accepted the current price. We conclude by providing tight regret bounds when, after each interaction, the platform is allowed to observe the true traders' valuations.
arXiv Detail & Related papers (2024-05-22T18:49:11Z)
A Contextual Online Learning Theory of Brokerage [8.049531918823758]
We study the role of contextual information in the online learning problem of brokerage between traders. We show that if the bounded density assumption is lifted, then the problem becomes unlearnable.
arXiv Detail & Related papers (2024-05-22T18:38:05Z)
Trading Volume Maximization with Online Learning [3.8059763597999012]
We investigate how the broker should behave to maximize the trading volume. We model the traders' valuations as an i.i.d. process with an unknown distribution. If only their willingness to sell or buy at the proposed price is revealed after each interaction, we provide an algorithm achieving poly-logarithmic regret.
arXiv Detail & Related papers (2024-05-21T17:26:44Z)
Online Decision Mediation [72.80902932543474]
Consider learning a decision support assistant to serve as an intermediary between (oracle) expert behavior and (imperfect) human behavior. In clinical diagnosis, fully-autonomous machine behavior is often beyond ethical affordances.
arXiv Detail & Related papers (2023-10-28T05:59:43Z)
No-Regret Learning in Bilateral Trade via Global Budget Balance [29.514323697659613]
We provide the first no-regret algorithms for adversarial bilateral trade under various feedback models. We show that in the full-feedback model, the learner can guarantee $tilde O(sqrtT)$ regret against the best fixed prices in hindsight. We also provide a learning algorithm guaranteeing a $tilde O(T3/4)$ regret upper bound with one-bit feedback.
arXiv Detail & Related papers (2023-10-18T22:34:32Z)
An Online Learning Theory of Brokerage [3.8059763597999012]
We investigate brokerage between traders from an online learning perspective. Unlike other bilateral trade problems already studied, we focus on the case where there are no designated buyer and seller roles. We show that the optimal rate degrades to $sqrtT$ in the first case, and the problem becomes unlearnable in the second.
arXiv Detail & Related papers (2023-10-18T17:01:32Z)
A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions. From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders. Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z)
A Dataset on Malicious Paper Bidding in Peer Review [84.68308372858755]
Malicious reviewers strategically bid in order to unethically manipulate the paper assignment. A critical impediment towards creating and evaluating methods to mitigate this issue is the lack of publicly-available data on malicious paper bidding. We release a novel dataset, collected from a mock conference activity where participants were instructed to bid either honestly or maliciously.
arXiv Detail & Related papers (2022-06-24T20:23:33Z)
Certifying Strategyproof Auction Networks [53.37051312298459]
We focus on the RegretNet architecture, which can represent auctions with arbitrary numbers of items and participants. We propose ways to explicitly verify strategyproofness under a particular valuation profile using techniques from the neural network verification literature.
arXiv Detail & Related papers (2020-06-15T20:22:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.