An Online Learning Theory of Brokerage
- URL: http://arxiv.org/abs/2310.12107v1
- Date: Wed, 18 Oct 2023 17:01:32 GMT
- Title: An Online Learning Theory of Brokerage
- Authors: Nata\v{s}a Boli\'c, Tommaso Cesari, Roberto Colomboni
- Abstract summary: We investigate brokerage between traders from an online learning perspective.
Unlike other bilateral trade problems already studied, we focus on the case where there are no designated buyer and seller roles.
We show that the optimal rate degrades to $sqrtT$ in the first case, and the problem becomes unlearnable in the second.
- Score: 3.8059763597999012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate brokerage between traders from an online learning perspective.
At any round $t$, two traders arrive with their private valuations, and the
broker proposes a trading price. Unlike other bilateral trade problems already
studied in the online learning literature, we focus on the case where there are
no designated buyer and seller roles: each trader will attempt to either buy or
sell depending on the current price of the good.
We assume the agents' valuations are drawn i.i.d. from a fixed but unknown
distribution. If the distribution admits a density bounded by some constant
$M$, then, for any time horizon $T$:
$\bullet$ If the agents' valuations are revealed after each interaction, we
provide an algorithm achieving regret $M \log T$ and show this rate is optimal,
up to constant factors.
$\bullet$ If only their willingness to sell or buy at the proposed price is
revealed after each interaction, we provide an algorithm achieving regret
$\sqrt{M T}$ and show this rate is optimal, up to constant factors.
Finally, if we drop the bounded density assumption, we show that the optimal
rate degrades to $\sqrt{T}$ in the first case, and the problem becomes
unlearnable in the second.
Related papers
- Market Making without Regret [15.588799679661637]
We consider a sequential decision-making setting where, at every round $t$, a market maker posts a bid price $B_t$ and an ask price $A_t$ to an incoming trader.
If the trader's valuation is lower than the bid price, or higher than the ask price, then a trade (sell or buy) occurs.
We characterize the maker's regret with respect to the best fixed choice of bid and ask pairs.
arXiv Detail & Related papers (2024-11-21T10:13:55Z) - When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments [55.19252983108372]
We have developed a multi-agent AI system called StockAgent, driven by LLMs.
The StockAgent allows users to evaluate the impact of different external factors on investor trading.
It avoids the test set leakage issue present in existing trading simulation systems based on AI Agents.
arXiv Detail & Related papers (2024-07-15T06:49:30Z) - Fair Online Bilateral Trade [20.243000364933472]
We present a complete characterization of the regret for fair gain from trade when, after each interaction, the platform only learns whether each trader accepted the current price.
We conclude by providing tight regret bounds when, after each interaction, the platform is allowed to observe the true traders' valuations.
arXiv Detail & Related papers (2024-05-22T18:49:11Z) - A Contextual Online Learning Theory of Brokerage [8.049531918823758]
We study the role of contextual information in the online learning problem of brokerage between traders.
We show that if the bounded density assumption is lifted, then the problem becomes unlearnable.
arXiv Detail & Related papers (2024-05-22T18:38:05Z) - Trading Volume Maximization with Online Learning [3.8059763597999012]
We investigate how the broker should behave to maximize the trading volume.
We model the traders' valuations as an i.i.d. process with an unknown distribution.
If only their willingness to sell or buy at the proposed price is revealed after each interaction, we provide an algorithm achieving poly-logarithmic regret.
arXiv Detail & Related papers (2024-05-21T17:26:44Z) - Online Learning in Contextual Second-Price Pay-Per-Click Auctions [47.06746975822902]
We study online learning in contextual pay-per-click auctions where at each of the $T$ rounds, the learner receives some context along with a set of ads.
The learner's goal is to minimize her regret, defined as the gap between her total revenue and that of an oracle strategy.
arXiv Detail & Related papers (2023-10-08T07:04:22Z) - Repeated Bilateral Trade Against a Smoothed Adversary [5.939280057673226]
We study repeated bilateral trade where an adaptive $sigma$-smooth adversary generates the valuations of sellers and buyers.
We provide a complete characterization of the regret regimes for fixed-price mechanisms under different feedback models.
arXiv Detail & Related papers (2023-02-21T16:30:10Z) - Uniswap Liquidity Provision: An Online Learning Approach [49.145538162253594]
Decentralized Exchanges (DEXs) are new types of marketplaces leveraging technology.
One such DEX, Uniswap v3, allows liquidity providers to allocate funds more efficiently by specifying an active price interval for their funds.
This introduces the problem of finding an optimal strategy for choosing price intervals.
We formalize this problem as an online learning problem with non-stochastic rewards.
arXiv Detail & Related papers (2023-02-01T17:21:40Z) - Probably Anytime-Safe Stochastic Combinatorial Semi-Bandits [81.60136088841948]
We propose an algorithm that minimizes the regret over the horizon of time $T$.
The proposed algorithm is applicable to domains such as recommendation systems and transportation.
arXiv Detail & Related papers (2023-01-31T03:49:00Z) - A Reinforcement Learning Approach in Multi-Phase Second-Price Auction
Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z) - Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.