TSEC: a framework for online experimentation under experimental
constraints
- URL: http://arxiv.org/abs/2101.06592v1
- Date: Sun, 17 Jan 2021 05:04:12 GMT
- Title: TSEC: a framework for online experimentation under experimental
constraints
- Authors: Simon Mak, Yuanshuo Zhou, Lavonne Hoang, C. F. Jeff Wu
- Abstract summary: Thompson sampling is a popular algorithm for solving multi-armed bandit problems.
We propose a new Thompson Sampling under Experimental Constraints (TSEC) method, which addresses this so-called "arm budget constraint".
We demonstrate the effectiveness of TSEC in two problems with arm budget constraints.
- Score: 1.1470070927586016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thompson sampling is a popular algorithm for solving multi-armed bandit
problems, and has been applied in a wide range of applications, from website
design to portfolio optimization. In such applications, however, the number of
choices (or arms) $N$ can be large, and the data needed to make adaptive
decisions require expensive experimentation. One is then faced with the
constraint of experimenting on only a small subset of $K \ll N$ arms within
each time period, which poses a problem for traditional Thompson sampling. We
propose a new Thompson Sampling under Experimental Constraints (TSEC) method,
which addresses this so-called "arm budget constraint". TSEC makes use of a
Bayesian interaction model with effect hierarchy priors, to model correlations
between rewards on different arms. This fitted model is then integrated within
Thompson sampling, to jointly identify a good subset of arms for
experimentation and to allocate resources over these arms. We demonstrate the
effectiveness of TSEC in two problems with arm budget constraints. The first is
a simulated website optimization study, where TSEC shows noticeable
improvements over industry benchmarks. The second is a portfolio optimization
application on industry-based exchange-traded funds, where TSEC provides more
consistent and greater wealth accumulation over standard investment strategies.
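To make the setting concrete, here is a minimal, hypothetical sketch of Thompson sampling under a per-round arm budget, where only $K$ of the $N$ arms may be pulled in each period. For simplicity it uses independent Beta-Bernoulli posteriors in place of TSEC's Bayesian interaction model with effect hierarchy priors, and it spends the budget as one pull per selected arm; all function and variable names are illustrative, not the authors' implementation.

```python
import numpy as np

def budgeted_thompson_sampling(true_means, K, T, rng=None):
    """Thompson sampling when only K of the N arms may be tried per round.

    Illustrative sketch only: independent Beta-Bernoulli posteriors stand in
    for TSEC's Bayesian interaction model, and the per-round budget is spent
    as a single pull on each arm in the chosen subset.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_arms = len(true_means)
    alpha = np.ones(n_arms)  # Beta posterior "success" counts (prior included)
    beta = np.ones(n_arms)   # Beta posterior "failure" counts (prior included)
    total_reward = 0.0

    for _ in range(T):
        # Draw one posterior sample per arm and keep the K most promising arms.
        samples = rng.beta(alpha, beta)
        subset = np.argsort(samples)[-K:]

        # Experiment only on the selected subset and update those posteriors.
        for arm in subset:
            reward = rng.binomial(1, true_means[arm])
            alpha[arm] += reward
            beta[arm] += 1 - reward
            total_reward += reward

    return total_reward, alpha / (alpha + beta)

# Example: 50 candidate arms, but a budget of only 5 arms per period.
means = np.linspace(0.1, 0.6, 50)
reward, posterior_means = budgeted_thompson_sampling(means, K=5, T=200)
print(reward, int(posterior_means.argmax()))
```

Because the posteriors here are independent, observations on the experimented subset tell us nothing about the unselected arms; modeling correlations between arm rewards, so that such transfer happens, is precisely the role of TSEC's interaction model.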
Related papers
- Improving Portfolio Optimization Results with Bandit Networks [0.0]
We introduce and evaluate novel bandit algorithms designed for non-stationary environments.
First, we present the Adaptive Discounted Thompson Sampling (ADTS) algorithm (a generic discounted-posterior sketch, for illustration only, follows this list).
We then extend this approach to the Portfolio Optimization problem by introducing the Combinatorial Adaptive Discounted Thompson Sampling (CADTS) algorithm.
arXiv Detail & Related papers (2024-10-05T16:17:31Z)
- Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits [1.4732811715354452]
We consider a $K$-armed bandit problem in which each arm consumes a different amount of resources when selected.
We propose a series of algorithms that are randomized like Thompson Sampling but more carefully optimize their decisions with respect to the budget constraint.
arXiv Detail & Related papers (2024-08-28T04:56:06Z)
- Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z)
- Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel stochastic-greedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z)
- Thompson Exploration with Best Challenger Rule in Best Arm Identification [66.33448474838342]
We study the fixed-confidence best arm identification problem in the bandit framework.
We propose a novel policy that combines Thompson sampling with a computationally efficient approach known as the best challenger rule.
arXiv Detail & Related papers (2023-10-01T01:37:02Z)
- Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints [13.069703665055084]
We propose a new recommendation algorithm for addressing the problem of two-sided online matching markets with complementary preferences and quota constraints.
The presence of mixed quota and complementary preferences constraints can lead to instability in the matching process.
We formulate the problem as a bandit learning framework and propose the Multi-agent Multi-type Thompson Sampling algorithm.
arXiv Detail & Related papers (2023-01-24T18:54:29Z)
- Thompson Sampling with Virtual Helping Agents [0.0]
We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting current knowledge to maximize immediate performance and exploring new information to gain long-term benefits.
We propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret.
arXiv Detail & Related papers (2022-09-16T23:34:44Z)
- Batch-Size Independent Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms or Independent Arms [53.89752069984382]
We study the combinatorial semi-bandit problem (CMAB) and focus on reducing the dependency on the batch-size $K$ in the regret bound.
First, for the setting of CMAB with probabilistically triggered arms (CMAB-T), we propose a BCUCB-T algorithm with variance-aware confidence intervals.
Second, for the setting of non-triggering CMAB with independent arms, we propose a SESCB algorithm which leverages the non-triggering version of the TPVM condition.
arXiv Detail & Related papers (2022-08-31T13:09:39Z)
- Algorithms for Adaptive Experiments that Trade-off Statistical Analysis with Reward: Combining Uniform Random Assignment and Reward Maximization [50.725191156128645]
Multi-armed bandit algorithms like Thompson Sampling can be used to conduct adaptive experiments.
We present simulations for 2-arm experiments that explore two algorithms combining the benefits of uniform randomization for statistical analysis with those of reward maximization.
arXiv Detail & Related papers (2021-12-15T22:11:58Z)
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
- A General Theory of the Stochastic Linear Bandit and Its Applications [8.071506311915398]
We introduce a general analysis framework and a family of algorithms for the linear bandit problem.
Our new notion of optimism in expectation gives rise to a new algorithm, called sieved greedy (SG), that reduces the overexploration problem in OFUL.
In addition to proving that SG is theoretically rate optimal, our empirical simulations show that SG outperforms existing benchmarks such as greedy, OFUL, and TS.
arXiv Detail & Related papers (2020-02-12T18:54:41Z)
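The discounting idea named in the ADTS entry above can be illustrated with a generic, hypothetical sketch: old observations are geometrically down-weighted so a Beta-Bernoulli posterior can track a drifting reward distribution. This is an assumption-laden illustration of discounted Thompson sampling in general, not the ADTS or CADTS algorithms; `gamma`, `reward_fn`, and the example environment are invented for the sketch.

```python
import numpy as np

def discounted_thompson_sampling(reward_fn, n_arms, T, gamma=0.95, rng=None):
    """Generic discounted Beta-Bernoulli Thompson sampling (illustrative only).

    Past observations are geometrically down-weighted by `gamma` so the
    posterior can follow a non-stationary reward distribution. A textbook-style
    sketch, not the ADTS/CADTS algorithms from the related paper above.
    """
    rng = np.random.default_rng() if rng is None else rng
    successes = np.zeros(n_arms)
    failures = np.zeros(n_arms)

    for t in range(T):
        # Forget a fraction of the old evidence before acting.
        successes *= gamma
        failures *= gamma

        # Sample from each arm's discounted posterior and play the best draw.
        samples = rng.beta(successes + 1.0, failures + 1.0)
        arm = int(np.argmax(samples))

        reward = reward_fn(arm, t)  # hypothetical callback returning 0 or 1
        successes[arm] += reward
        failures[arm] += 1 - reward

    return successes, failures

# Example with a hypothetical drifting environment: arm 0 starts best,
# arm 1 becomes best after round 500.
env_rng = np.random.default_rng(0)
def drifting_bernoulli(arm, t):
    p = (0.7, 0.4)[arm] if t < 500 else (0.3, 0.6)[arm]
    return env_rng.binomial(1, p)

s, f = discounted_thompson_sampling(drifting_bernoulli, n_arms=2, T=1000)
print((s + 1) / (s + f + 2))  # posterior means with the uniform prior folded in
```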
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.