Multi-Task Combinatorial Bandits for Budget Allocation
- URL: http://arxiv.org/abs/2409.00561v1
- Date: Sat, 31 Aug 2024 23:19:49 GMT
- Title: Multi-Task Combinatorial Bandits for Budget Allocation
- Authors: Lin Ge, Yang Xu, Jianing Chu, David Cramer, Fuhong Li, Kelly Paulson, Rui Song
- Abstract summary: Today's top advertisers typically manage hundreds of campaigns simultaneously and consistently launch new ones throughout the year.
A crucial challenge for marketing managers is determining the optimal allocation of limited budgets across various ad lines in each campaign to maximize cumulative returns.
We propose to formulate budget allocation as a multi-task combinatorial bandit problem and introduce a novel online budget allocation system.
- Score: 7.52750519688457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's top advertisers typically manage hundreds of campaigns simultaneously and consistently launch new ones throughout the year. A crucial challenge for marketing managers is determining the optimal allocation of limited budgets across various ad lines in each campaign to maximize cumulative returns, especially given the huge uncertainty in return outcomes. In this paper, we propose to formulate budget allocation as a multi-task combinatorial bandit problem and introduce a novel online budget allocation system. The proposed system: i) integrates a Bayesian hierarchical model to intelligently utilize the metadata of campaigns and ad lines and budget size, ensuring efficient information sharing; ii) provides the flexibility to incorporate diverse modeling techniques such as Linear Regression, Gaussian Processes, and Neural Networks, catering to diverse environmental complexities; and iii) employs the Thompson sampling (TS) technique to strike a balance between exploration and exploitation. Through offline evaluation and online experiments, our system demonstrates robustness and adaptability, effectively maximizing the overall cumulative returns. A Python implementation of the proposed procedure is available at https://anonymous.4open.science/r/MCMAB.
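The abstract sketches the core decision loop: model the return of each ad line at each budget level, draw one plausible return model from the posterior via Thompson sampling, and commit to the budget split that looks best under that draw. The toy sketch below illustrates only this loop with a discretized budget, independent Gaussian arms, and a known noise variance; these are illustrative assumptions, not the paper's Bayesian hierarchical model or its information sharing across campaigns.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

K = 3          # ad lines in one campaign (illustrative)
LEVELS = 5     # discretized budget levels per ad line: 0..LEVELS-1 units
BUDGET = 6     # total units to spend each round
OBS_VAR = 0.25 # assumed known observation noise variance

# Gaussian posterior over the mean return of each (ad line, budget level) arm,
# initialized at a N(0, 1) prior.
mean = np.zeros((K, LEVELS))
var = np.ones((K, LEVELS))

# Enumerate the feasible budget splits (the combinatorial action set).
actions = [a for a in itertools.product(range(LEVELS), repeat=K) if sum(a) == BUDGET]

def true_return(k, level):
    # Hypothetical environment with diminishing returns per ad line.
    return (k + 1) * np.log1p(level) + rng.normal(0, 0.5)

for t in range(200):
    # Thompson sampling: draw one plausible return table from the posterior...
    sampled = rng.normal(mean, np.sqrt(var))
    # ...and spend the budget on the split that looks best under that draw.
    alloc = max(actions, key=lambda a: sum(sampled[k, a[k]] for k in range(K)))

    # Observe returns and update the posterior of each played (ad line, level) arm.
    for k, level in enumerate(alloc):
        r = true_return(k, level)
        new_var = 1.0 / (1.0 / var[k, level] + 1.0 / OBS_VAR)
        mean[k, level] = new_var * (mean[k, level] / var[k, level] + r / OBS_VAR)
        var[k, level] = new_var

best = max(actions, key=lambda a: sum(mean[k, a[k]] for k in range(K)))
print("allocation favored after learning:", best)
```

Swapping the independent Gaussian arm model for a Linear Regression, Gaussian Process, or Neural Network posterior is where the flexibility described in point ii) of the abstract would enter.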
Related papers
- Improving Portfolio Optimization Results with Bandit Networks [0.0]
We introduce and evaluate novel Bandit algorithms designed for non-stationary environments.
First, we present the Adaptive Discounted Thompson Sampling (ADTS) algorithm.
We then extend this approach to the Portfolio Optimization problem by introducing the Combinatorial Adaptive Discounted Thompson Sampling (CADTS) algorithm.
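The summary points to Thompson Sampling adapted to non-stationary rewards by discounting past evidence. A minimal sketch of that idea for Bernoulli arms follows; the Beta parameterization, discount factor, and drifting environment are illustrative assumptions, not the ADTS/CADTS algorithms themselves.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4            # arms (illustrative)
GAMMA = 0.95     # discount factor on past evidence (assumed)
alpha = np.ones(K)   # Beta posterior "successes" per arm
beta = np.ones(K)    # Beta posterior "failures" per arm

def pull(arm, t):
    # Hypothetical non-stationary environment: arm means drift over time.
    p = 0.5 + 0.4 * np.sin(0.01 * t + arm)
    return float(rng.random() < p)

for t in range(5000):
    # Sample a plausible mean for every arm and play the apparent best one.
    theta = rng.beta(alpha, beta)
    arm = int(np.argmax(theta))
    reward = pull(arm, t)

    # Discount all counts toward the Beta(1, 1) prior so stale evidence fades,
    # then fold in the new observation.
    alpha = GAMMA * alpha + (1 - GAMMA)
    beta = GAMMA * beta + (1 - GAMMA)
    alpha[arm] += reward
    beta[arm] += 1.0 - reward
```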
arXiv Detail & Related papers (2024-10-05T16:17:31Z)
- A Federated Online Restless Bandit Framework for Cooperative Resource Allocation [23.698976872351576]
We study the cooperative resource allocation problem with unknown system dynamics of MRPs.
We put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem.
Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log(T)})$ and better performance compared with baselines.
arXiv Detail & Related papers (2024-06-12T08:34:53Z)
- Multimodal Instruction Tuning with Conditional Mixture of LoRA [54.65520214291653]
This paper introduces a novel approach that integrates multimodal instruction tuning with Low-Rank Adaptation (LoRA).
It innovates upon LoRA by dynamically constructing low-rank adaptation matrices tailored to the unique demands of each input instance.
Experimental results on various multimodal evaluation datasets indicate that MixLoRA outperforms conventional LoRA with the same or even higher ranks.
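As summarized, the key idea is constructing the low-rank adaptation factors dynamically per input rather than keeping them fixed. The sketch below is one toy reading of that idea: a softmax router mixes a small pool of LoRA factor experts conditioned on the input vector. The sizes, the router, and the mixing rule are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(4)
D_IN, D_OUT, RANK, EXPERTS = 16, 16, 4, 3     # illustrative sizes

W0 = rng.normal(size=(D_OUT, D_IN)) * 0.1     # frozen base weight matrix
# A small pool of LoRA factor "experts" plus a router over them (toy reading).
A_pool = rng.normal(size=(EXPERTS, RANK, D_IN)) * 0.01
B_pool = rng.normal(size=(EXPERTS, D_OUT, RANK)) * 0.01
router = rng.normal(size=(EXPERTS, D_IN)) * 0.1

def forward(x):
    # Input-conditional mixture: softmax router weights over the expert factors.
    logits = router @ x
    w = np.exp(logits - logits.max())
    w /= w.sum()
    A = np.tensordot(w, A_pool, axes=1)   # (RANK, D_IN) mixed down-projection
    B = np.tensordot(w, B_pool, axes=1)   # (D_OUT, RANK) mixed up-projection
    # Base output plus a low-rank update built specifically for this input.
    return W0 @ x + B @ (A @ x)

print(forward(rng.normal(size=D_IN)).shape)
```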
arXiv Detail & Related papers (2024-02-24T20:15:31Z)
- A Bayesian Framework of Deep Reinforcement Learning for Joint O-RAN/MEC Orchestration [12.914011030970814]
Multi-access Edge Computing (MEC) can be implemented together with Open Radio Access Network (O-RAN) over commodity platforms to offer low-cost deployment.
In this paper, a joint O-RAN/MEC orchestration using a Bayesian deep reinforcement learning (RL)-based framework is proposed.
arXiv Detail & Related papers (2023-12-26T18:04:49Z)
- Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning [156.5667417159582]
This paper proposes a novel model of sequential information design, namely the Markov persuasion processes (MPPs).
Planning in MPPs faces the unique challenge in finding a signaling policy that is simultaneously persuasive to the myopic receivers and inducing the optimal long-term cumulative utilities of the sender.
We design a provably efficient no-regret learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of both optimism and pessimism principles.
arXiv Detail & Related papers (2022-02-22T05:41:43Z)
- LBCF: A Large-Scale Budget-Constrained Causal Forest Algorithm [11.82503645248441]
How to allocate the right amount of incentives (i.e., treatment) to each user under budget constraints is an important research problem.
We propose a novel tree-based treatment selection technique under budget constraints, called Large-Scale Budget-Constrained Causal Forest (LBCF) algorithm.
We deploy our approach in a real-world scenario on a large-scale video platform, where the platform gives away bonuses in order to increase users' campaign engagement duration.
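For intuition on the problem being addressed (not the LBCF algorithm itself), a greedy baseline for budget-constrained treatment selection ranks users by estimated uplift per unit cost and treats them until the budget runs out. The uplift and cost numbers below are made up; in LBCF the uplift estimates would come from a causal-forest-style model, which this baseline does not build.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 10_000        # users (illustrative)
BUDGET = 500.0    # total incentive budget (illustrative)

# Made-up per-user uplift estimates and incentive costs.
uplift = rng.gamma(2.0, 1.0, size=N) * 0.01   # estimated engagement gain if treated
cost = rng.uniform(0.5, 2.0, size=N)          # incentive cost of treating the user

# Greedy knapsack heuristic: treat users in order of estimated uplift per unit cost.
order = np.argsort(-uplift / cost)
spent, treated = 0.0, []
for i in order:
    if spent + cost[i] > BUDGET:
        continue
    spent += cost[i]
    treated.append(int(i))

print(f"treated {len(treated)} users, spent {spent:.1f} of {BUDGET}")
```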
arXiv Detail & Related papers (2022-01-29T13:21:07Z)
- Contextual Bandits for Advertising Campaigns: A Diffusion-Model Independent Approach (Extended Version) [73.59962178534361]
We study an influence problem in which little is assumed to be known about the diffusion network or about the model that determines how information may propagate.
In this setting, an explore-exploit approach could be used to learn the key underlying diffusion parameters, while running the campaign.
We describe and compare two methods of contextual multi-armed bandits, with upper-confidence bounds on the remaining potential of influencers.
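As a rough illustration of "upper-confidence bounds on the remaining potential of influencers", here is a generic LinUCB-style selection loop. The linear reward model, the toy feedback, and the exploration width ALPHA are assumptions; the paper's two methods and their diffusion-independent estimators are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)
D, K, ALPHA = 5, 10, 1.0    # context dim, candidate influencers, UCB width (all assumed)

true_theta = rng.normal(size=(K, D))           # hidden per-influencer response (toy environment)
A = np.stack([np.eye(D) for _ in range(K)])    # one ridge-regression design matrix per influencer
b = np.zeros((K, D))

for t in range(2000):
    x = rng.normal(size=D)                     # context observed at this campaign round
    ucb = np.empty(K)
    for k in range(K):
        A_inv = np.linalg.inv(A[k])
        theta_hat = A_inv @ b[k]
        # Optimistic (upper-confidence) estimate of influencer k's potential for context x.
        ucb[k] = theta_hat @ x + ALPHA * np.sqrt(x @ A_inv @ x)
    chosen = int(np.argmax(ucb))

    # Toy feedback; a real campaign would observe actual spread/activations instead.
    reward = true_theta[chosen] @ x + rng.normal(scale=0.1)
    A[chosen] += np.outer(x, x)
    b[chosen] += reward * x
```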
arXiv Detail & Related papers (2022-01-13T22:06:10Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- The Best of Many Worlds: Dual Mirror Descent for Online Allocation Problems [7.433931244705934]
We consider a data-driven setting in which the reward and resource consumption of each request are generated using an input model unknown to the decision maker.
We design general class of algorithms that attain good performance in various input models without knowing which type of input they are facing.
Our algorithms operate in the Lagrangian dual space: they maintain a dual multiplier for each resource that is updated using online mirror descent.
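The summary spells out the mechanism: one dual multiplier per resource, updated by online mirror descent, with each request effectively accepted only if its reward beats the current shadow price of the resources it consumes. The sketch below follows that recipe with a negative-entropy mirror map and an i.i.d. toy input stream; the budget rates, step size, and price cap are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
T, M = 1000, 2                       # requests and resource types (illustrative)
rho = np.array([0.3, 0.5])           # per-period resource budget rates (assumed)
eta = 0.05                           # mirror-descent step size (assumed)
mu = np.ones(M)                      # dual multiplier (shadow price) per resource

spent, total_reward = np.zeros(M), 0.0

for t in range(T):
    reward = rng.uniform(0.0, 1.0)             # i.i.d. toy input model, unknown to the algorithm
    cost = rng.uniform(0.0, 0.2, size=M)       # resource consumption of this request

    # Accept only if the reward beats the current shadow price of the resources it uses
    # (and the remaining hard budget allows it).
    accept = reward > mu @ cost and np.all(spent + cost <= rho * T)
    if accept:
        total_reward += reward
        spent += cost

    # Online mirror descent on the dual: subgradient is (budget rate - consumption).
    grad = rho - (cost if accept else 0.0)
    mu = mu * np.exp(-eta * grad)              # negative-entropy (multiplicative-weights) update
    mu = np.minimum(mu, 10.0)                  # keep prices bounded for numerical safety

print(f"collected reward {total_reward:.1f}; spent {spent} of budget {rho * T}")
```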
arXiv Detail & Related papers (2020-11-18T18:39:17Z)
- Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability.
A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance.
We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z)
- MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding [47.555870679348416]
We propose a Multi-ObjecTive Actor-Critics algorithm named MoTiAC for the problem of bidding optimization with various goals.
Unlike previous RL models, the proposed MoTiAC can simultaneously fulfill multi-objective tasks in complicated bidding environments.
arXiv Detail & Related papers (2020-02-18T07:16:39Z)