Advertising Media and Target Audience Optimization via High-dimensional
Bandits
- URL: http://arxiv.org/abs/2209.08403v1
- Date: Sat, 17 Sep 2022 21:00:53 GMT
- Title: Advertising Media and Target Audience Optimization via High-dimensional
Bandits
- Authors: Wenjia Ba, J. Michael Harrison, Harikesh S. Nair
- Abstract summary: We present a data-driven algorithm that advertisers can use to automate their digital ad-campaigns at online publishers.
The algorithm enables the advertiser to search across available target audiences and ad-media to find the best possible combination for its campaign via online experimentation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a data-driven algorithm that advertisers can use to automate their
digital ad-campaigns at online publishers. The algorithm enables the advertiser
to search across available target audiences and ad-media to find the best
possible combination for its campaign via online experimentation. The problem
of finding the best audience-ad combination is complicated by a number of
distinctive challenges, including (a) a need for active exploration to resolve
prior uncertainty and to speed the search for profitable combinations, (b) many
combinations to choose from, giving rise to high-dimensional search
formulations, and (c) very low success probabilities, typically just a fraction
of one percent. Our algorithm (designated LRDL, an acronym for Logistic
Regression with Debiased Lasso) addresses these challenges by combining four
elements: a multiarmed bandit framework for active exploration; a Lasso penalty
function to handle high dimensionality; an inbuilt debiasing kernel that
handles the regularization bias induced by the Lasso; and a semi-parametric
regression model for outcomes that promotes cross-learning across arms. The
algorithm is implemented as a Thompson Sampler, and to the best of our
knowledge, it is the first that can practically address all of the challenges
above. Simulations with real and synthetic data show the method is effective
and document its superior performance against several benchmarks from the
recent high-dimensional bandit literature.
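The four ingredients above can be illustrated with a deliberately simplified sketch. This is not the authors' LRDL: the debiasing kernel and the semi-parametric outcome model are omitted, and the posterior draw is replaced by a crude Gaussian perturbation of the Lasso fit. The arm features, sparsity pattern, and all constants below are invented for illustration.

```python
import numpy as np

def l1_logistic(X, y, lam=0.05, lr=0.1, iters=300):
    """L1-penalised logistic regression fit by proximal gradient (ISTA)."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        beta -= lr * (X.T @ (p - y) / n)  # gradient step on the logistic loss
        # soft-thresholding: the proximal step for the Lasso penalty
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)
    return beta

rng = np.random.default_rng(0)
d, n_arms = 15, 40
arms = rng.normal(size=(n_arms, d))     # audience-ad combination features
true_beta = np.zeros(d)
true_beta[:3] = [1.5, -2.0, 1.0]        # sparse ground truth (illustrative)

X_hist, y_hist = [], []
for t in range(200):
    if t < 20:                          # cold start: uniform exploration
        a = rng.integers(n_arms)
    else:
        beta_hat = l1_logistic(np.array(X_hist), np.array(y_hist))
        beta_draw = beta_hat + 0.3 * rng.normal(size=d)  # crude "posterior" draw
        a = int(np.argmax(arms @ beta_draw))             # Thompson-style choice
    p_true = 1.0 / (1.0 + np.exp(-arms[a] @ true_beta))
    X_hist.append(arms[a])
    y_hist.append(rng.binomial(1, p_true))
```

The perturbation scale 0.3 is a placeholder for a proper posterior; the paper's debiased-Lasso construction is what makes such draws statistically justified.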
Related papers
- Learning to Rank for Multiple Retrieval-Augmented Models through Iterative Utility Maximization [21.115495457454365]
This paper investigates the design of a unified search engine to serve multiple retrieval-augmented generation (RAG) agents.
We introduce an iterative approach where the search engine generates retrieval results for these RAG agents and gathers feedback on the quality of the retrieved documents during an offline phase.
We adapt this approach to an online setting, allowing the search engine to refine its behavior based on real-time feedback from individual agents.
arXiv Detail & Related papers (2024-10-13T17:53:50Z) - LOLA: LLM-Assisted Online Learning Algorithm for Content Experiments [2.2021543101231167]
This paper introduces the LLM-Assisted Online Learning Algorithm (LOLA)
LOLA integrates Large Language Models (LLMs) with adaptive experimentation to optimize content delivery.
Our numerical experiments on Upworthy data show LOLA outperforms the standard A/B test method.
arXiv Detail & Related papers (2024-06-03T07:56:58Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of the various arms.
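A generic fixed-budget rejection rule of this flavour can be sketched as follows. This is not the paper's sred algorithm; the gap threshold, the minimum-pull warm-up, and the Bernoulli reward model are illustrative assumptions.

```python
import numpy as np

def reject_by_gaps(means_true, budget=3000, thresh=0.3, min_pulls=30, seed=0):
    """Fixed-budget best-arm search: pull every surviving arm round-robin,
    and reject any arm whose empirical gap to the leader exceeds `thresh`."""
    rng = np.random.default_rng(seed)
    k = len(means_true)
    alive = list(range(k))
    sums, counts = np.zeros(k), np.zeros(k)
    pulls = 0
    while pulls < budget and len(alive) > 1:
        for a in list(alive):
            sums[a] += rng.binomial(1, means_true[a])  # simulated reward
            counts[a] += 1
            pulls += 1
        emp = {a: sums[a] / counts[a] for a in alive}
        leader = max(emp.values())
        if min(counts[a] for a in alive) >= min_pulls:  # warm-up guard
            alive = [a for a in alive if leader - emp[a] <= thresh]
    return max(alive, key=lambda a: sums[a] / counts[a])
```

The leader always has gap zero, so the surviving set can never become empty; rejection can happen in any round once the warm-up is over, which is the "truly adaptive" property the summary refers to.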
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique with a lower upper bound on time complexity, reducing the memory complexity from O(n^2) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Factorization of Multi-Agent Sampling-Based Motion Planning [72.42734061131569]
Modern robotics often involves multiple embodied agents operating within a shared environment.
Standard sampling-based algorithms can be used to search for solutions in the robots' joint space.
We integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods.
We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
arXiv Detail & Related papers (2023-04-01T15:50:18Z) - Incentivizing Combinatorial Bandit Exploration [87.08827496301839]
Consider a bandit algorithm that recommends actions to self-interested users in a recommendation system.
Users are free to choose other actions and need to be incentivized to follow the algorithm's recommendations.
While the users prefer to exploit, the algorithm can incentivize them to explore by leveraging the information collected from the previous users.
arXiv Detail & Related papers (2022-06-01T13:46:25Z) - Distributed Dynamic Safe Screening Algorithms for Sparse Regularization [73.85961005970222]
We propose a new distributed dynamic safe screening (DDSS) method for sparsity regularized models and apply it on shared-memory and distributed-memory architecture respectively.
We prove that the proposed method achieves the linear convergence rate with lower overall complexity and can eliminate almost all the inactive features in a finite number of iterations almost surely.
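As a point of reference, a minimal screening heuristic in the spirit of strong rules can be written in a few lines. This is not the paper's DDSS method, which is distributed, dynamic, and comes with safe-elimination guarantees; the rule below is a plain single-machine, non-safe filter at beta = 0.

```python
import numpy as np

def screen_features(X, y, lam):
    """Keep only features whose absolute correlation with the response at
    beta = 0 reaches the Lasso threshold `lam`; the rest are presumed
    inactive and dropped before fitting (heuristic, not a safe rule)."""
    corr = np.abs(X.T @ y) / X.shape[0]
    return np.flatnonzero(corr >= lam)
```

Safe rules (and DDSS) certify that discarded features have zero coefficient at the optimum, which is why they can be applied dynamically during the optimization without changing the solution.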
arXiv Detail & Related papers (2022-04-23T02:45:55Z) - Approximate Nearest Neighbor Search under Neural Similarity Metric for
Large-Scale Recommendation [20.42993976179691]
We propose a novel method to extend ANN search to arbitrary matching functions.
Our main idea is to perform a greedy walk with a matching function in a similarity graph constructed from all items.
The proposed method has been fully deployed in the Taobao display advertising platform and brings a considerable advertising revenue increase.
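The greedy-walk idea can be sketched generically. The graph, scoring function, and stopping rule below are illustrative stand-ins, not the deployed Taobao system; in practice the score would be an arbitrary learned matching function evaluated on the fly.

```python
def greedy_walk(start, neighbors, score, max_steps=100):
    """Greedy best-first walk on a similarity graph: repeatedly hop to the
    neighbour with the highest matching score; stop at a local optimum."""
    cur = start
    for _ in range(max_steps):
        best = max(neighbors[cur], key=score, default=cur)
        if score(best) <= score(cur):
            return cur          # no neighbour improves: local optimum
        cur = best
    return cur

# Toy usage: items on a line, neighbours are adjacent indices, and the
# "matching function" prefers items close to a target item (here, 7).
items = list(range(10))
nbrs = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in items}
best = greedy_walk(0, nbrs, lambda i: -abs(i - 7))
```

Because only the visited nodes' scores are evaluated, the walk sidesteps scoring every item, which is what makes arbitrary (non-inner-product) matching functions tractable at retrieval time.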
arXiv Detail & Related papers (2022-02-14T07:55:57Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Doubly-Adaptive Thompson Sampling for Multi-Armed and Contextual Bandits [28.504921333436833]
We propose a variant of a Thompson sampling based algorithm that adaptively re-weighs the terms of a doubly robust estimator on the true mean reward of each arm.
The proposed algorithm matches the optimal (minimax) regret rate, and we evaluate it empirically in a semi-synthetic experiment.
We extend this approach to contextual bandits, where there are more sources of bias present apart from the adaptive data collection.
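The doubly robust idea behind that summary can be illustrated in a few lines. This is a bare-bones sketch, not the paper's adaptively re-weighted estimator; `model_mean` stands in for a fitted outcome model, and the propensities are assumed known.

```python
import numpy as np

def dr_arm_mean(rewards, actions, probs, model_mean, arm):
    """Doubly robust estimate of an arm's mean reward from adaptively
    collected bandit data: a plug-in model prediction plus an
    inverse-propensity correction on rounds where the arm was played."""
    rewards = np.asarray(rewards, dtype=float)
    actions = np.asarray(actions)
    probs = np.asarray(probs, dtype=float)
    correction = (actions == arm) / probs * (rewards - model_mean)
    return model_mean + correction.mean()
```

The estimate is unbiased if either the outcome model or the propensities are correct, which is the robustness the Thompson-sampling variant exploits when it re-weights the two terms.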
arXiv Detail & Related papers (2021-02-25T22:29:25Z) - Optimizing Offer Sets in Sub-Linear Time [5.027714423258537]
We propose an algorithm for personalized offer set optimization that runs in time sub-linear in the number of items.
Our algorithm can be entirely data-driven, relying on samples of the user, where a 'sample' refers to the user interaction data typically collected by firms.
arXiv Detail & Related papers (2020-11-17T13:02:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.