Practical Bandits: An Industry Perspective
- URL: http://arxiv.org/abs/2302.01223v1
- Date: Thu, 2 Feb 2023 17:03:40 GMT
- Title: Practical Bandits: An Industry Perspective
- Authors: Bram van den Akker, Olivier Jeunen, Ying Li, Ben London, Zahra Nazari,
Devesh Parekh
- Abstract summary: The bandit paradigm provides a unified modeling framework for problems that require decision-making under uncertainty.
With the bandit lens comes the promise of direct optimisation for the metrics we care about.
This tutorial will take a step towards filling that gap between the theory and practice of bandits.
- Score: 7.682671667564167
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The bandit paradigm provides a unified modeling framework for problems that
require decision-making under uncertainty. Because many business metrics can be
viewed as rewards (a.k.a. utilities) that result from actions, bandit
algorithms have seen a large and growing interest from industrial applications,
such as search, recommendation and advertising. Indeed, with the bandit lens
comes the promise of direct optimisation for the metrics we care about.
Nevertheless, the road to successfully applying bandits in production is not
an easy one. Even when the action space and rewards are well-defined,
practitioners still need to make decisions regarding multi-armed or contextual
approaches, on- or off-policy setups, delayed or immediate feedback, myopic or
long-term optimisation, etc. To make matters worse, industrial platforms
typically give rise to large action spaces in which existing approaches tend to
break down. The research literature on these topics is broad and vast, but this
can overwhelm practitioners, whose primary aim is to solve practical problems
and who therefore need to decide on a specific instantiation or approach for each
project. This tutorial will take a step towards filling that gap between the
theory and practice of bandits. Our goal is to present a unified overview of
the field and its existing terminology, concepts and algorithms -- with a focus
on problems relevant to industry. We hope our industrial perspective will help
future practitioners who wish to leverage the bandit paradigm for their
application.
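
To make the design space listed in the abstract concrete, the sketch below implements the simplest point in it: an on-policy, non-contextual, epsilon-greedy multi-armed bandit with immediate binary rewards. It is a minimal illustration rather than an algorithm from the tutorial; the arm count, epsilon value, and simulated reward probabilities are hypothetical.

```python
import random

def epsilon_greedy(n_arms=3, epsilon=0.1, n_rounds=10_000, seed=0):
    """Minimal epsilon-greedy multi-armed bandit on a simulated environment."""
    rng = random.Random(seed)
    true_probs = [rng.random() for _ in range(n_arms)]  # hypothetical Bernoulli reward rates
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore uniformly
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit current estimates
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean update

    return true_probs, values

if __name__ == "__main__":
    print(epsilon_greedy())
```

Each choice named in the abstract changes a piece of this sketch: a contextual approach replaces the per-arm running means with a model of reward given context and action, an off-policy setup fits that model from logged interactions instead of live pulls, and delayed feedback means the reward update cannot happen in the same round as the action.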
Related papers
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond [88.5807076505261]
Large Reasoning Models (LRMs) have demonstrated strong performance gains by scaling up the length of Chain-of-Thought (CoT) reasoning during inference.
A growing concern lies in their tendency to produce excessively long reasoning traces.
This inefficiency introduces significant challenges for training, inference, and real-world deployment.
arXiv Detail & Related papers (2025-03-27T15:36:30Z)
- Towards Unsupervised Multi-Agent Reinforcement Learning via Task-Agnostic Exploration [44.601019677298005]
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings.
We show that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performance.
arXiv Detail & Related papers (2025-02-12T12:51:36Z)
- SoK: Software Compartmentalization [3.058923790501231]
Decomposing large systems into smaller components has long been recognized as an effective means to minimize the impact of exploits.
Despite historical roots, demonstrated benefits, and a plethora of research efforts in academia and industry, the compartmentalization of software is still not a mainstream practice.
We propose a unified model for systematically analyzing, comparing, and directing compartmentalization approaches.
arXiv Detail & Related papers (2024-10-11T00:38:45Z)
- Incentivized Learning in Principal-Agent Bandit Games [62.41639598376539]
This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent.
The principal can influence the agent's decisions by offering incentives that add to the agent's rewards.
We present nearly optimal learning algorithms for the principal's regret in both multi-armed and linear contextual settings.
arXiv Detail & Related papers (2024-03-06T16:00:46Z)
- Learning Machine Morality through Experience and Interaction [3.7414804164475983]
Increasing interest in ensuring safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents.
We argue that more hybrid solutions are needed to create adaptable and robust, yet more controllable and interpretable agents.
arXiv Detail & Related papers (2023-12-04T11:46:34Z)
- Toward Operationalizing Pipeline-aware ML Fairness: A Research Agenda for Developing Practical Guidelines and Tools [18.513353100744823]
Recent work has called on the ML community to take a more holistic approach to tackle fairness issues.
We first demonstrate that without clear guidelines and toolkits, even individuals with specialized ML knowledge find it challenging to hypothesize how various design choices influence model behavior.
We then consult the fair-ML literature to understand the progress to date toward operationalizing the pipeline-aware approach.
arXiv Detail & Related papers (2023-09-29T15:48:26Z)
- Bandit Social Learning: Exploration under Myopic Behavior [54.767961587919075]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
- Deflectometry for specular surfaces: an overview [0.0]
Deflectometry as a technical approach to assessing reflective surfaces has now existed for almost 40 years.
Different aspects and variations of the method have been studied in multiple theses and research articles, and reviews are also becoming available for certain subtopics.
arXiv Detail & Related papers (2022-04-10T22:17:47Z)
- A Framework for Fairness: A Systematic Review of Existing Fair AI Solutions [4.594159253008448]
A large portion of fairness research has gone into producing tools that machine learning practitioners can use to audit for bias while designing their algorithms.
There is a lack of application of these fairness solutions in practice.
This review provides an in-depth summary of the algorithmic bias issues that have been defined and the fairness solution space that has been proposed.
arXiv Detail & Related papers (2021-12-10T17:51:20Z)
- Comparing Heuristics, Constraint Optimization, and Reinforcement Learning for an Industrial 2D Packing Problem [58.720142291102135]
Cutting and Packing problems occur in different industries, with a direct impact on the revenue of businesses.
Machine learning is increasingly used for solving such problems.
arXiv Detail & Related papers (2021-10-27T15:47:47Z)
- Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings.
In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged.
Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z)
- Combinatorial Pure Exploration with Full-bandit Feedback and Beyond: Solving Combinatorial Optimization under Uncertainty with Limited Observation [70.41056265629815]
When developing an algorithm for optimization, it is commonly assumed that parameters such as edge weights are exactly known as inputs.
In this article, we review recently proposed techniques for pure exploration problems with limited feedback.
arXiv Detail & Related papers (2020-12-31T12:40:52Z)
- Forecasting: theory and practice [65.71277206849244]
This article provides a non-systematic review of the theory and the practice of forecasting.
We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches.
We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts.
arXiv Detail & Related papers (2020-12-04T16:56:44Z)
- Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions; a generic UCB sketch is given at the end of this list.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
- Algorithmic Fairness from a Non-ideal Perspective [26.13086713244309]
We argue that the increasingly apparent shortcomings of proposed fair machine learning algorithms reflect broader troubles faced by the ideal approach.
We conclude with a critical discussion of the harms of misguided solutions, a reinterpretation of impossibility results, and directions for future research.
arXiv Detail & Related papers (2020-01-08T18:44:41Z)
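
As referenced in the "Latent Bandits Revisited" entry above, upper confidence bounds are one of the two algorithm families that paper builds on. The sketch below shows a generic UCB1 policy for an ordinary (non-latent) multi-armed bandit, purely to illustrate the confidence-bonus idea; it is not the latent-state algorithm from that paper, and the simulated environment and constants are hypothetical.

```python
import math
import random

def ucb1(n_arms=3, n_rounds=10_000, seed=0):
    """Generic UCB1 on a simulated Bernoulli bandit (illustration only)."""
    rng = random.Random(seed)
    true_probs = [rng.random() for _ in range(n_arms)]  # hypothetical reward rates
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            # pick the arm with the highest mean plus confidence bonus
            arm = max(range(n_arms),
                      key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return true_probs, values

if __name__ == "__main__":
    print(ucb1())
```

In broad terms, a Thompson-sampling counterpart keeps the same interaction loop but chooses arms by sampling from a posterior over reward models rather than adding a confidence bonus.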
This list is automatically generated from the titles and abstracts of the papers on this site.