Practical Bandits: An Industry Perspective
- URL: http://arxiv.org/abs/2302.01223v1
- Date: Thu, 2 Feb 2023 17:03:40 GMT
- Title: Practical Bandits: An Industry Perspective
- Authors: Bram van den Akker, Olivier Jeunen, Ying Li, Ben London, Zahra Nazari,
Devesh Parekh
- Abstract summary: The bandit paradigm provides a unified modeling framework for problems that require decision-making under uncertainty.
With the bandit lens comes the promise of direct optimisation for the metrics we care about.
This tutorial will take a step towards filling that gap between the theory and practice of bandits.
- Score: 7.682671667564167
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The bandit paradigm provides a unified modeling framework for problems that
require decision-making under uncertainty. Because many business metrics can be
viewed as rewards (a.k.a. utilities) that result from actions, bandit
algorithms have seen a large and growing interest from industrial applications,
such as search, recommendation and advertising. Indeed, with the bandit lens
comes the promise of direct optimisation for the metrics we care about.
Nevertheless, the road to successfully applying bandits in production is not
an easy one. Even when the action space and rewards are well-defined,
practitioners still need to make decisions regarding multi-armed or contextual
approaches, on- or off-policy setups, delayed or immediate feedback, myopic or
long-term optimisation, etc. To make matters worse, industrial platforms
typically give rise to large action spaces in which existing approaches tend to
break down. The research literature on these topics is broad and vast, but this
can overwhelm practitioners, whose primary aim is to solve practical problems
and who therefore need to decide on a specific instantiation or approach for each
project. This tutorial will take a step towards filling that gap between the
theory and practice of bandits. Our goal is to present a unified overview of
the field and its existing terminology, concepts and algorithms -- with a focus
on problems relevant to industry. We hope our industrial perspective will help
future practitioners who wish to leverage the bandit paradigm for their
application.
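
To make the design space listed in the abstract concrete, the sketch below implements the simplest point in it: an on-policy, non-contextual, epsilon-greedy multi-armed bandit with immediate binary rewards. It is a minimal illustration rather than an algorithm from the tutorial; the arm count, epsilon value, and simulated reward probabilities are hypothetical.

```python
import random

def epsilon_greedy(n_arms=3, epsilon=0.1, n_rounds=10_000, seed=0):
    """Minimal epsilon-greedy multi-armed bandit on a simulated environment."""
    rng = random.Random(seed)
    true_probs = [rng.random() for _ in range(n_arms)]  # hypothetical Bernoulli reward rates
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore uniformly
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit current estimates
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # incremental mean update

    return true_probs, values

if __name__ == "__main__":
    print(epsilon_greedy())
```

Each choice named in the abstract changes a piece of this sketch: a contextual approach replaces the per-arm running means with a model of reward given context and action, an off-policy setup fits that model from logged interactions instead of live pulls, and delayed feedback means the reward update cannot happen in the same round as the action.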
Related papers
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond [88.5807076505261]
Large Reasoning Models (LRMs) have demonstrated strong performance gains by scaling up the length of Chain-of-Thought (CoT) reasoning during inference.
A growing concern lies in their tendency to produce excessively long reasoning traces.
This inefficiency introduces significant challenges for training, inference, and real-world deployment.
arXiv Detail & Related papers (2025-03-27T15:36:30Z)
- Towards Unsupervised Multi-Agent Reinforcement Learning via Task-Agnostic Exploration [44.601019677298005]
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings.
We show that optimizing for a specific objective, namely mixture entropy, provides an excellent trade-off between tractability and performance.
arXiv Detail & Related papers (2025-02-12T12:51:36Z)
- SoK: Software Compartmentalization [3.058923790501231]
Decomposing large systems into smaller components has long been recognized as an effective means to minimize the impact of exploits.
Despite historical roots, demonstrated benefits, and a plethora of research efforts in academia and industry, the compartmentalization of software is still not a mainstream practice.
We propose a unified model for systematically analyzing, comparing, and directing compartmentalization approaches.
arXiv Detail & Related papers (2024-10-11T00:38:45Z)
- Incentivized Learning in Principal-Agent Bandit Games [62.41639598376539]
This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent.
The principal can influence the agent's decisions by offering incentives that add to the agent's rewards.
We present nearly optimal learning algorithms for the principal's regret in both multi-armed and linear contextual settings.
arXiv Detail & Related papers (2024-03-06T16:00:46Z)
- Learning Machine Morality through Experience and Interaction [3.7414804164475983]
Increasing interest in ensuring safety of next-generation Artificial Intelligence (AI) systems calls for novel approaches to embedding morality into autonomous agents.
We argue that more hybrid solutions are needed to create adaptable and robust, yet more controllable and interpretable agents.
arXiv Detail & Related papers (2023-12-04T11:46:34Z)
- Toward Operationalizing Pipeline-aware ML Fairness: A Research Agenda for Developing Practical Guidelines and Tools [18.513353100744823]
Recent work has called on the ML community to take a more holistic approach to tackle fairness issues.
We first demonstrate that without clear guidelines and toolkits, even individuals with specialized ML knowledge find it challenging to hypothesize how various design choices influence model behavior.
We then consult the fair-ML literature to understand the progress to date toward operationalizing the pipeline-aware approach.
arXiv Detail & Related papers (2023-09-29T15:48:26Z)
- Bandit Social Learning: Exploration under Myopic Behavior [54.767961587919075]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
- Deflectometry for specular surfaces: an overview [0.0]
Deflectometry as a technical approach to assessing reflective surfaces has now existed for almost 40 years.
Different aspects and variations of the method have been studied in multiple theses and research articles, and reviews are also becoming available for certain subtopics.
arXiv Detail & Related papers (2022-04-10T22:17:47Z)
- A Framework for Fairness: A Systematic Review of Existing Fair AI Solutions [4.594159253008448]
A large portion of fairness research has gone into producing tools that machine learning practitioners can use to audit for bias while designing their algorithms.
There is a lack of application of these fairness solutions in practice.
This review provides an in-depth summary of the algorithmic bias issues that have been defined and the fairness solution space that has been proposed.
arXiv Detail & Related papers (2021-12-10T17:51:20Z)
- Comparing Heuristics, Constraint Optimization, and Reinforcement Learning for an Industrial 2D Packing Problem [58.720142291102135]
Cutting and Packing problems occur in different industries, with a direct impact on the revenue of businesses.
Machine learning is increasingly used for solving such problems.
arXiv Detail & Related papers (2021-10-27T15:47:47Z)
- Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety [54.478842696269304]
The use of deep neural networks (DNNs) in safety-critical applications is challenging due to numerous model-inherent shortcomings.
In recent years, a zoo of state-of-the-art techniques aiming to address these safety concerns has emerged.
Our paper addresses both machine learning experts and safety engineers.
arXiv Detail & Related papers (2021-04-29T09:54:54Z)
- Combinatorial Pure Exploration with Full-bandit Feedback and Beyond: Solving Combinatorial Optimization under Uncertainty with Limited Observation [70.41056265629815]
When developing an algorithm for optimization, it is commonly assumed that parameters such as edge weights are exactly known as inputs.
In this article, we review recently proposed techniques for pure exploration problems with limited feedback.
arXiv Detail & Related papers (2020-12-31T12:40:52Z)
- Forecasting: theory and practice [65.71277206849244]
This article provides a non-systematic review of the theory and the practice of forecasting.
We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches.
We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts.
arXiv Detail & Related papers (2020-12-04T16:56:44Z)
- Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions; a generic UCB sketch is given at the end of this list.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
- Algorithmic Fairness from a Non-ideal Perspective [26.13086713244309]
We argue that the increasingly apparent shortcomings of proposed fair machine learning algorithms reflect broader troubles faced by the ideal approach.
We conclude with a critical discussion of the harms of misguided solutions, a reinterpretation of impossibility results, and directions for future research.
arXiv Detail & Related papers (2020-01-08T18:44:41Z)
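
As referenced in the "Latent Bandits Revisited" entry above, upper confidence bounds are one of the two algorithm families that paper builds on. The sketch below shows a generic UCB1 policy for an ordinary (non-latent) multi-armed bandit, purely to illustrate the confidence-bonus idea; it is not the latent-state algorithm from that paper, and the simulated environment and constants are hypothetical.

```python
import math
import random

def ucb1(n_arms=3, n_rounds=10_000, seed=0):
    """Generic UCB1 on a simulated Bernoulli bandit (illustration only)."""
    rng = random.Random(seed)
    true_probs = [rng.random() for _ in range(n_arms)]  # hypothetical reward rates
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise the estimates
        else:
            # pick the arm with the highest mean plus confidence bonus
            arm = max(range(n_arms),
                      key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]

    return true_probs, values

if __name__ == "__main__":
    print(ucb1())
```

In broad terms, a Thompson-sampling counterpart keeps the same interaction loop but chooses arms by sampling from a posterior over reward models rather than adding a confidence bonus.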
This list is automatically generated from the titles and abstracts of the papers on this site.