Semi-overlapping Multi-bandit Best Arm Identification for Sequential Support Network Learning
- URL: http://arxiv.org/abs/2512.24959v1
- Date: Wed, 31 Dec 2025 16:42:00 GMT
- Title: Semi-overlapping Multi-bandit Best Arm Identification for Sequential Support Network Learning
- Authors: András Antos, András Millinghoffer, Péter Antal
- Abstract summary: A new framework, Sequential Support Network Learning, can be used to learn a support network from sparse candidate lists efficiently. We develop a new pure-exploration model, the semi-overlapping multi-(multi-armed) bandit (SOMMAB), in which a single evaluation provides distinct feedback to multiple bandits. We derive new exponential error bounds that improve the best known constant in the exponent for multi-bandit best-arm identification.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many modern AI and ML problems require evaluating partners' contributions through shared yet asymmetric, computationally intensive processes and the simultaneous selection of the most beneficial candidates. Sequential approaches to these problems can be unified under a new framework, Sequential Support Network Learning (SSNL), in which the goal is to select the most beneficial candidate set of partners for all participants using trials; that is, to learn a directed graph that represents the highest-performing contributions. We demonstrate that a new pure-exploration model, the semi-overlapping multi-(multi-armed) bandit (SOMMAB), in which a single evaluation provides distinct feedback to multiple bandits due to structural overlap among their arms, can be used to learn a support network from sparse candidate lists efficiently. We develop a generalized GapE algorithm for SOMMABs and derive new exponential error bounds that improve the best known constant in the exponent for multi-bandit best-arm identification. The bounds scale linearly with the degree of overlap, revealing significant sample-complexity gains arising from shared evaluations. From an application point of view, this work provides a theoretical foundation and improved performance guarantees for sequential learning tools for identifying support networks from sparse candidates in multiple learning problems, such as in multi-task learning (MTL), auxiliary task learning (ATL), federated learning (FL), and in multi-agent systems (MAS).
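The generalized GapE allocation described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the index rule follows the standard GapE gap-plus-exploration form, and the structural overlap of SOMMAB is modeled simply by letting one arm appear in several bandits, so a single evaluation updates the statistics of every bandit that shares that arm. All function and variable names are hypothetical.

```python
import math

def gape_sommab(bandits, pull, budget, a=1.0):
    """GapE-style allocation sketch for semi-overlapping multi-bandits.

    bandits: list of arm-id lists; an arm may appear in several bandits,
             modeling SOMMAB's structural overlap.
    pull(arm): returns one reward sample; a single pull feeds every bandit
               containing that arm.
    """
    arms = sorted({k for b in bandits for k in b})
    counts = {k: 0 for k in arms}
    sums = {k: 0.0 for k in arms}

    # Initialize: pull each distinct arm once.
    for k in arms:
        sums[k] += pull(k)
        counts[k] += 1

    for _ in range(budget - len(arms)):
        best_index, best_arm = -float("inf"), None
        for b in bandits:
            means = {k: sums[k] / counts[k] for k in b}
            ordered = sorted(b, key=lambda k: means[k], reverse=True)
            top, second = ordered[0], ordered[1]
            for k in b:
                # Empirical gap: best-vs-second for the leader,
                # best-vs-self for every other arm.
                gap = means[top] - (means[second] if k == top else means[k])
                # GapE index: small gap and few pulls -> high priority.
                index = -gap + a * math.sqrt(1.0 / counts[k])
                if index > best_index:
                    best_index, best_arm = index, k
        # One evaluation gives feedback to all bandits sharing this arm.
        sums[best_arm] += pull(best_arm)
        counts[best_arm] += 1

    # Recommend the empirically best arm in each bandit.
    return [max(b, key=lambda k: sums[k] / counts[k]) for b in bandits]
```

Note how the overlap pays off: arm 2 below belongs to both bandits, so every pull of it sharpens two best-arm decisions at once, which is the mechanism behind the linear-in-overlap gains claimed in the abstract.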
Related papers
- BandiK: Efficient Multi-Task Decomposition Using a Multi-Bandit Framework [0.05142666700569701]
BandiK is a novel multi-task auxiliary task subset selection method using multi-bandits. It estimates the pairwise transfers between tasks, which helps in identifying which tasks are likely to benefit from joint learning. In the second stage, it constructs a linear number of candidate sets of auxiliary tasks for each target task based on the initial estimations.
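The second stage described above (a linear number of candidate sets per target task) can be illustrated with a small sketch. The prefix-set construction shown here is an assumption on my part, suggested by ranking tasks on their estimated pairwise transfer, and is not taken from the paper; the names `transfer` and `candidate_sets` are hypothetical.

```python
def candidate_sets(transfer, target):
    """Illustrative second stage: rank auxiliary tasks by their estimated
    pairwise transfer transfer[(src, tgt)] toward `target`, then return the
    nested prefixes of that ranking, yielding a number of candidate sets
    that is linear in the number of tasks.
    """
    aux = sorted(
        (s for s, t in transfer if t == target),
        key=lambda s: transfer[(s, target)],
        reverse=True,
    )
    # Prefixes of the ranked list: {best}, {best, 2nd-best}, ...
    return [aux[: i + 1] for i in range(len(aux))]
```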
arXiv Detail & Related papers (2025-12-31T08:25:15Z)
- Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs [102.48588475875749]
We introduce Generative Self-Refinement (GSR), a novel parallel test-time scaling framework. GSR generates a set of candidate responses in parallel and then performs self-refinement to synthesize a new superior solution. We show that our method achieves state-of-the-art performance across five mathematical benchmarks.
arXiv Detail & Related papers (2025-08-27T06:51:48Z)
- Learning Representation for Multitask learning through Self Supervised Auxiliary learning [3.236198583140341]
In the hard parameter sharing approach, an encoder shared through multiple tasks generates data representations passed to task-specific predictors.
We propose Dummy Gradient norm Regularization that aims to improve the universality of the representations generated by the shared encoder.
We show that DGR effectively improves the quality of the shared representations, leading to better multi-task prediction performances.
arXiv Detail & Related papers (2024-09-25T06:08:35Z)
- POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding [76.67608003501479]
We introduce POGEMA, a comprehensive set of tools that includes a fast environment for learning, a problem instance generator, and a visualization toolkit. We also introduce and define an evaluation protocol that specifies a range of domain-related metrics, computed based on primary evaluation indicators. The results of this comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
arXiv Detail & Related papers (2024-07-20T16:37:21Z)
- Variational Offline Multi-agent Skill Discovery [47.924414207796005]
We propose two novel auto-encoder schemes to simultaneously capture subgroup- and temporal-level abstractions and form multi-agent skills. Our method can be applied to offline multi-task data, and the discovered subgroup skills can be transferred across relevant tasks without retraining. Empirical evaluations on StarCraft tasks indicate that our approach significantly outperforms existing hierarchical multi-agent reinforcement learning (MARL) methods.
arXiv Detail & Related papers (2024-05-26T00:24:46Z)
- Provable Benefits of Multi-task RL under Non-Markovian Decision Making Processes [56.714690083118406]
In multi-task reinforcement learning (RL) under Markov decision processes (MDPs), the presence of shared latent structures has been shown to yield significant benefits to the sample efficiency compared to single-task RL.
We investigate whether such a benefit can extend to more general sequential decision making problems, such as partially observable MDPs (POMDPs) and more general predictive state representations (PSRs).
We propose a provably efficient algorithm UMT-PSR for finding near-optimal policies for all PSRs, and demonstrate that the advantage of multi-task learning manifests if the joint model class of PSR
arXiv Detail & Related papers (2023-10-20T14:50:28Z)
- Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning [9.571499333904969]
Auxiliary tasks facilitate learning in situations where data is scarce or the principal task of interest is extremely complex. We propose a novel framework, dubbed Detaux, whereby a weakly supervised disentanglement procedure is used to discover a new unrelated auxiliary classification task. The disentanglement procedure works at the representation level, isolating the variation related to the principal task into an isolated subspace.
arXiv Detail & Related papers (2023-10-13T17:40:39Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning [61.28547338576706]
Population-based multi-agent reinforcement learning (PB-MARL) refers to a family of methods that nest reinforcement learning (RL) algorithms within population-based training.
We present MALib, a scalable and efficient computing framework for PB-MARL.
arXiv Detail & Related papers (2021-06-05T03:27:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.