Related papers: LOFA: Online Influence Maximization under Full-Bandit Feedback using Lazy Forward Selection

LOFA: Online Influence Maximization under Full-Bandit Feedback using Lazy Forward Selection

URL: http://arxiv.org/abs/2601.00933v1
Date: Fri, 02 Jan 2026 08:00:14 GMT
Title: LOFA: Online Influence Maximization under Full-Bandit Feedback using Lazy Forward Selection
Authors: Jinyu Xu, Abhishek K. Umrawal,
Abstract summary: We exploit the problem of influence (L-OFA) in an online setting, where the goal is to select a subset of nodes.<n>We conduct experiments to demonstrate algorithms that achieves superior performance compared to existing algorithms.
Score: 2.2942509826099107
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We study the problem of influence maximization (IM) in an online setting, where the goal is to select a subset of nodes$\unicode{x2014}$called the seed set$\unicode{x2014}$at each time step over a fixed time horizon, subject to a cardinality budget constraint, to maximize the expected cumulative influence. We operate under a full-bandit feedback model, where only the influence of the chosen seed set at each time step is observed, with no additional structural information about the network or diffusion process. It is well-established that the influence function is submodular, and existing algorithms exploit this property to achieve low regret. In this work, we leverage this property further and propose the Lazy Online Forward Algorithm (LOFA), which achieves a lower empirical regret. We conduct experiments on a real-world social network to demonstrate that LOFA achieves superior performance compared to existing bandit algorithms in terms of cumulative regret and instantaneous reward.

Related papers

Adapting to Stochastic and Adversarial Losses in Episodic MDPs with Aggregate Bandit Feedback [61.49239204705301]
We study online learning in finite-horizon episodic Markov decision processes (MDPs) under the challenging aggregate bandit feedback model.<n>Our results rely on a combination of FTRL over occupancy measures, self-bounding techniques, and new loss estimators inspired by recent advances in online shortest path problems.
arXiv Detail & Related papers (2025-10-20T02:28:08Z)
Fair Algorithms with Probing for Multi-Agent Multi-Armed Bandits [15.700062892888084]
We introduce a novel probing framework that strategically gathers information about selected arms before allocation.<n>In the offline setting, where reward distributions are known, we leverage submodular properties to design a greedy probing algorithm with a provable performance bound.<n>For the more complex online setting, we develop an algorithm that achieves sublinear regret while maintaining fairness.
arXiv Detail & Related papers (2025-06-17T21:43:21Z)
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback [38.61232011566285]
We study the recently proposed model of RL with Aggregate Bandit Feedback (RL-ABF), where the agent only observes the sum of rewards at the end of an episode instead of each reward individually. In this paper, we extend ABF to linear function approximation and develop two efficient algorithms with near-optimal regret guarantees.
arXiv Detail & Related papers (2024-05-13T10:51:01Z)
Submodular Reinforcement Learning [38.40138241424851]
In reinforcement learning (RL), rewards of states are typically considered additive, and following the Markov assumption, they are $textitindependent$ states visited previously. In many important applications, such as coverage control, experiment design and informative path planning, rewards naturally have diminishing returns, i.e., their value decreases in light of similar states visited previously. We propose $textitsubmodular RL$ (SubRL), a paradigm which seeks to optimize more general, non-additive (and history-dependent) rewards modelled via submodular set functions which capture diminishing returns
arXiv Detail & Related papers (2023-07-25T09:46:02Z)
Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards [0.4194295877935867]
In some real-world applications, feedback about a decision is delayed and may arrive via partial rewards that are observed with different delays. We propose a novel problem formulation called multi-armed bandits with generalized temporally-partitioned rewards. We derive a lower bound on the performance of any uniformly efficient algorithm for the considered problem.
arXiv Detail & Related papers (2023-03-01T16:22:22Z)
Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation [69.0695698566235]
We study reinforcement learning with linear function approximation and adversarially changing cost functions. We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback.
arXiv Detail & Related papers (2023-01-30T17:26:39Z)
Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization [53.11458949694947]
We consider an adaptive version of content-dependent online influence problem where seed nodes are sequentially activated based on realtime feedback. Our algorithm maintains a network model estimate and selects seed adaptively, exploring the social network while improving the optimal policy optimistically.
arXiv Detail & Related papers (2022-06-29T18:17:28Z)
Online Nonsubmodular Minimization with Delayed Costs: From Full Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms. We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z)
Where is the Grass Greener? Revisiting Generalized Policy Iteration for Offline Reinforcement Learning [81.15016852963676]
We re-implement state-of-the-art baselines in the offline RL regime under a fair, unified, and highly factorized framework. We show that when a given baseline outperforms its competing counterparts on one end of the spectrum, it never does on the other end.
arXiv Detail & Related papers (2021-07-03T11:00:56Z)
Lenient Regret for Multi-Armed Bandits [72.56064196252498]
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took. While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results. We suggest a new, more lenient, regret criterion that ignores suboptimality gaps smaller than some $epsilon$.
arXiv Detail & Related papers (2020-08-10T08:30:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.