Online Learning with Cumulative Oversampling: Application to Budgeted
Influence Maximization
- URL: http://arxiv.org/abs/2004.11963v3
- Date: Wed, 16 Sep 2020 01:51:09 GMT
- Title: Online Learning with Cumulative Oversampling: Application to Budgeted
Influence Maximization
- Authors: Shatian Wang, Shuoguang Yang, Zhen Xu, Van-Anh Truong
- Abstract summary: We propose a cumulative oversampling (CO) method for online learning.
Our key idea is to sample parameter estimations from the updated belief space once in each round.
We show that for IM semi-bandits, our CO-based algorithm achieves a scaled regret comparable to that of the UCB-based algorithms in theory.
- Score: 7.893654261799925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a cumulative oversampling (CO) method for online learning. Our key
idea is to sample parameter estimations from the updated belief space once in
each round (similar to Thompson Sampling), and utilize the cumulative samples
up to the current round to construct optimistic parameter estimations that
asymptotically concentrate around the true parameters as tighter upper
confidence bounds compared to the ones constructed with standard UCB methods.
We apply CO to a novel budgeted variant of the Influence Maximization (IM)
semi-bandits with linear generalization of edge weights, whose offline problem
is NP-hard. Combining CO with the oracle we design for the offline problem, our
online learning algorithm simultaneously tackles budget allocation, parameter
learning, and reward maximization. We show that for IM semi-bandits, our
CO-based algorithm achieves a scaled regret comparable to that of the UCB-based
algorithms in theory, and performs on par with Thompson Sampling in numerical
experiments.
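As a rough illustration of the sampling idea only (not the paper's budgeted-IM algorithm), the sketch below applies cumulative oversampling to a plain linear bandit with a Gaussian belief over the unknown parameter vector; the posterior model and all variable names are assumptions made for this example.

```python
# Illustrative sketch of cumulative oversampling (CO) on a linear bandit.
# The Gaussian belief and all names here are assumptions for the example,
# not the paper's exact budgeted-IM construction.
import numpy as np

rng = np.random.default_rng(0)
d, n_arms, T = 5, 20, 300
theta_true = rng.normal(size=d)           # unknown parameter
X = rng.normal(size=(n_arms, d))          # arm feature vectors

prec, b = np.eye(d), np.zeros(d)          # Gaussian belief: precision matrix and X^T r vector
samples = []                              # cumulative posterior samples

for t in range(T):
    cov = np.linalg.inv(prec)
    mean = cov @ b
    # Draw ONE posterior sample per round, as in Thompson Sampling ...
    samples.append(rng.multivariate_normal(mean, cov))
    # ... but score each arm optimistically with the MAX predicted reward
    # over ALL samples drawn so far (the "cumulative" part of CO).
    scores = (X @ np.array(samples).T).max(axis=1)
    arm = int(scores.argmax())
    reward = X[arm] @ theta_true + 0.1 * rng.normal()
    # Standard Bayesian linear-regression update of the belief.
    prec += np.outer(X[arm], X[arm])
    b += reward * X[arm]

print("estimated theta:", np.round(np.linalg.inv(prec) @ b, 2))
```

Because the posterior contracts over time, the running maximum behaves like an upper confidence bound that tightens as samples accumulate, which is the intuition behind the optimistic estimates described in the abstract.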
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibrium.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either the Nash equilibrium or coarse correlated equilibrium in a sample efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while balancing exploration and exploitation automatically.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- Multivariate Probabilistic CRPS Learning with an Application to Day-Ahead Electricity Prices [0.0]
This paper presents a new method for combining (or aggregating or ensembling) multivariate probabilistic forecasts.
It considers dependencies between quantiles and marginals through a smoothing procedure that allows for online learning.
A fast C++ implementation of the proposed algorithm is provided in the open-source R-Package profoc on CRAN.
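As a loose illustration of the general idea of online combination of probabilistic forecasts (not the profoc/CRPS-learning algorithm itself), the sketch below reweights two quantile forecasters via exponentiated-gradient updates on the pinball loss; the smoothing across quantiles and marginals described above is omitted, and all names are hypothetical.

```python
# Toy online combination of quantile forecasts via pinball-loss weighting.
# This only illustrates "online aggregation of probabilistic forecasts";
# it is NOT the CRPS-learning procedure of the paper.
import numpy as np

rng = np.random.default_rng(1)
quantiles = np.array([0.1, 0.5, 0.9])
eta = 0.5                                      # learning rate (assumed)
weights = np.full((len(quantiles), 2), 0.5)    # per-quantile weights, 2 experts

def pinball(y, f, q):
    return np.maximum(q * (y - f), (q - 1.0) * (y - f))

for t in range(500):
    y = rng.normal()                           # observed outcome
    experts = np.stack([                       # two fixed quantile forecasters
        np.quantile(rng.normal(size=200), quantiles),            # well calibrated
        np.quantile(rng.normal(1.0, 2.0, size=200), quantiles),  # biased
    ], axis=1)                                 # shape (n_quantiles, 2)
    # Exponentiated-gradient update per quantile from each expert's loss.
    losses = pinball(y, experts, quantiles[:, None])
    weights *= np.exp(-eta * losses)
    weights /= weights.sum(axis=1, keepdims=True)

combined = (weights * experts).sum(axis=1)     # aggregated quantile forecast
print("final weights per quantile:\n", weights)
print("combined forecast:", np.round(combined, 2))
```

Over time the weights concentrate on the better-calibrated forecaster, which is the basic behaviour any such online aggregation scheme aims for.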
arXiv Detail & Related papers (2023-03-17T14:47:55Z)
- Sparse high-dimensional linear regression with a partitioned empirical Bayes ECM algorithm [62.997667081978825]
We propose a computationally efficient and powerful Bayesian approach for sparse high-dimensional linear regression.
Minimal prior assumptions are placed on the parameters through the use of plug-in empirical Bayes estimates.
The proposed approach is implemented in the R package probe.
arXiv Detail & Related papers (2022-09-16T19:15:50Z)
- A Stochastic Bundle Method for Interpolating Networks [18.313879914379008]
We propose a novel method for training deep neural networks that are capable of driving the empirical loss to zero.
At each iteration, our method constructs a maximum of linear approximations of the learning objective, known as the bundle.
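A minimal numerical sketch of what a bundle is, under the usual definition as the pointwise maximum of linear approximations (cutting planes); the objective and cut points below are made up for illustration and this is not the paper's training procedure.

```python
# Toy bundle: the pointwise maximum of linear approximations of a convex
# objective. The function and cut points are illustrative only.
import numpy as np

def f(w):                                  # example objective
    return w ** 2

def grad_f(w):
    return 2.0 * w

cut_points = np.array([-2.0, 0.5, 1.5])    # points where linearizations are taken

def bundle(w):
    # max_i [ f(w_i) + f'(w_i) * (w - w_i) ]  -- a lower model of f
    return np.max(f(cut_points) + grad_f(cut_points) * (w - cut_points))

for w in (-1.0, 0.0, 2.0):
    print(f"w={w:+.1f}  f(w)={f(w):.2f}  bundle(w)={bundle(w):.2f}")
```

The bundle always lies below the true objective and gets refined as more linearizations are collected, which is what makes it a convenient surrogate to minimize at each iteration.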
arXiv Detail & Related papers (2022-01-29T23:02:30Z)
- Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback [36.05851452151107]
Federated learning (FL) systems need to sample a subset of clients to participate in each round of training.
Despite its importance, there is limited work on how to sample clients effectively.
We show how our sampling method can improve the convergence speed of optimization algorithms.
arXiv Detail & Related papers (2021-12-28T23:50:52Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based zeroth-order optimization algorithm (ZO-RL) that learns a sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
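For readers unfamiliar with ZO gradients, the following sketch shows a standard two-point zeroth-order gradient estimator with randomly sampled Gaussian perturbations; ZO-RL's contribution is to replace the random perturbation distribution with a learned sampling policy, which is not reproduced here.

```python
# Standard two-point zeroth-order (ZO) gradient estimate with random Gaussian
# perturbations. A learned sampling policy (as in ZO-RL) would propose the
# directions instead; that part is not shown here.
import numpy as np

rng = np.random.default_rng(2)

def f(x):                                   # black-box objective (example)
    return np.sum((x - 1.0) ** 2)

def zo_gradient(func, x, n_dirs=20, mu=1e-3):
    g = np.zeros_like(x)
    for _ in range(n_dirs):
        u = rng.normal(size=x.size)         # random perturbation direction
        g += (func(x + mu * u) - func(x - mu * u)) / (2.0 * mu) * u
    return g / n_dirs

x = np.zeros(5)
for step in range(200):                     # plain ZO gradient descent
    x -= 0.05 * zo_gradient(f, x)
print("solution:", np.round(x, 3))          # should approach the vector of ones
```

The variance of this estimator depends heavily on how the perturbation directions are drawn, which is exactly the quantity a learned sampling policy tries to reduce.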
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Adaptive Importance Sampling for Finite-Sum Optimization and Sampling with Decreasing Step-Sizes [4.355567556995855]
We propose Avare, a simple and efficient algorithm for adaptive importance sampling for finite-sum optimization and sampling with decreasing step-sizes.
Under standard technical conditions, we show that Avare achieves $\mathcal{O}(T^{2/3})$ and $\mathcal{O}(T^{5/6})$ dynamic regret for SGD and SGLD respectively when run with $\mathcal{O}(T^{5/6})$ step sizes.
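The sketch below shows the generic importance-sampling construction such methods build on: draw a component index from an adaptive distribution and reweight its gradient so the estimate stays unbiased. The probability update here is a naive gradient-norm heuristic mixed with the uniform distribution, not Avare's actual scheme, and all names are illustrative.

```python
# Generic adaptive importance sampling for finite-sum SGD on least squares.
# Drawing index i with probability p_i and using grad_i / (n * p_i) keeps the
# gradient estimate unbiased for the average gradient (1/n) * sum_i grad_i.
# The probability update is a naive heuristic, not Avare's rule.
import numpy as np

rng = np.random.default_rng(3)
n, d = 50, 3
A = rng.normal(size=(n, d))
y = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def grad_i(w, i):                       # gradient of the i-th squared loss
    return 2.0 * (A[i] @ w - y[i]) * A[i]

w = np.zeros(d)
scores = np.ones(n)                     # running per-component gradient norms
for t in range(1, 2001):
    # Mix the adaptive distribution with uniform for stability.
    p = 0.5 / n + 0.5 * scores / scores.sum()
    i = rng.choice(n, p=p)
    g = grad_i(w, i) / (n * p[i])       # unbiased estimate of the average gradient
    w -= (0.1 / np.sqrt(t)) * g         # decreasing step size
    scores[i] = np.linalg.norm(grad_i(w, i)) + 1e-8

print("fitted w:", np.round(w, 2))      # should approach [1, -2, 0.5]
```

The unbiasedness holds for any strictly positive sampling distribution; the whole design question, which Avare addresses with regret guarantees, is how to adapt that distribution over time.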
arXiv Detail & Related papers (2021-03-23T00:28:15Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.