Energy Regularized RNNs for Solving Non-Stationary Bandit Problems
- URL: http://arxiv.org/abs/2303.06552v2
- Date: Tue, 28 Mar 2023 15:20:41 GMT
- Title: Energy Regularized RNNs for Solving Non-Stationary Bandit Problems
- Authors: Michael Rotman, Lior Wolf
- Abstract summary: We present an energy term that prevents the neural network from becoming too confident in support of a certain action.
We demonstrate that our method is at least as effective as methods suggested to solve the sub-problem of Rotting Bandits.
- Score: 97.72614340294547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a Multi-Armed Bandit problem in which the rewards are
non-stationary and are dependent on past actions and potentially on past
contexts. At the heart of our method, we employ a recurrent neural network,
which models these sequences. In order to balance between exploration and
exploitation, we present an energy minimization term that prevents the neural
network from becoming too confident in support of a certain action. This term
provably limits the gap between the maximal and minimal probabilities assigned
by the network. In a diverse set of experiments, we demonstrate that our method
is at least as effective as methods suggested to solve the sub-problem of
Rotting Bandits, and can solve intuitive extensions of various benchmark
problems. We share our implementation at
https://github.com/rotmanmi/Energy-Regularized-RNN.
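The abstract states only that the energy term bounds the gap between the maximal and minimal action probabilities; the exact formulation is in the paper and repository. As a rough illustrative sketch of the idea (a hypothetical penalty of our own, not the authors' exact term), one could penalize that probability gap directly:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def confidence_gap_penalty(logits, weight=1.0):
    # Hypothetical energy term: penalize the gap between the maximal and
    # minimal action probabilities, discouraging the policy head from
    # collapsing onto a single arm. The paper's actual term differs; this
    # only illustrates the exploration/exploitation trade-off it targets.
    p = softmax(logits)
    return weight * (p.max() - p.min())

# A confident policy pays a larger penalty than a uniform one.
sharp = confidence_gap_penalty(np.array([5.0, 0.0, 0.0]))
flat = confidence_gap_penalty(np.array([1.0, 1.0, 1.0]))
```

Added to the task loss, such a term pulls the per-step action distribution away from degenerate (over-confident) solutions while leaving the RNN free to rank arms.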
Related papers
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
We show that convergence guarantees and generalizability of the unrolled networks are still open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
- Convergence and Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems [2.6695224599322214]
We provide deterministic convergence and recovery guarantees for the class of unsupervised feedforward multilayer neural networks trained to solve inverse problems.
We also derive overparametrization bounds under which a two-layers Deep Inverse Prior network with smooth activation function will benefit from our guarantees.
arXiv Detail & Related papers (2023-09-21T14:48:02Z)
- Tighter Abstract Queries in Neural Network Verification [0.0]
We present CEGARETTE, a novel verification mechanism where both the system and the property are abstracted and refined simultaneously.
Our results are very promising, and demonstrate a significant improvement in performance over multiple benchmarks.
arXiv Detail & Related papers (2022-10-23T22:18:35Z)
- Zonotope Domains for Lagrangian Neural Network Verification [102.13346781220383]
We decompose the problem of verifying a deep neural network into the verification of many 2-layer neural networks.
Our technique yields bounds that improve upon both linear programming and Lagrangian-based verification techniques.
arXiv Detail & Related papers (2022-10-14T19:31:39Z)
- Adversarially Robust Learning for Security-Constrained Optimal Power Flow [55.816266355623085]
We tackle the problem of N-k security-constrained optimal power flow (SCOPF).
N-k SCOPF is a core problem for the operation of electrical grids.
Inspired by methods in adversarially robust training, we frame N-k SCOPF as a minimax optimization problem.
arXiv Detail & Related papers (2021-11-12T22:08:10Z)
- ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation [32.091346776897744]
Cyber-physical attacks can challenge the robustness of multiagent reinforcement learning.
We propose a minimax MARL approach to infer the worst-case policy update of other agents.
arXiv Detail & Related papers (2021-09-14T16:18:35Z)
- Decentralized Multi-Agent Linear Bandits with Safety Constraints [31.67685495996986]
We study decentralized linear bandits, where a network of $N$ agents acts cooperatively to solve a linear bandit-optimization problem.
We propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
We show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits.
arXiv Detail & Related papers (2020-12-01T07:33:00Z)
- Differentiable Causal Discovery from Interventional Data [141.41931444927184]
We propose a theoretically-grounded method based on neural networks that can leverage interventional data.
We show that our approach compares favorably to the state of the art in a variety of settings.
arXiv Detail & Related papers (2020-07-03T15:19:17Z)
- Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences.
FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions.
One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
arXiv Detail & Related papers (2020-02-12T11:10:00Z)
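Zwanzig's FEP identity mentioned above estimates a free energy difference as ΔF = -kT ln⟨exp(-ΔU/kT)⟩, averaging over configurations sampled from the reference state. A minimal sketch on a toy system of our own choosing (two 1-D harmonic wells, not an example from the paper), where the exact answer is known in closed form:

```python
import numpy as np

# Zwanzig's free energy perturbation (FEP) identity:
#   dF = -(1/beta) * ln < exp(-beta * dU) >_A
# averaged over configurations sampled from state A, with dU = U_B - U_A.
# Toy system: harmonic wells U_A(x) = x^2/2 and U_B(x) = x^2 at beta = 1,
# for which the exact answer is dF = 0.5 * ln(2).

rng = np.random.default_rng(0)
beta = 1.0
x = rng.normal(0.0, 1.0, size=100_000)     # Boltzmann samples from state A
delta_u = x**2 - x**2 / 2                  # U_B - U_A at each sample
df_est = -np.log(np.mean(np.exp(-beta * delta_u))) / beta

df_exact = 0.5 * np.log(2.0)
```

The estimator converges well here because the two wells overlap strongly; the "severe limitation" noted in the summary is precisely that when the distributions of A and B barely overlap, the exponential average is dominated by rare samples and the estimate degrades, which is what Targeted FEP's learned mapping addresses.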
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.