Energy Regularized RNNs for Solving Non-Stationary Bandit Problems
- URL: http://arxiv.org/abs/2303.06552v2
- Date: Tue, 28 Mar 2023 15:20:41 GMT
- Title: Energy Regularized RNNs for Solving Non-Stationary Bandit Problems
- Authors: Michael Rotman, Lior Wolf
- Abstract summary: We present an energy term that prevents the neural network from becoming too confident in support of a certain action.
We demonstrate that our method is at least as effective as methods suggested to solve the sub-problem of Rotting Bandits.
- Score: 97.72614340294547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a Multi-Armed Bandit problem in which the rewards are
non-stationary and are dependent on past actions and potentially on past
contexts. At the heart of our method, we employ a recurrent neural network,
which models these sequences. In order to balance between exploration and
exploitation, we present an energy minimization term that prevents the neural
network from becoming too confident in support of a certain action. This term
provably limits the gap between the maximal and minimal probabilities assigned
by the network. In a diverse set of experiments, we demonstrate that our method
is at least as effective as methods suggested to solve the sub-problem of
Rotting Bandits, and can solve intuitive extensions of various benchmark
problems. We share our implementation at
https://github.com/rotmanmi/Energy-Regularized-RNN.
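The abstract states only that the energy term bounds the gap between the maximal and minimal action probabilities; the exact formulation is in the paper and repository. As a rough illustrative sketch of the idea (a hypothetical penalty of our own, not the authors' exact term), one could penalize that probability gap directly:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action logits.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def confidence_gap_penalty(logits, weight=1.0):
    # Hypothetical energy term: penalize the gap between the maximal and
    # minimal action probabilities, discouraging the policy head from
    # collapsing onto a single arm. The paper's actual term differs; this
    # only illustrates the exploration/exploitation trade-off it targets.
    p = softmax(logits)
    return weight * (p.max() - p.min())

# A confident policy pays a larger penalty than a uniform one.
sharp = confidence_gap_penalty(np.array([5.0, 0.0, 0.0]))
flat = confidence_gap_penalty(np.array([1.0, 1.0, 1.0]))
```

Added to the task loss, such a term pulls the per-step action distribution away from degenerate (over-confident) solutions while leaving the RNN free to rank arms.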
Related papers
- Robust Stochastically-Descending Unrolled Networks [85.6993263983062]
Deep unrolling is an emerging learning-to-optimize method that unrolls a truncated iterative algorithm in the layers of a trainable neural network.
We show that convergence guarantees and generalizability of the unrolled networks are still open theoretical problems.
We numerically assess unrolled architectures trained under the proposed constraints in two different applications.
arXiv Detail & Related papers (2023-12-25T18:51:23Z)
- Convergence and Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems [2.6695224599322214]
We provide deterministic convergence and recovery guarantees for the class of unsupervised feedforward multilayer neural networks trained to solve inverse problems.
We also derive overparametrization bounds under which a two-layers Deep Inverse Prior network with smooth activation function will benefit from our guarantees.
arXiv Detail & Related papers (2023-09-21T14:48:02Z)
- Tighter Abstract Queries in Neural Network Verification [0.0]
We present CEGARETTE, a novel verification mechanism where both the system and the property are abstracted and refined simultaneously.
Our results are very promising, and demonstrate a significant improvement in performance over multiple benchmarks.
arXiv Detail & Related papers (2022-10-23T22:18:35Z)
- Zonotope Domains for Lagrangian Neural Network Verification [102.13346781220383]
We decompose the problem of verifying a deep neural network into the verification of many 2-layer neural networks.
Our technique yields bounds that improve upon both linear programming and Lagrangian-based verification techniques.
arXiv Detail & Related papers (2022-10-14T19:31:39Z)
- Adversarially Robust Learning for Security-Constrained Optimal Power Flow [55.816266355623085]
We tackle the problem of N-k security-constrained optimal power flow (SCOPF).
N-k SCOPF is a core problem for the operation of electrical grids.
Inspired by methods in adversarially robust training, we frame N-k SCOPF as a minimax optimization problem.
arXiv Detail & Related papers (2021-11-12T22:08:10Z)
- ROMAX: Certifiably Robust Deep Multiagent Reinforcement Learning via Convex Relaxation [32.091346776897744]
Cyber-physical attacks can challenge the robustness of multiagent reinforcement learning.
We propose a minimax MARL approach to infer the worst-case policy update of other agents.
arXiv Detail & Related papers (2021-09-14T16:18:35Z)
- Decentralized Multi-Agent Linear Bandits with Safety Constraints [31.67685495996986]
We study decentralized linear bandits, where a network of $N$ agents acts cooperatively to solve a linear bandit-optimization problem.
We propose DLUCB: a fully decentralized algorithm that minimizes the cumulative regret over the entire network.
We show that our ideas extend naturally to the emerging, albeit more challenging, setting of safe bandits.
arXiv Detail & Related papers (2020-12-01T07:33:00Z)
- Differentiable Causal Discovery from Interventional Data [141.41931444927184]
We propose a theoretically-grounded method based on neural networks that can leverage interventional data.
We show that our approach compares favorably to the state of the art in a variety of settings.
arXiv Detail & Related papers (2020-07-03T15:19:17Z)
- Targeted free energy estimation via learned mappings [66.20146549150475]
Free energy perturbation (FEP) was proposed by Zwanzig more than six decades ago as a method to estimate free energy differences.
FEP suffers from a severe limitation: the requirement of sufficient overlap between distributions.
One strategy to mitigate this problem, called Targeted Free Energy Perturbation, uses a high-dimensional mapping in configuration space to increase overlap.
arXiv Detail & Related papers (2020-02-12T11:10:00Z)
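Zwanzig's FEP identity mentioned above estimates a free energy difference as ΔF = -kT ln⟨exp(-ΔU/kT)⟩, averaging over configurations sampled from the reference state. A minimal sketch on a toy system of our own choosing (two 1-D harmonic wells, not an example from the paper), where the exact answer is known in closed form:

```python
import numpy as np

# Zwanzig's free energy perturbation (FEP) identity:
#   dF = -(1/beta) * ln < exp(-beta * dU) >_A
# averaged over configurations sampled from state A, with dU = U_B - U_A.
# Toy system: harmonic wells U_A(x) = x^2/2 and U_B(x) = x^2 at beta = 1,
# for which the exact answer is dF = 0.5 * ln(2).

rng = np.random.default_rng(0)
beta = 1.0
x = rng.normal(0.0, 1.0, size=100_000)     # Boltzmann samples from state A
delta_u = x**2 - x**2 / 2                  # U_B - U_A at each sample
df_est = -np.log(np.mean(np.exp(-beta * delta_u))) / beta

df_exact = 0.5 * np.log(2.0)
```

The estimator converges well here because the two wells overlap strongly; the "severe limitation" noted in the summary is precisely that when the distributions of A and B barely overlap, the exponential average is dominated by rare samples and the estimate degrades, which is what Targeted FEP's learned mapping addresses.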
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.