Deep reinforced learning heuristic tested on spin-glass ground states:
The larger picture
- URL: http://arxiv.org/abs/2302.10848v2
- Date: Thu, 14 Sep 2023 13:29:40 GMT
- Title: Deep reinforced learning heuristic tested on spin-glass ground states:
The larger picture
- Authors: Stefan Boettcher (Emory U)
- Abstract summary: The authors present a deep reinforced learning approach to augment combinatorial optimization heuristics.
In particular, they present results for several spin glass ground state problems.
Here, we put these studies into a larger context, showing that the claimed superiority is marginal for smaller samples.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Changjun Fan et al. [Nature Communications
https://doi.org/10.1038/s41467-023-36363-w (2023)], the authors present a deep
reinforced learning approach to augment combinatorial optimization heuristics.
In particular, they present results for several spin glass ground state
problems, for which instances on non-planar networks are generally NP-hard, in
comparison with several Monte Carlo based methods, such as simulated annealing
(SA) or parallel tempering (PT). Indeed, those results demonstrate that the
reinforced learning improves the results over those obtained with SA or PT, or
at least allows the heuristics to reach results of comparable quality in
reduced runtimes relative to those other methods. To
facilitate the conclusion that their method is "superior", the authors pursue
two basic strategies: (1) A commercial GUROBI solver is called on to procure a
sample of exact ground states as a testbed to compare with, and (2) a
head-to-head comparison between the heuristics is given for a sample of larger
instances where exact ground states are hard to ascertain. Here, we put these
studies into a larger context, showing that the claimed superiority is at best
marginal for smaller samples and becomes essentially irrelevant with respect to
any sensible approximation of true ground states in the larger samples. For
example, this method becomes irrelevant as a means to determine stiffness
exponents $\theta$ in $d>2$, as mentioned by the authors, where the problem is
not only NP-hard but requires the subtraction of two almost equal ground-state
energies, so that the systematic errors of $\approx 1\%$ in each energy found here are
unacceptable. This larger picture of the method arises from a straightforward
finite-size corrections study over the spin glass ensembles the authors employ,
using data that has been available for decades.
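To make the finite-size corrections argument concrete, the following sketch fits the standard correction form $e(N) = e_\infty + A\,N^{-\omega}$ to per-spin ground-state energies and then notes why $\approx 1\%$ systematic errors per energy are fatal for the stiffness exponent. It is a minimal illustration under assumed inputs: the system sizes, energy values, and fitted parameters below are placeholders, not data from this paper or from Fan et al.

```python
# Minimal sketch: a finite-size corrections fit of the kind described above.
# All numbers are hypothetical placeholders, not data from the paper.
import numpy as np
from scipy.optimize import curve_fit

def e_fss(N, e_inf, A, omega):
    """Leading finite-size correction to the ground-state energy density."""
    return e_inf + A * N ** (-omega)

# Hypothetical per-spin ground-state energies for a sequence of system sizes N.
N = np.array([64, 128, 256, 512, 1024, 2048], dtype=float)
e_N = np.array([-1.690, -1.700, -1.706, -1.710, -1.712, -1.713])

popt, _ = curve_fit(e_fss, N, e_N, p0=(-1.715, 0.5, 0.7))
e_inf, A, omega = popt
print(f"extrapolated e_inf = {e_inf:.4f}, correction exponent omega = {omega:.2f}")

# Why ~1% systematic errors are fatal for stiffness exponents: theta is read off
# from the defect energy dE = E_AP - E_P ~ L**theta, the difference of two nearly
# equal extensive energies. With |E| ~ 1.7 * N, a 1% systematic error in each term
# is ~0.017 * N, which for large systems dwarfs the O(L**theta) signal being resolved.
```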
Related papers
- Improving Reinforcement Learning Sample-Efficiency using Local Approximation [2.5582913676558205]
We derive PAC bounds on the sample complexity for RL within the infinite-horizon Markov Decision Process setting. We are able to extend these results to an infinite-horizon, model-free setting by constructing a PAC-MDP algorithm with the aforementioned sample complexity.
arXiv Detail & Related papers (2025-07-16T16:31:17Z) - Identifying Heterogeneity in Distributed Learning [1.7244120238071492]
We study methods for identifying heterogeneous parameter components in distributed M-estimation with minimal data transmission. One is based on a re-normalized Wald test, which is shown to be consistent as long as the number of distributed data blocks $K$ is of a smaller order than the minimum block sample size. The second is an extreme contrast test (ECT) based on the difference between the largest and smallest component-wise estimated parameters among data blocks.
arXiv Detail & Related papers (2025-06-19T15:26:48Z) - Rapid initial state preparation for the quantum simulation of strongly correlated molecules [4.639143844012453]
We show how to achieve unitary synthesis with a Toffoli complexity about $7\times$ lower than that in prior work.
For filtering we present two different approaches: sampling and binary search.
arXiv Detail & Related papers (2024-09-18T07:04:32Z) - Best Arm Identification with Fixed Budget: A Large Deviation Perspective [54.305323903582845]
We present sred, a truly adaptive algorithm that can reject arms in any round based on the observed empirical gaps between the rewards of various arms.
arXiv Detail & Related papers (2023-12-19T13:17:43Z) - Efficiently Escaping Saddle Points for Non-Convex Policy Optimization [40.0986936439803]
Policy gradient (PG) is widely used in reinforcement learning due to its scalability and good performance.
We propose a variance-reduced second-order method that uses second-order information in the form of Hessian-vector products (HVP) and converges to an approximate second-order stationary point (SOSP) with a sample complexity of $\tilde{O}(\epsilon^{-3})$.
arXiv Detail & Related papers (2023-11-15T12:36:45Z) - Langevin Thompson Sampling with Logarithmic Communication: Bandits and
Reinforcement Learning [34.4255062106615]
Thompson sampling (TS) is widely used in sequential decision making due to its ease of use and appealing empirical performance.
We propose batched $\textit{Langevin Thompson Sampling}$ algorithms that leverage MCMC methods to sample from approximate posteriors with only logarithmic communication costs in terms of batches.
Our algorithms are computationally efficient and maintain the same order-optimal regret guarantees of $\mathcal{O}(\log T)$ for MABs and $\mathcal{O}(\sqrt{T})$ for RL.
arXiv Detail & Related papers (2023-06-15T01:16:29Z) - A Practical Upper Bound for the Worst-Case Attribution Deviations [21.341303776931532]
Model attribution is a critical component of deep neural networks (DNNs) for its interpretability to complex models.
Recent studies bring up attention to the security of attribution methods as they are vulnerable to attribution attacks that generate similar images with dramatically different attributions.
Existing works have been investigating empirically improving the robustness of DNNs against those attacks; however, none of them explicitly quantifies the actual deviations of attributions.
In this work, for the first time, a constrained optimization problem is formulated to derive an upper bound that measures the largest dissimilarity of attributions after the samples are perturbed by any noise within a certain region.
arXiv Detail & Related papers (2023-03-01T09:07:27Z) - Scale-Equivalent Distillation for Semi-Supervised Object Detection [57.59525453301374]
Recent Semi-Supervised Object Detection (SS-OD) methods are mainly based on self-training, generating hard pseudo-labels by a teacher model on unlabeled data as supervisory signals.
We analyze the challenges these methods meet with the empirical experiment results.
We introduce a novel approach, Scale-Equivalent Distillation (SED), which is a simple yet effective end-to-end knowledge distillation framework robust to large object size variance and class imbalance.
arXiv Detail & Related papers (2022-03-23T07:33:37Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z) - Breaking the Sample Size Barrier in Model-Based Reinforcement Learning
with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator).
We first consider $\gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $\mathcal{S}$ and action space $\mathcal{A}$.
We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z) - The Simulator: Understanding Adaptive Sampling in the
Moderate-Confidence Regime [52.38455827779212]
We propose a novel technique for analyzing adaptive sampling called the Simulator.
We prove the first instance-based lower bounds for the top-k problem which incorporate the appropriate log-factors.
Our new analysis inspires a simple and near-optimal algorithm for best-arm and top-k identification, the first practical algorithm of its kind for the latter problem.
arXiv Detail & Related papers (2017-02-16T23:42:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.