Sample-efficient L0-L2 constrained structure learning of sparse Ising
models
- URL: http://arxiv.org/abs/2012.01744v2
- Date: Fri, 4 Dec 2020 23:08:24 GMT
- Title: Sample-efficient L0-L2 constrained structure learning of sparse Ising
models
- Authors: Antoine Dedieu, Miguel Lázaro-Gredilla, Dileep George
- Abstract summary: We consider the problem of learning the underlying graph of a sparse Ising model with $p$ nodes from $n$ i.i.d. samples.
We leverage the cardinality constraint L0 norm, which is known to properly induce sparsity, and further combine it with an L2 norm to better model the non-zero coefficients.
- Score: 3.056751497358646
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of learning the underlying graph of a sparse Ising
model with $p$ nodes from $n$ i.i.d. samples. The most recent and best
performing approaches combine an empirical loss (the logistic regression loss
or the interaction screening loss) with a regularizer (an L1 penalty or an L1
constraint). This results in a convex problem that can be solved separately for
each node of the graph. In this work, we leverage the cardinality constraint L0
norm, which is known to properly induce sparsity, and further combine it with
an L2 norm to better model the non-zero coefficients. We show that our proposed
estimators achieve an improved sample complexity, both (a) theoretically -- by
reaching new state-of-the-art upper bounds for recovery guarantees -- and (b)
empirically -- by showing sharper phase transitions between poor and full
recovery for graph topologies studied in the literature -- when compared to
their L1-based counterparts.
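For concreteness, the per-node problem described above amounts to the following: for each node $j$, fit a logistic regression of the spin $x_j$ on the remaining spins under a cardinality (L0) constraint on the coefficient vector, combined with an L2 term on the non-zero coefficients. The sketch below is a minimal illustration of that setup, not the authors' algorithm: the projected-gradient (iterative hard-thresholding) solver, the step size, the ridge weight `lam2`, the sparsity budget `k`, and the OR-symmetrization of the per-node supports are assumptions introduced here purely for illustration.
```python
# Minimal sketch (not the authors' code) of node-wise structure recovery for a
# sparse Ising model under an L0 (cardinality) constraint plus an L2 penalty.
# The solver (projected gradient / iterative hard thresholding), step size, and
# symmetrization rule are illustrative choices, not taken from the paper.
import numpy as np

def node_l0l2_logistic(X, node, k, lam2=1e-2, lr=0.1, iters=500):
    """Estimate the neighborhood of `node` from samples X in {-1,+1}^{n x p}.

    Minimizes the mean logistic loss of x_node given the other spins,
    plus (lam2/2)*||w||_2^2, subject to ||w||_0 <= k (hard-thresholding step).
    """
    n, p = X.shape
    y = X[:, node]                          # spin to predict, in {-1, +1}
    Z = np.delete(X, node, axis=1)          # remaining p-1 spins
    w = np.zeros(p - 1)
    for _ in range(iters):
        margins = y * (Z @ w)
        # gradient of mean logistic loss log(1 + exp(-margin)) plus ridge term
        grad = -(Z.T @ (y / (1.0 + np.exp(margins)))) / n + lam2 * w
        w -= lr * grad
        # projection onto the L0 ball: keep the k largest-magnitude coordinates
        if k < w.size:
            cutoff = np.partition(np.abs(w), -k)[-k]
            w[np.abs(w) < cutoff] = 0.0
    # re-insert a zero for the node itself so indices align with the full graph
    return np.insert(w, node, 0.0)

def estimate_graph(X, k, **kwargs):
    """Run the node-wise estimator for every node and OR the recovered supports."""
    p = X.shape[1]
    W = np.vstack([node_l0l2_logistic(X, j, k, **kwargs) for j in range(p)])
    return (np.abs(W) > 0) | (np.abs(W.T) > 0)   # edge kept if either direction selects it

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # placeholder samples; real data would come from an Ising-model sampler
    X = rng.choice([-1, 1], size=(2000, 10))
    print(estimate_graph(X, k=3).astype(int))
```
Hard thresholding is used here only as the simplest way to enforce the L0 constraint; any solver for the cardinality-constrained problem could be substituted, and the paper's guarantees concern the estimators themselves rather than this particular solver.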
Related papers
- Convergence Rate Analysis of LION [54.28350823319057]
LION converges at a rate of $\mathcal{O}(\sqrt{d}\,K^{-1/4})$ to a Karush-Kuhn-Tucker (KKT) point, measured by the gradient norm.
We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z) - Error bounds for particle gradient descent, and extensions of the log-Sobolev and Talagrand inequalities [1.3124513975412255]
We show that for models satisfying a condition generalizing both the log-Sobolev and the Polyak-Łojasiewicz inequalities, the flow converges exponentially fast to the set of minimizers of the free energy.
We also generalize the Bakry-Émery theorem and show that the LSI/PLI generalization holds for models with strongly concave log-likelihoods.
arXiv Detail & Related papers (2024-03-04T12:57:26Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Fine-grained analysis of non-parametric estimation for pairwise learning [9.676007573960383]
We are concerned with the generalization performance of non-parametric estimation for pairwise learning.
Our results can be used to handle a wide range of pairwise learning problems including ranking, AUC, pairwise regression and metric and similarity learning.
arXiv Detail & Related papers (2023-05-31T08:13:14Z) - Principled Reinforcement Learning with Human Feedback from Pairwise or
$K$-wise Comparisons [79.98542868281473]
We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF).
We show that when training a policy based on the learned reward model, MLE fails while a pessimistic MLE provides policies with improved performance under certain coverage assumptions.
arXiv Detail & Related papers (2023-01-26T18:07:21Z) - A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized
Linear Models [33.36787620121057]
We prove a new generalization bound that holds for any class of linear predictors in Gaussian space.
We use our finite-sample bound to directly recover the "optimistic rate" of Zhou et al. (2021).
We show that the application of our bound using the localized Gaussian width will generally be sharp for empirical risk minimizers.
arXiv Detail & Related papers (2022-10-21T16:16:55Z) - KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal [70.15267479220691]
We consider and analyze the sample complexity of model-free reinforcement learning with a generative model (simulator).
Our analysis shows that it is nearly minimax-optimal for finding an $\varepsilon$-optimal policy when $\varepsilon$ is sufficiently small.
arXiv Detail & Related papers (2022-05-27T19:39:24Z) - Sparsest Univariate Learning Models Under Lipschitz Constraint [31.28451181040038]
We propose continuous-domain formulations for one-dimensional regression problems.
We control the Lipschitz constant explicitly using a user-defined upper-bound.
We show that both problems admit global minimizers that are continuous and piecewise-linear.
arXiv Detail & Related papers (2021-12-27T07:03:43Z) - The Performance of the MLE in the Bradley-Terry-Luce Model in
$\ell_{\infty}$-Loss and under General Graph Topologies [76.61051540383494]
We derive novel, general upper bounds on the $\ell_\infty$ estimation error of the Bradley-Terry-Luce model.
We demonstrate that the derived bounds perform well and, in some cases, are sharper than previously known results.
arXiv Detail & Related papers (2021-10-20T23:46:35Z) - Localization, Convexity, and Star Aggregation [0.0]
Offset Rademacher complexities have been shown to imply sharp, linear-dependent upper bounds for the square loss.
We show that in the statistical setting, the offset bound can be generalized to any loss satisfying a certain uniform convexity condition.
arXiv Detail & Related papers (2021-05-19T00:47:59Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde O(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)