Structure Learning in Inverse Ising Problems Using $\ell_2$-Regularized
Linear Estimator
- URL: http://arxiv.org/abs/2008.08342v2
- Date: Tue, 24 Nov 2020 02:29:14 GMT
- Title: Structure Learning in Inverse Ising Problems Using $\ell_2$-Regularized
Linear Estimator
- Authors: Xiangming Meng and Tomoyuki Obuchi and Yoshiyuki Kabashima
- Abstract summary: We show that despite the model mismatch, one can perfectly identify the network structure using naive linear regression without regularization.
We propose a two-stage estimator: In the first stage, the ridge regression is used and the estimates are pruned by a relatively small threshold.
This estimator with the appropriate regularization coefficient and thresholds is shown to achieve the perfect identification of the network structure even in $0<M/N<1$.
- Score: 8.89493507314525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The inference performance of the pseudolikelihood method is discussed in the
framework of the inverse Ising problem when the $\ell_2$-regularized (ridge)
linear regression is adopted. This setup is introduced for theoretically
investigating the situation where the data generation model is different from
the inference one, namely the model mismatch situation. In the teacher-student
scenario under the assumption that the teacher couplings are sparse, the
analysis is conducted using the replica and cavity methods, with a special
focus on whether the presence/absence of teacher couplings is correctly
inferred or not. The result indicates that despite the model mismatch, one can
perfectly identify the network structure using naive linear regression without
regularization when the number of spins $N$ is smaller than the dataset size
$M$, in the thermodynamic limit $N\to \infty$. Further, to access the
underdetermined region $M < N$, we examine the effect of the $\ell_2$
regularization, and find that biases appear in all the coupling estimates,
preventing the perfect identification of the network structure. The biases are,
however, shown to decay exponentially fast as the distance from
the center spin chosen in the pseudolikelihood method grows. Based on this
finding, we propose a two-stage estimator: In the first stage, the ridge
regression is used and the estimates are pruned by a relatively small
threshold; in the second stage the naive linear regression is conducted only on
the remaining couplings, and the resultant estimates are again pruned by
another relatively large threshold. This estimator with the appropriate
regularization coefficient and thresholds is shown to achieve the perfect
identification of the network structure even in $0<M/N<1$. Results of extensive
numerical experiments support these findings.
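To make the two-stage procedure concrete, below is a minimal sketch in Python/NumPy of the pseudolikelihood-style setup described in the abstract, in which each spin is treated in turn as the center spin and regressed linearly on the remaining spins. The function name `two_stage_structure_estimate` and the specific values of the ridge coefficient `eta` and the thresholds `tau1`, `tau2` are illustrative assumptions, not the paper's prescribed choices; the paper determines the appropriate regularization coefficient and thresholds analytically.

```python
import numpy as np

def two_stage_structure_estimate(S, eta=0.1, tau1=0.05, tau2=0.2):
    """Sketch of the two-stage estimator described in the abstract.

    S: (M, N) array of +/-1 spin samples.
    eta, tau1, tau2: ridge coefficient and the two pruning thresholds
    (illustrative placeholder values, not the paper's derived choices).
    Returns an (N, N) boolean matrix of the inferred network structure.
    """
    M, N = S.shape
    structure = np.zeros((N, N), dtype=bool)

    for i in range(N):                      # each spin in turn is the "center" spin
        y = S[:, i]
        X = np.delete(S, i, axis=1)         # remaining N-1 spins as regressors

        # Stage 1: l2-regularized (ridge) linear regression,
        # then prune with the relatively small threshold tau1.
        A = X.T @ X / M + eta * np.eye(N - 1)
        J_ridge = np.linalg.solve(A, X.T @ y / M)
        keep = np.abs(J_ridge) > tau1       # candidate couplings after first pruning

        # Stage 2: naive (unregularized) linear regression restricted to the
        # surviving couplings, then prune with the relatively large threshold tau2.
        active = np.zeros(N - 1, dtype=bool)
        if keep.any():
            J_ols, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
            refit = np.zeros(N - 1)
            refit[keep] = J_ols
            active = np.abs(refit) > tau2

        # map back to the full index set (the center spin itself is skipped)
        idx = np.concatenate([np.arange(i), np.arange(i + 1, N)])
        structure[i, idx] = active

    return structure
```

As described in the abstract, the first, smaller threshold only coarsely prunes the ridge estimates, while the second, larger threshold is applied after the unregularized refit on the surviving couplings; with appropriately chosen `eta`, `tau1`, and `tau2`, this combination is what the paper shows to recover the structure exactly even for $0<M/N<1$.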
Related papers
- Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - Analysis of the expected $L_2$ error of an over-parametrized deep neural
network estimate learned by gradient descent without regularization [7.977229957867868]
Recent results show that estimates defined by over-parametrized deep neural networks learned by applying gradient descent to a regularized empirical $L_2$ risk are universally consistent.
In this paper, we show that the regularization term is not necessary to obtain similar results.
arXiv Detail & Related papers (2023-11-24T17:04:21Z) - A U-turn on Double Descent: Rethinking Parameter Counting in Statistical
Learning [68.76846801719095]
We show exactly when and where double descent appears, and that its location is not inherently tied to the interpolation threshold $p=n$.
This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Retire: Robust Expectile Regression in High Dimensions [3.9391041278203978]
Penalized quantile and expectile regression methods offer useful tools to detect heteroscedasticity in high-dimensional data.
We propose and study (penalized) robust expectile regression (retire).
We show that the proposed procedure can be efficiently solved by a semismooth Newton coordinate descent algorithm.
arXiv Detail & Related papers (2022-12-11T18:03:12Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Regularization Matters: A Nonparametric Perspective on Overparametrized
Neural Network [20.132432350255087]
Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data.
This paper studies how well overparametrized neural networks can recover the true target function in the presence of random noises.
arXiv Detail & Related papers (2020-07-06T01:02:23Z) - The Generalized Lasso with Nonlinear Observations and Generative Priors [63.541900026673055]
We make the assumption of sub-Gaussian measurements, which is satisfied by a wide range of measurement models.
We show that our result can be extended to the uniform recovery guarantee under the assumption of a so-called local embedding property.
arXiv Detail & Related papers (2020-06-22T16:43:35Z) - Path Sample-Analytic Gradient Estimators for Stochastic Binary Networks [78.76880041670904]
In neural networks with binary activations and/or binary weights, training by gradient descent is complicated.
We propose a new method for this estimation problem combining sampling and analytic approximation steps.
We experimentally show higher accuracy in gradient estimation and demonstrate a more stable and better performing training in deep convolutional models.
arXiv Detail & Related papers (2020-06-04T21:51:21Z) - A Precise High-Dimensional Asymptotic Theory for Boosting and
Minimum-$\ell_1$-Norm Interpolated Classifiers [3.167685495996986]
This paper establishes a precise high-dimensional theory for boosting on separable data.
Under a class of statistical models, we provide an exact analysis of the generalization error of boosting.
We also explicitly pin down the relation between the boosting test error and the optimal Bayes error.
arXiv Detail & Related papers (2020-02-05T00:24:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.