Related papers: Low coordinate degree algorithms II: Categorical signals and generalized stochastic block models

Low coordinate degree algorithms II: Categorical signals and generalized stochastic block models

URL: http://arxiv.org/abs/2412.21155v1
Date: Mon, 30 Dec 2024 18:34:36 GMT
Title: Low coordinate degree algorithms II: Categorical signals and generalized stochastic block models
Authors: Dmitriy Kunisky,
Abstract summary: We study when low coordinate degree functions can test for the presence of categorical structure in high-dimensional data.<n>This complements the first paper of this series, which studied the power of LCDF in testing for continuous structure.
Score: 2.4889993472438383
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study when low coordinate degree functions (LCDF) -- linear combinations of functions depending on small subsets of entries of a vector -- can test for the presence of categorical structure, including community structure and generalizations thereof, in high-dimensional data. This complements the first paper of this series, which studied the power of LCDF in testing for continuous structure like real-valued signals perturbed by additive noise. We apply the tools developed there to a general form of stochastic block model (SBM), where a population is assigned random labels and every $p$-tuple of the population generates an observation according to an arbitrary probability measure associated to the $p$ labels of its members. We show that the performance of LCDF admits a unified analysis for this class of models. As applications, we prove tight lower bounds against LCDF (and therefore also against low degree polynomials) for nearly arbitrary graph and regular hypergraph SBMs, always matching suitable generalizations of the Kesten-Stigum threshold. We also prove tight lower bounds for group synchronization and abelian group sumset problems under the "truth-or-Haar" noise model, and use our technical results to give an improved analysis of Gaussian multi-frequency group synchronization. In most of these models, for some parameter settings our lower bounds give new evidence for conjectural statistical-to-computational gaps. Finally, interpreting some of our findings, we propose a precise analogy between categorical and continuous signals: a general SBM as above behaves, in terms of the tradeoff between subexponential runtime cost of testing algorithms and the signal strength needed for a testing algorithm to succeed, like a spiked $p_*$-tensor model of a certain order $p_*$ that may be computed from the parameters of the SBM.

Related papers

Federated Nonparametric Hypothesis Testing with Differential Privacy Constraints: Optimal Rates and Adaptive Tests [5.3595271893779906]
Federated learning has attracted significant recent attention due to its applicability across a wide range of settings where data is collected and analyzed across disparate locations. We study federated nonparametric goodness-of-fit testing in the white-noise-with-drift model under distributed differential privacy (DP) constraints.
arXiv Detail & Related papers (2024-06-10T19:25:19Z)
Low coordinate degree algorithms I: Universality of computational thresholds for hypothesis testing [2.4889993472438383]
Low coordinate degree functions (LCDF) can hypothesis test between high-dimensional probability measures. We show that LCDF can be tested for the presence of sufficiently "dilute" random signals through noisy channels. These results are the first computational lower bounds against any large class of algorithms for all of these models.
arXiv Detail & Related papers (2024-03-12T17:52:35Z)
Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure. We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
Graph Signal Sampling for Inductive One-Bit Matrix Completion: a Closed-form Solution [112.3443939502313]
We propose a unified graph signal sampling framework which enjoys the benefits of graph signal analysis and processing. The key idea is to transform each user's ratings on the items to a function (signal) on the vertices of an item-item graph. For the online setting, we develop a Bayesian extension, i.e., BGS-IMC which considers continuous random Gaussian noise in the graph Fourier domain.
arXiv Detail & Related papers (2023-02-08T08:17:43Z)
Estimating Higher-Order Mixed Memberships via the $\ell_{2,\infty}$ Tensor Perturbation Bound [8.521132000449766]
We propose the tensor mixed-membership blockmodel, a generalization of the tensor blockmodel. We establish the identifiability of our model and propose a computationally efficient estimation procedure. We apply our methodology to real and simulated data, demonstrating some effects not identifiable from the model with discrete community memberships.
arXiv Detail & Related papers (2022-12-16T18:32:20Z)
Instability and Local Minima in GAN Training with Kernel Discriminators [20.362912591032636]
Generative Adversarial Networks (GANs) are a widely-used tool for generative modeling of complex data. Despite their empirical success, the training of GANs is not fully understood due to the min-max optimization of the generator and discriminator. This paper analyzes these joint dynamics when the true samples, as well as the generated samples, are discrete, finite sets, and the discriminator is kernel-based.
arXiv Detail & Related papers (2022-08-21T18:03:06Z)
Clipped Stochastic Methods for Variational Inequalities with Heavy-Tailed Noise [64.85879194013407]
We prove the first high-probability results with logarithmic dependence on the confidence level for methods for solving monotone and structured non-monotone VIPs. Our results match the best-known ones in the light-tails case and are novel for structured non-monotone problems. In addition, we numerically validate that the gradient noise of many practical formulations is heavy-tailed and show that clipping improves the performance of SEG/SGDA.
arXiv Detail & Related papers (2022-06-02T15:21:55Z)
Subspace clustering in high-dimensions: Phase transitions \& Statistical-to-Computational gap [24.073221004661427]
A simple model to study subspace clustering is the high-dimensional $k$-Gaussian mixture model. We provide an exact characterization of the statistically optimal reconstruction error in this model in the high-dimensional regime with extensive sparsity.
arXiv Detail & Related papers (2022-05-26T17:47:35Z)
Partial recovery and weak consistency in the non-uniform hypergraph Stochastic Block Model [6.681901523019242]
We consider the community detection problem in random hypergraphs under the nonuniform hypergraph block model (HSBM) We provide a spectral algorithm that outputs a partition with at least a $gamma$ fraction of the vertices classified correctly, where $gammain depends on the signal-to-noise ratio (SNR) of the model. The theoretical analysis of our algorithm relies on the concentration and regularization of the adjacency matrix for non-uniform random hypergraphs, which can be of independent interest.
arXiv Detail & Related papers (2021-12-22T05:38:33Z)
Lattice partition recovery with dyadic CART [79.96359947166592]
We study piece-wise constant signals corrupted by additive Gaussian noise over a $d$-dimensional lattice. Data of this form naturally arise in a host of applications, and the tasks of signal detection or testing, de-noising and estimation have been studied extensively in the statistical and signal processing literature. In this paper we consider instead the problem of partition recovery, i.e.of estimating the partition of the lattice induced by the constancy regions of the unknown signal. We prove that a DCART-based procedure consistently estimates the underlying partition at a rate of order $sigma2 k*
arXiv Detail & Related papers (2021-05-27T23:41:01Z)
Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores) For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training. We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z)
Probabilistic Circuits for Variational Inference in Discrete Graphical Models [101.28528515775842]
Inference in discrete graphical models with variational methods is difficult. Many sampling-based methods have been proposed for estimating Evidence Lower Bound (ELBO) We propose a new approach that leverages the tractability of probabilistic circuit models, such as Sum Product Networks (SPN) We show that selective-SPNs are suitable as an expressive variational distribution, and prove that when the log-density of the target model is aweighted the corresponding ELBO can be computed analytically.
arXiv Detail & Related papers (2020-10-22T05:04:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.