Related papers: Statistical Guarantees for Reasoning Probes on Looped Boolean Circuits

Statistical Guarantees for Reasoning Probes on Looped Boolean Circuits

URL: http://arxiv.org/abs/2602.03970v2
Date: Mon, 09 Feb 2026 22:59:16 GMT
Title: Statistical Guarantees for Reasoning Probes on Looped Boolean Circuits
Authors: Anastasis Kratsios, Giulia Livieri, A. Martina Neuman,
Abstract summary: We study the statistical behaviour of reasoning probes in a stylized model of looped reasoning.<n>A reasoning probe has access to a sampled subset of internal nodes, possibly without covering the entire graph.<n>We show that, when the reasoning probe is parameterized by a graph convolutional network (GCN)-based hypothesis class and queries $N$ nodes, the worst-case generalization error attains the optimal rate.
Score: 10.292476979020522
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the statistical behaviour of reasoning probes in a stylized model of looped reasoning, given by Boolean circuits whose computational graph is a perfect $ν$-ary tree ($ν\ge 2$) and whose output is appended to the input and fed back iteratively for subsequent computation rounds. A reasoning probe has access to a sampled subset of internal computation nodes, possibly without covering the entire graph, and seeks to infer which $ν$-ary Boolean gate is executed at each queried node, representing uncertainty via a probability distribution over a fixed collection of $\mathtt{m}$ admissible $ν$-ary gates. This partial observability induces a generalization problem, which we analyze in a realizable, transductive setting. We show that, when the reasoning probe is parameterized by a graph convolutional network (GCN)-based hypothesis class and queries $N$ nodes, the worst-case generalization error attains the optimal rate $\mathcal{O}(\sqrt{\log(2/δ)}/\sqrt{N})$ with probability at least $1-δ$, for $δ\in (0,1)$. Our analysis combines snowflake metric embedding techniques with tools from statistical optimal transport. A key insight is that this optimal rate is achievable independently of graph size, owing to the existence of a low-distortion one-dimensional snowflake embedding of the induced graph metric. As a consequence, our results provide a sharp characterization of how structural properties of the computational graph govern the statistical efficiency of reasoning under partial access.

Related papers

Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks. In this paper, we study a suitable function space for over- parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
Matching the Statistical Query Lower Bound for $k$-Sparse Parity Problems with Sign Stochastic Gradient Descent [83.85536329832722]
We solve the $k$-sparse parity problem with sign gradient descent (SGD) on two-layer fully-connected neural networks.<n>We show that this approach can efficiently solve the $k$-sparse parity problem on a $d$-dimensional hypercube.<n>We then demonstrate how a trained neural network with sign SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical errors.
arXiv Detail & Related papers (2024-04-18T17:57:53Z)
Computational-Statistical Gaps in Gaussian Single-Index Models [77.1473134227844]
Single-Index Models are high-dimensional regression problems with planted structure. We show that computationally efficient algorithms, both within the Statistical Query (SQ) and the Low-Degree Polynomial (LDP) framework, necessarily require $Omega(dkstar/2)$ samples.
arXiv Detail & Related papers (2024-03-08T18:50:19Z)
Planted Bipartite Graph Detection [13.95780443241133]
We consider the task of detecting a hidden bipartite subgraph in a given random graph. Under the null hypothesis, the graph is a realization of an ErdHosR'enyi random graph over $n$ with edge density $q$. Under the alternative, there exists a planted $k_mathsfR times k_mathsfL$ bipartite subgraph with edge density $p>q$.
arXiv Detail & Related papers (2023-02-07T18:18:17Z)
Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model. It is assumed that the graph's structure is known and has $N$ nodes. Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z)
Bi-stochastically normalized graph Laplacian: convergence to manifold Laplacian and robustness to outlier noise [10.418647759223965]
Bi-stochastic normalization provides an alternative normalization of graph Laplacians in graph-based data analysis. We prove the convergence of bi-stochastically normalized graph Laplacian to manifold (weighted-)Laplacian with rates. When the manifold data are corrupted by outlier noise, we theoretically prove the graph Laplacian point-wise consistency.
arXiv Detail & Related papers (2022-06-22T21:08:24Z)
Decentralized Sparse Linear Regression via Gradient-Tracking: Linear Convergence and Statistical Guarantees [23.256961881716595]
We study a sparse linear regression over a network of agents, modeled as an undirected graph and no server node.<n>We analyze the convergence rate and statistical guarantees of a distributed projected gradient tracking-based algorithm.
arXiv Detail & Related papers (2022-01-21T01:26:08Z)
Distributed Sparse Regression via Penalization [5.990069843501885]
We study linear regression over a network of agents, modeled as an undirected graph (with no centralized node) The estimation problem is formulated as the minimization of the sum of the local LASSO loss functions plus a quadratic penalty of the consensus constraint. We show that the proximal-gradient algorithm applied to the penalized problem converges linearly up to a tolerance of the order of the centralized statistical error.
arXiv Detail & Related papers (2021-11-12T01:51:50Z)
Inferring Hidden Structures in Random Graphs [13.031167737538881]
We study the two inference problems of detecting and recovering an isolated community of emphgeneral structure planted in a random graph. We derive lower bounds for detecting/recovering the structure $Gamma_k$ in terms of the parameters $(n,k,q)$, as well as certain properties of $Gamma_k$, and exhibit computationally optimal algorithms that achieve these lower bounds.
arXiv Detail & Related papers (2021-10-05T09:39:51Z)
Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process. We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator. We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$ samples. We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
Testing correlation of unlabeled random graphs [18.08210501570919]
We study the problem of detecting the edge correlation between two random graphs with $n$ unlabeled nodes. This is formalized as a hypothesis testing problem, where under the null hypothesis, the two graphs are independently generated. Under the alternative, the two graphs are edge-correlated under some latent node correspondence, but have the same marginal distributions as the null.
arXiv Detail & Related papers (2020-08-23T19:19:45Z)
Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs. We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions. Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.