Compressed Computation: Dense Circuits in a Toy Model of the Universal-AND Problem
- URL: http://arxiv.org/abs/2507.09816v1
- Date: Sun, 13 Jul 2025 22:18:15 GMT
- Title: Compressed Computation: Dense Circuits in a Toy Model of the Universal-AND Problem
- Authors: Adam Newgas
- Abstract summary: Neural networks are capable of superposition -- representing more features than there are dimensions. Recent work considers the analogous concept for computation instead of storage, proposing theoretical constructions. We investigate a toy model for the Universal-AND problem which computes the AND of all $m\choose 2$ pairs of $m$ sparse inputs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks are capable of superposition -- representing more features than there are dimensions. Recent work considers the analogous concept for computation instead of storage, proposing theoretical constructions. But there has been little investigation into whether these circuits can be learned in practice. In this work, we investigate a toy model for the Universal-AND problem which computes the AND of all $m\choose 2$ pairs of $m$ sparse inputs. The hidden dimension that determines the number of non-linear activations is restricted to pressure the model to find a compute-efficient circuit, called compressed computation. We find that the training process finds a simple solution that does not correspond to theoretical constructions. It is fully dense -- every neuron contributes to every output. The solution circuit naturally scales with dimension, trading off error rates for neuron efficiency. It is similarly robust to changes in sparsity and other key parameters, and extends naturally to other boolean operations and boolean circuits. We explain the found solution in detail and compute why it is more efficient than the theoretical constructions at low sparsity. Our findings shed light on the types of circuits that models like to form and the flexibility of the superposition representation. This contributes to a broader understanding of network circuitry and interpretability.
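To make the setup concrete, below is a minimal sketch (not the authors' code) of the toy model the abstract describes: sample $m$ sparse boolean inputs, form the $m\choose 2$ pairwise-AND targets, and pass the inputs through a single hidden ReLU layer whose width $k$ is smaller than the number of outputs -- the restriction that creates pressure toward compressed computation. All names and parameter values (`m`, `k`, `p_active`) are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (illustrative, not the authors' code) of a Universal-AND toy setup:
# m sparse boolean inputs, targets are the AND of all (m choose 2) pairs, and one
# hidden ReLU layer whose width k bounds the non-linear "compute" available.
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, k, p_active = 16, 32, 0.1                       # assumed: inputs, hidden width, sparsity
pairs = list(itertools.combinations(range(m), 2))  # (m choose 2) = 120 output targets

def make_batch(batch_size: int):
    """Sample sparse boolean inputs and their pairwise-AND targets."""
    x = (rng.random((batch_size, m)) < p_active).astype(np.float32)
    y = np.stack([x[:, i] * x[:, j] for i, j in pairs], axis=1)
    return x, y

def relu(z):
    return np.maximum(z, 0.0)

# The hidden layer is the only non-linearity, so k neurons must serve all 120 outputs.
W1 = rng.normal(0, 1 / np.sqrt(m), (m, k)).astype(np.float32)
b1 = np.zeros(k, dtype=np.float32)
W2 = rng.normal(0, 1 / np.sqrt(k), (k, len(pairs))).astype(np.float32)

x, y = make_batch(4)
y_hat = relu(x @ W1 + b1) @ W2        # forward pass; training loop omitted
print(x.shape, y.shape, y_hat.shape)  # (4, 16) (4, 120) (4, 120)
```

Training such a model with a standard regression loss is the kind of experiment the abstract refers to; the sketch only illustrates that the non-linear budget is the hidden width $k$ while the target has $m\choose 2$ outputs.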
Related papers
- Learning to Add, Multiply, and Execute Algorithmic Instructions Exactly with Neural Networks [5.3800094588915375]
We study the training dynamics of two-layer fully connected networks in the infinite-width limit. We show how a sufficiently large ensemble of such models can be trained to execute exactly, with high probability. We show how this can be efficiently achieved using only logarithmically many training data.
arXiv Detail & Related papers (2025-02-24T00:50:02Z) - Finding Transformer Circuits with Edge Pruning [71.12127707678961]
We propose Edge Pruning as an effective and scalable solution to automated circuit discovery. Our method finds circuits in GPT-2 that use less than half the number of edges compared to circuits found by previous methods. Thanks to its efficiency, we scale Edge Pruning to CodeLlama-13B, a model over 100x the scale that prior methods operate on.
arXiv Detail & Related papers (2024-06-24T16:40:54Z) - Robustness Verification in Neural Networks [0.0]
We investigate formal verification problems for Neural Network computations.
One question is whether there do exist valid inputs such that the network computes a valid output.
We show that the problems are conquerable in a semi-linear setting.
arXiv Detail & Related papers (2024-03-20T09:34:38Z) - A Circuit Complexity Formulation of Algorithmic Information Theory [1.5483078145498086]
Inspired by Solomonoff's theory of inductive inference, we propose a prior based on circuit complexity.
We argue that an inductive bias towards simple explanations as measured by circuit complexity is appropriate for this problem.
arXiv Detail & Related papers (2023-06-25T01:30:37Z) - Transformers Learn Shortcuts to Automata [52.015990420075944]
We find that a low-depth Transformer can represent the computations of any finite-state automaton.
We show that a Transformer with $O(\log T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$.
We further investigate the brittleness of these solutions and propose potential mitigations.
arXiv Detail & Related papers (2022-10-19T17:45:48Z) - A single $T$-gate makes distribution learning hard [56.045224655472865]
This work provides an extensive characterization of the learnability of the output distributions of local quantum circuits.
We show that for a wide variety of the most practically relevant learning algorithms -- including hybrid quantum-classical algorithms -- even the generative modelling problem associated with depth $d=\omega(\log(n))$ Clifford circuits is hard.
arXiv Detail & Related papers (2022-07-07T08:04:15Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers [50.85524803885483]
This work proposes a formal definition of statistically meaningful (SM) approximation which requires the approximating network to exhibit good statistical learnability.
We study SM approximation for two function classes: circuits and Turing machines.
arXiv Detail & Related papers (2021-07-28T04:28:55Z) - PAC-learning gains of Turing machines over circuits and neural networks [1.4502611532302039]
We study the potential gains in sample efficiency that the principle of minimum description length can bring.
We use Turing machines to represent universal models and circuits.
We highlight close relationships between classical open problems in Circuit Complexity and the tightness of these gains.
arXiv Detail & Related papers (2021-03-23T17:03:10Z) - Hardness of Random Optimization Problems for Boolean Circuits, Low-Degree Polynomials, and Langevin Dynamics [78.46689176407936]
We show that these families of algorithms fail to produce nearly optimal solutions with high probability.
For the case of Boolean circuits, our results improve the state-of-the-art bounds known in circuit complexity theory.
arXiv Detail & Related papers (2020-04-25T05:45:59Z) - Tunable Quantum Neural Networks for Boolean Functions [0.0]
We introduce the idea of a generic quantum circuit whose gates can be tuned to learn any Boolean function.
In order to perform the learning task, we have devised an algorithm that leverages the absence of measurements.
arXiv Detail & Related papers (2020-03-31T11:55:01Z)