Light Differentiable Logic Gate Networks
- URL: http://arxiv.org/abs/2510.03250v1
- Date: Fri, 26 Sep 2025 04:44:51 GMT
- Title: Light Differentiable Logic Gate Networks
- Authors: Lukas Rüttgers, Till Aczel, Andreas Plesner, Roger Wattenhofer
- Abstract summary: Differentiable logic gate networks (DLGNs) exhibit extraordinary efficiency at inference while sustaining competitive accuracy. But vanishing gradients, discretization errors, and high training cost impede scaling these networks. We show that the root cause of these issues lies in the underlying parametrization of logic gate neurons themselves.
- Score: 28.844098517315228
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Differentiable logic gate networks (DLGNs) exhibit extraordinary efficiency at inference while sustaining competitive accuracy. But vanishing gradients, discretization errors, and high training cost impede scaling these networks. Even with dedicated parameter initialization schemes from subsequent works, increasing depth still harms accuracy. We show that the root cause of these issues lies in the underlying parametrization of logic gate neurons themselves. To overcome this issue, we propose a reparametrization that also shrinks the parameter size logarithmically in the number of inputs per gate. For binary inputs, this already reduces the model size by 4x, speeds up the backward pass by up to 1.86x, and converges in 8.5x fewer training steps. On top of that, we show that the accuracy on CIFAR-100 remains stable and sometimes superior to the original parametrization.
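For intuition, here is a minimal PyTorch sketch of the contrast the abstract draws, under the assumption that the original parametrization mixes all 16 two-input Boolean gates with a softmax (as in prior DLGN work) and that the reparametrization stores one parameter per truth-table entry, i.e. 4 instead of 16 per neuron for two-input gates. The function names are illustrative, and the paper's actual reparametrization may differ in detail.

```python
# Toy sketch: standard DLGN neuron (softmax over all 16 two-input gates) vs.
# a truth-table reparametrization with only 4 parameters per neuron.
# This is one plausible reading of the abstract, not the paper's exact method.
import torch

def all_two_input_gates(a, b):
    """Real-valued relaxations of all 16 Boolean functions of (a, b) in [0, 1]."""
    return torch.stack([
        torch.zeros_like(a), a * b, a - a * b, a,
        b - a * b, b, a + b - 2 * a * b, a + b - a * b,
        1 - (a + b - a * b), 1 - (a + b - 2 * a * b), 1 - b, 1 - b + a * b,
        1 - a, 1 - a + a * b, 1 - a * b, torch.ones_like(a),
    ], dim=-1)

def dlgn_neuron(a, b, logits16):
    """Original parametrization: 16 logits per neuron, softmax-mixed gate outputs."""
    return (all_two_input_gates(a, b) * torch.softmax(logits16, dim=-1)).sum(-1)

def truth_table_neuron(a, b, logits4):
    """Reparametrized neuron: 4 parameters, one per truth-table entry."""
    t = torch.sigmoid(logits4)  # t[i] ~ probability of output 1 for input pattern i
    return ((1 - a) * (1 - b) * t[..., 0] + (1 - a) * b * t[..., 1]
            + a * (1 - b) * t[..., 2] + a * b * t[..., 3])

a, b = torch.rand(8), torch.rand(8)
out16 = dlgn_neuron(a, b, torch.randn(8, 16))
out4 = truth_table_neuron(a, b, torch.randn(8, 4))
print(out16.shape, out4.shape)  # both torch.Size([8]); 4x fewer parameters per neuron
```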
Related papers
- Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics [64.62231094774211]
Stateful optimizers (e.g., Adam) maintain auxiliary information that can occupy up to 2x the model size in order to achieve optimal convergence. SOLO enables Adam-style optimizers to maintain quantized states with precision as low as 3 bits, or even 2 bits. SOLO can thus be seamlessly applied to Adam-style optimizers, leading to substantial memory savings with minimal accuracy loss.
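As a rough illustration of what keeping optimizer state at 2-3 bits involves, here is a hedged NumPy sketch of block-wise quantization with stochastic rounding applied to an Adam-style EMA. SOLO's actual quantizer and stabilization techniques are not reproduced, and all names are illustrative.

```python
# Hypothetical sketch: store an optimizer EMA state in 3 bits between steps,
# using per-block absmax scaling plus stochastic rounding to a signed grid.
import numpy as np

def quantize_blockwise(x, bits=3, block=64, rng=np.random.default_rng(0)):
    levels = 2 ** (bits - 1) - 1                     # symmetric signed grid, e.g. [-3, 3]
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / levels + 1e-12
    y = x / scale
    q = np.floor(y + rng.random(y.shape))            # stochastic rounding (unbiased)
    return np.clip(q, -levels, levels).astype(np.int8), scale

def dequantize_blockwise(q, scale):
    return (q.astype(np.float32) * scale).reshape(-1)

# One EMA update with the state held in low precision between steps.
grad = np.random.randn(4096).astype(np.float32)
q, s = quantize_blockwise(np.zeros(4096, dtype=np.float32))   # stored low-bit state
m = 0.9 * dequantize_blockwise(q, s) + 0.1 * grad             # Adam-style first moment
q, s = quantize_blockwise(m)                                   # re-quantize for storage
```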
arXiv Detail & Related papers (2025-05-01T06:47:45Z)
- Convolutional Differentiable Logic Gate Networks [68.74313756770123]
We propose an approach for learning logic gate networks directly via a differentiable relaxation.
We build on this idea, extending it by deep logic gate tree convolutions and logical OR pooling.
On CIFAR-10, we achieve an accuracy of 86.29% using only 61 million logic gates, which improves over the SOTA while being 29x smaller.
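A small sketch of how a logical OR over a pooling window can be relaxed differentiably, using the probabilistic form OR(x_1, ..., x_n) = 1 - prod_i (1 - x_i); the paper's exact pooling operator may differ in detail.

```python
# Hypothetical sketch of differentiable logical-OR pooling over 2x2 windows,
# via the relaxation OR(x1..xn) = 1 - prod(1 - xi).
import torch
import torch.nn.functional as F

def soft_or_pool2d(x, kernel=2):
    """x: (N, C, H, W) with activations in [0, 1]."""
    # log(1 - x) turns the window product into a sum, which avg_pool2d computes.
    log_not = torch.log1p(-x.clamp(max=1 - 1e-6))
    pooled_not = torch.exp(F.avg_pool2d(log_not, kernel) * kernel * kernel)
    return 1 - pooled_not

x = torch.rand(1, 8, 4, 4)
print(soft_or_pool2d(x).shape)  # torch.Size([1, 8, 2, 2])
```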
arXiv Detail & Related papers (2024-11-07T14:12:00Z)
- Guaranteed Approximation Bounds for Mixed-Precision Neural Operators [83.64404557466528]
We build on the intuition that neural operator learning inherently induces an approximation error.
We show that our approach reduces GPU memory usage by up to 50% and improves throughput by 58% with little or no reduction in accuracy.
arXiv Detail & Related papers (2023-07-27T17:42:06Z)
- Error mitigation, optimization, and extrapolation on a trapped ion testbed [0.05185707610786576]
A form of error mitigation called zero noise extrapolation (ZNE) can decrease an algorithm's sensitivity to these errors without increasing the number of required qubits.
We explore different methods for integrating this error mitigation technique into the Variational Quantum Eigensolver (VQE) algorithm.
Our results show that the efficacy of this error mitigation technique depends on choosing the correct implementation for a given device architecture.
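For readers unfamiliar with zero noise extrapolation, a toy sketch of the extrapolation step follows; the expectation values below are synthetic placeholders, not the paper's VQE results.

```python
# Minimal sketch of zero noise extrapolation (ZNE): measure an observable at
# several artificially amplified noise levels and extrapolate back to zero noise.
import numpy as np

noise_scales = np.array([1.0, 2.0, 3.0])          # e.g. obtained via gate folding
measured = np.array([-1.02, -0.91, -0.81])        # <H> at each noise scale (synthetic)

# Richardson-style extrapolation: fit a low-order polynomial in the noise scale
# and evaluate it at scale 0.
coeffs = np.polyfit(noise_scales, measured, deg=1)
zne_estimate = np.polyval(coeffs, 0.0)
print(f"zero-noise estimate: {zne_estimate:.3f}")
```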
arXiv Detail & Related papers (2023-07-13T19:02:39Z)
- Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose WTA-CRS, a new family of unbiased estimators for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
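A hedged sketch of the column-row sampling estimator that WTA-CRS builds on; the winner-take-all refinement, which keeps the highest-probability column-row pairs deterministically, is omitted for brevity.

```python
# Hypothetical sketch of column-row sampling (CRS): approximate A @ B by
# sampling k column-row pairs with probability proportional to
# ||A[:, i]|| * ||B[i, :]|| and reweighting so the estimate stays unbiased.
import numpy as np

def crs_matmul(A, B, k, rng=np.random.default_rng(0)):
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=k, p=p)
    # Each sampled outer product is divided by (k * p_i), so E[estimate] = A @ B.
    return (A[:, idx] / (k * p[idx])) @ B[idx, :]

A, B = np.random.randn(64, 512), np.random.randn(512, 32)
approx, exact = crs_matmul(A, B, k=128), A @ B
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))  # relative error
```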
arXiv Detail & Related papers (2023-05-24T15:52:08Z)
- Efficient Parametric Approximations of Neural Network Function Space Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks.
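For context, here is a toy sketch of the quantity being approximated, namely the average output discrepancy between two networks over a training set; LAFTR's linearized-activation approximation, which avoids storing or iterating over the data, is not reproduced.

```python
# Hypothetical sketch of the function space distance (FSD): the mean squared
# output discrepancy between two networks over a dataset.
import torch

def function_space_distance(f1, f2, loader):
    total, n = 0.0, 0
    with torch.no_grad():
        for x, _ in loader:
            total += ((f1(x) - f2(x)) ** 2).sum().item()
            n += x.shape[0]
    return total / n

# Toy usage with two small ReLU networks and random data.
net_a = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
net_b = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
data = [(torch.randn(8, 16), None) for _ in range(10)]
print(function_space_distance(net_a, net_b, data))
```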
arXiv Detail & Related papers (2023-02-07T15:09:23Z)
- Quantized Neural Networks for Low-Precision Accumulation with Guaranteed Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
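A hedged sketch of the kind of constraint such a guarantee implies: bounding the l1-norm of each output's integer weights so that the worst-case dot product fits in the accumulator. This illustrates the principle, not the paper's training algorithm.

```python
# Hypothetical sketch: keep the l1-norm of each output neuron's integer weights
# below the budget that a P-bit signed accumulator can hold in the worst case.
import numpy as np

def accumulator_l1_budget(acc_bits, input_bits, signed_inputs=False):
    acc_max = 2 ** (acc_bits - 1) - 1
    in_max = 2 ** (input_bits - 1) - 1 if signed_inputs else 2 ** input_bits - 1
    return acc_max / in_max

def project_weights(w_int, budget):
    """Scale integer weight rows down (truncating toward zero) if their l1-norm exceeds the budget."""
    l1 = np.abs(w_int).sum(axis=1, keepdims=True)
    scale = np.minimum(1.0, budget / np.maximum(l1, 1))
    return np.trunc(w_int * scale).astype(w_int.dtype)

budget = accumulator_l1_budget(acc_bits=16, input_bits=8)    # ~128.5 for unsigned 8-bit inputs
w = np.random.randint(-127, 128, size=(10, 256), dtype=np.int32)
w_safe = project_weights(w, budget)
assert (np.abs(w_safe).sum(axis=1) <= budget).all()          # worst case now fits in 16 bits
```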
arXiv Detail & Related papers (2023-01-31T02:46:57Z)
- Hardware optimized parity check gates for superconducting surface codes [0.0]
Error correcting codes use multi-qubit measurements to realize fault-tolerant quantum logic steps.
We analyze an unconventional surface code based on multi-body interactions between superconducting transmon qubits.
Despite the multi-body effects that underpin this approach, our estimates of logical faults suggest that this design can be at least as robust to realistic noise as conventional designs.
arXiv Detail & Related papers (2022-11-11T18:00:30Z)
- Optimizing Rydberg Gates for Logical Qubit Performance [0.0]
We present a family of Rydberg blockade gates for neutral atom qubits that are robust against two common, major imperfections.
These gates outperform existing gates for moderate or large imperfections.
Results significantly reduce the laser stability and atomic temperature requirements to achieve fault-tolerant quantum computing with neutral atoms.
arXiv Detail & Related papers (2022-10-13T10:04:08Z)
- GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
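A simplified, hypothetical sketch of that heuristic: rescale the initialization and keep the scale for which one simulated SGD step yields the lowest loss. GradInit itself optimizes one scalar per layer under a gradient-norm constraint; this toy version only grid-searches a single global scale.

```python
# Toy sketch of a GradInit-like heuristic: pick the initialization scale that
# minimizes the loss after a single simulated SGD step on one batch.
import copy
import torch

def loss_after_one_sgd_step(model, x, y, lr=0.1):
    trial = copy.deepcopy(model)
    loss = torch.nn.functional.cross_entropy(trial(x), y)
    grads = torch.autograd.grad(loss, list(trial.parameters()))
    with torch.no_grad():
        for p, g in zip(trial.parameters(), grads):
            p -= lr * g                       # simulate one SGD step
        return torch.nn.functional.cross_entropy(trial(x), y).item()

def gradinit_like_scale(model, x, y, candidates=(0.25, 0.5, 1.0, 2.0, 4.0)):
    def scaled(s):
        m = copy.deepcopy(model)
        with torch.no_grad():
            for p in m.parameters():
                p *= s                        # rescale the initial weights
        return m
    return min(candidates, key=lambda s: loss_after_one_sgd_step(scaled(s), x, y))

net = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 5))
x, y = torch.randn(32, 20), torch.randint(0, 5, (32,))
print("chosen init scale:", gradinit_like_scale(net, x, y))
```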
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.