Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks
- URL: http://arxiv.org/abs/2506.07500v1
- Date: Mon, 09 Jun 2025 07:25:51 GMT
- Title: Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks
- Authors: Shakir Yousefi, Andreas Plesner, Till Aczel, Roger Wattenhofer
- Abstract summary: We train networks $4.5\times$ faster in wall-clock time, reduce the discretization gap by $98\%$, and reduce the number of unused gates by $100\%$. We show that these improvements result from implicit Hessian regularization, which improves the convergence properties of LGNs.
- Score: 18.95453617434051
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern neural networks demonstrate state-of-the-art performance on numerous existing benchmarks; however, their high computational requirements and energy consumption prompt researchers to seek more efficient solutions for real-world deployment. Logic gate networks (LGNs) learn a large network of logic gates for efficient image classification. However, learning a network that can solve even a simple problem like CIFAR-10 can take days to weeks. Even then, almost half of the network remains unused, causing a discretization gap. This discretization gap hinders real-world deployment of LGNs, as the performance drop between training and inference negatively impacts accuracy. We inject Gumbel noise with a straight-through estimator during training to significantly speed up training, improve neuron utilization, and decrease the discretization gap. We theoretically show that this results from implicit Hessian regularization, which improves the convergence properties of LGNs. We train networks $4.5 \times$ faster in wall-clock time, reduce the discretization gap by $98\%$, and reduce the number of unused gates by $100\%$.
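The training change the abstract describes, sampling gate choices with Gumbel noise and discretizing them with a straight-through estimator, can be sketched concretely. Below is a minimal illustration in PyTorch, assuming each neuron holds learnable logits over the 16 two-input Boolean gates and that gates are relaxed to real-valued functions on [0, 1]; the wiring, initialization, and temperature are illustrative choices, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def all_two_input_gates(a, b):
    """Real-valued relaxations of the 16 two-input Boolean gates.
    Inputs a, b lie in [0, 1]; returns a tensor of shape (..., 16)."""
    return torch.stack([
        torch.zeros_like(a), a * b, a - a * b, a,
        b - a * b, b, a + b - 2 * a * b, a + b - a * b,
        1 - (a + b - a * b), 1 - (a + b - 2 * a * b), 1 - b, 1 - b + a * b,
        1 - a, 1 - a + a * b, 1 - a * b, torch.ones_like(a),
    ], dim=-1)

class GumbelLogicLayer(nn.Module):
    """One LGN layer: each neuron mixes the 16 gates applied to a fixed
    random pair of inputs. During training the gate choice is sampled with
    Gumbel noise and discretized by a straight-through estimator."""
    def __init__(self, in_dim, out_dim, tau=1.0):
        super().__init__()
        self.logits = nn.Parameter(torch.randn(out_dim, 16) * 0.1)
        self.register_buffer("idx_a", torch.randint(in_dim, (out_dim,)))
        self.register_buffer("idx_b", torch.randint(in_dim, (out_dim,)))
        self.tau = tau

    def forward(self, x):
        a, b = x[:, self.idx_a], x[:, self.idx_b]   # (batch, out_dim)
        gates = all_two_input_gates(a, b)            # (batch, out_dim, 16)
        if self.training:
            # Gumbel noise + straight-through: one-hot forward, soft backward.
            w = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)
        else:
            # At inference, commit to the arg-max gate per neuron.
            w = F.one_hot(self.logits.argmax(-1), 16).float()
        return (gates * w).sum(-1)
```

With `hard=True`, `gumbel_softmax` returns a one-hot sample in the forward pass while gradients flow through the soft probabilities, so the trained network behaves much more like its fully discretized counterpart, which is the discretization gap the paper targets.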
Related papers
- Convolutional Differentiable Logic Gate Networks [68.74313756770123]
We propose an approach for learning logic gate networks directly via a differentiable relaxation.
We build on this idea, extending it by deep logic gate tree convolutions and logical OR pooling.
On CIFAR-10, we achieve an accuracy of 86.29% using only 61 million logic gates, which improves over the SOTA while being 29x smaller.
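One relaxation in that line of work that lends itself to a short sketch is logical OR pooling: OR over a pooling window can be relaxed to 1 - prod_i (1 - x_i) for inputs in [0, 1]. The snippet below is an illustrative implementation of that relaxation under assumed shapes, not necessarily the exact pooling used in the paper.

```python
import torch
import torch.nn.functional as F

def soft_or_pool2d(x, kernel_size=2):
    """Differentiable relaxation of logical OR pooling:
    OR(x_1, ..., x_k) ~ 1 - prod_i (1 - x_i), with all x_i in [0, 1].
    The product is computed as exp(sum(log(1 - x))) for numerical stability."""
    log_one_minus = torch.log1p(-x.clamp(0.0, 1.0 - 1e-6))              # log(1 - x)
    window_sum = F.avg_pool2d(log_one_minus, kernel_size) * kernel_size ** 2
    return 1.0 - torch.exp(window_sum)

# Example: 2x2 OR pooling over a batch of soft binary feature maps
x = torch.rand(8, 16, 32, 32)      # values in [0, 1]
y = soft_or_pool2d(x)              # shape (8, 16, 16, 16)
```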
arXiv Detail & Related papers (2024-11-07T14:12:00Z) - Data-Free Dynamic Compression of CNNs for Tractable Efficiency [46.498278084317704]
Structured pruning approaches have shown promise in lowering floating-point operations without substantial drops in accuracy.
We propose HASTE (Hashing for Tractable Efficiency), a data-free, plug-and-play convolution module that instantly reduces a network's test-time inference cost without training or fine-tuning.
We demonstrate our approach on the popular vision benchmarks CIFAR-10 and ImageNet, where we achieve a 46.72% reduction in FLOPs with only a 1.25% loss in accuracy.
arXiv Detail & Related papers (2023-09-29T13:09:40Z) - Communication-Free Distributed GNN Training with Vertex Cut [63.22674903170953]
CoFree-GNN is a novel distributed GNN training framework that significantly speeds up the training process by implementing communication-free training.
We demonstrate that CoFree-GNN speeds up the GNN training process by up to 10 times over the existing state-of-the-art GNN training approaches.
arXiv Detail & Related papers (2023-08-06T21:04:58Z) - Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
arXiv Detail & Related papers (2023-06-14T01:24:42Z) - Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z) - Learn Locally, Correct Globally: A Distributed Algorithm for Training Graph Neural Networks [22.728439336309858]
We propose a communication-efficient distributed GNN training technique named Learn Locally, Correct Globally (LLCG).
LLCG first trains a GNN on each machine's local data, ignoring the dependencies between nodes that reside on different machines, and then sends the locally trained model to the server for periodic model averaging.
We rigorously analyze the convergence of distributed methods with periodic model averaging for training GNNs and show that naively applying periodic model averaging but ignoring the dependency between nodes will suffer from an irreducible residual error.
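The periodic model averaging analyzed above reduces to replacing every worker's parameters with their element-wise mean after a round of local training. A minimal sketch of that step follows; it is generic parameter averaging and deliberately omits LLCG's server-side correction for the ignored cross-machine node dependencies.

```python
import torch

@torch.no_grad()
def average_parameters(local_models):
    """Periodic model averaging: set every worker's parameters to the
    element-wise mean across all workers. (LLCG additionally applies a
    server-side correction, which is not shown here.)"""
    param_lists = [list(m.parameters()) for m in local_models]
    for params in zip(*param_lists):
        mean = torch.stack([p.data for p in params]).mean(dim=0)
        for p in params:
            p.data.copy_(mean)
```

Each machine would run a fixed number of local training steps on its own graph partition and then call this routine before continuing.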
arXiv Detail & Related papers (2021-11-16T03:07:01Z) - GDP: Stabilized Neural Network Pruning via Gates with Differentiable Polarization [84.57695474130273]
Gate-based or importance-based pruning methods aim to remove channels whose importance is smallest.
GDP can be plugged in before convolutional layers, without bells and whistles, to control whether each channel is on or off.
Experiments conducted over CIFAR-10 and ImageNet datasets show that the proposed GDP achieves the state-of-the-art performance.
arXiv Detail & Related papers (2021-09-06T03:17:10Z) - Selfish Sparse RNN Training [13.165729746380816]
We propose an approach to train sparse RNNs with a fixed parameter count in a single run, without compromising performance.
We achieve state-of-the-art sparse training results on the Penn TreeBank and Wikitext-2 datasets.
arXiv Detail & Related papers (2021-01-22T10:45:40Z) - ZORB: A Derivative-Free Backpropagation Algorithm for Neural Networks [3.6562366216810447]
We present a simple yet faster training algorithm called Zeroth-Order Relaxed Backpropagation (ZORB).
Instead of calculating gradients, ZORB uses the pseudoinverse of targets to backpropagate information.
Experiments on standard classification and regression benchmarks demonstrate ZORB's advantage over traditional backpropagation with Gradient Descent.
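The core gradient-free idea, using pseudoinverses instead of gradients, can be illustrated on a single linear layer. The sketch below only shows closed-form fitting and target propagation with the Moore-Penrose pseudoinverse under assumed shapes; it is a loose illustration in the spirit of ZORB, not the authors' full algorithm.

```python
import numpy as np

def fit_linear_layer(X, T):
    """Closed-form 'training' of one linear layer Y = X @ W using the
    Moore-Penrose pseudoinverse: W = pinv(X) @ T. No gradients involved."""
    return np.linalg.pinv(X) @ T

def backpropagate_targets(T, W):
    """Push targets one layer back with the pseudoinverse of the weights,
    in the spirit of gradient-free target propagation."""
    return T @ np.linalg.pinv(W)

# Tiny example with random data (illustrative shapes only)
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 32))        # activations entering the layer
T = rng.normal(size=(128, 10))        # targets for the layer's output
W = fit_linear_layer(X, T)            # shape (32, 10)
print(np.mean((X @ W - T) ** 2))      # least-squares fit error
```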
arXiv Detail & Related papers (2020-11-17T19:29:47Z) - Pruning Convolutional Filters using Batch Bridgeout [14.677724755838556]
State-of-the-art computer vision models are rapidly increasing in capacity, with the number of parameters far exceeding the number required to fit the training set.
This overparameterization results in better optimization and generalization performance.
In order to reduce inference costs, convolutional filters in trained neural networks could be pruned to reduce the run-time memory and computational requirements during inference.
We propose the use of Batch Bridgeout, a sparsity inducing regularization scheme, to train neural networks so that they could be pruned efficiently with minimal degradation in performance.
arXiv Detail & Related papers (2020-09-23T01:51:47Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We find that the accuracy decline under quantization stems from activation quantization, and replace the conventional ReLU with a Bounded ReLU.
Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
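The Bounded ReLU mentioned above clips activations to a fixed range so that they can be mapped to a small set of integer levels without large outliers. A minimal sketch of the clip-then-quantize step is shown below; the bound, bit width, and rounding are illustrative assumptions rather than the paper's exact scheme.

```python
import torch

def bounded_relu(x, bound=6.0):
    """ReLU clipped to [0, bound]; the fixed upper bound keeps the
    activation range known, which makes integer quantization well behaved."""
    return x.clamp(min=0.0, max=bound)

def quantize_activations(x, bound=6.0, bits=8):
    """Uniformly quantize activations in [0, bound] to unsigned integer
    levels and return the dequantized values (simulated integer arithmetic)."""
    levels = 2 ** bits - 1
    scale = bound / levels
    q = torch.round(bounded_relu(x, bound) / scale)   # integers in [0, levels]
    return q * scale

x = torch.randn(4, 8) * 3
print(quantize_activations(x))
```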
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - Activation Density driven Energy-Efficient Pruning in Training [2.222917681321253]
We propose a novel pruning method that prunes a network in real time during training.
We obtain exceedingly sparse networks with accuracy comparable to the baseline network.
arXiv Detail & Related papers (2020-02-07T18:34:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.