Compiling to linear neurons
- URL: http://arxiv.org/abs/2511.13769v1
- Date: Fri, 14 Nov 2025 22:19:50 GMT
- Title: Compiling to linear neurons
- Authors: Joey Velez-Ginorio, Nada Amin, Konrad Kording, Steve Zdancewic
- Abstract summary: We don't program neural networks directly. Instead, we rely on an indirect style where learning algorithms determine a neural network's function by learning from data. We can't compile most algorithms into a neural network -- even if these algorithms could help the network learn. We introduce $\textsf{Cajal}$: a typed, higher-order and linear programming language intended to be a minimal vehicle for exploring a direct style of programming neural networks.
- Score: 0.5249805590164902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We don't program neural networks directly. Instead, we rely on an indirect style where learning algorithms, like gradient descent, determine a neural network's function by learning from data. This indirect style is often a virtue; it empowers us to solve problems that were previously impossible. But it lacks discrete structure. We can't compile most algorithms into a neural network -- even if these algorithms could help the network learn. This limitation occurs because discrete algorithms are not obviously differentiable, making them incompatible with the gradient-based learning algorithms that determine a neural network's function. To address this, we introduce $\textsf{Cajal}$: a typed, higher-order and linear programming language intended to be a minimal vehicle for exploring a direct style of programming neural networks. We prove $\textsf{Cajal}$ programs compile to linear neurons, allowing discrete algorithms to be expressed in a differentiable form compatible with gradient-based learning. With our implementation of $\textsf{Cajal}$, we conduct several experiments where we link these linear neurons against other neural networks to determine part of their function prior to learning. Linking with these neurons allows networks to learn faster, with greater data-efficiency, and in a way that's easier to debug. A key lesson is that linear programming languages provide a path towards directly programming neural networks, enabling a rich interplay between learning and the discrete structures of ordinary programming.
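The abstract's core move, compiling a discrete program into a fixed linear (affine) layer and then linking it with a trainable network, can be illustrated with a small sketch. The snippet below is not the paper's Cajal compiler; it is a hand-written stand-in (the encoding and names are assumptions) showing how a fixed linear neuron can determine part of a network's function while gradients still flow to the parameters that remain to be learned.

```python
# A minimal sketch, not the paper's implementation: a discrete program
# (Boolean NOT on one-hot encoded booleans) written as a fixed linear layer,
# then "linked" in front of a trainable layer so gradient descent only has to
# learn the rest of the function.
import jax
import jax.numpy as jnp

# Hypothetical "compiled" linear neuron: one-hot booleans, NOT = swap matrix.
NOT_MATRIX = jnp.array([[0., 1.],
                        [1., 0.]])   # [1,0] (False) <-> [0,1] (True)

def compiled_not(x):
    # Fixed linear neuron produced by "compilation"; no learned parameters.
    return NOT_MATRIX @ x

def linked_network(params, x):
    # The compiled neuron fixes part of the function; `params` is the part
    # still left to gradient-based learning.
    h = compiled_not(x)
    return jnp.tanh(params["W"] @ h + params["b"])

def loss(params, x, y):
    return jnp.sum((linked_network(params, x) - y) ** 2)

params = {"W": jnp.ones((1, 2)) * 0.1, "b": jnp.zeros((1,))}
x, y = jnp.array([1., 0.]), jnp.array([1.0])   # input: False; arbitrary target

# Gradients flow through the fixed linear neuron, but only `params` update.
grads = jax.grad(loss)(params, x, y)
print(grads["W"], grads["b"])
```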
Related papers
- Compiling to recurrent neurons [0.5249805590164902]
We present a minimal typed, higher-order and linear programming language with iteration called $\textsf{Cajal}{\scriptstyle(\multimap,\ \mathbb{2},\ \mathbb{N})}$. We prove its programs compile correctly to recurrent neurons, allowing discrete algorithms to be expressed in a differentiable form compatible with gradient-based learning.
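As a rough illustration of the idea (assumptions mine, not the paper's compiler output), an iterative program such as "count the ones in a bit string" can be written as a linear recurrence with fixed weights:

```python
# A sketch of an iterative discrete program expressed as a linear recurrent
# neuron h_t = W_H * h_{t-1} + W_X * x_t with hand-chosen, fixed weights.
import jax.numpy as jnp

W_H, W_X = 1.0, 1.0   # fixed "compiled" weights; nothing here is learned

def compiled_counter(bits):
    h = 0.0
    for x in bits:
        h = W_H * h + W_X * x   # linear recurrence, no nonlinearity
    return h

print(compiled_counter(jnp.array([1., 0., 1., 1.])))  # 3.0
```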
arXiv Detail & Related papers (2025-11-18T22:26:27Z)
- On the algorithmic construction of deep ReLU networks [0.0]
We take the perspective that a neural network is an algorithm. In this analogy, a neural network is programmed constructively, rather than trained from data. We construct and analyze several other examples, both existing and new.
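For a flavor of this constructive style (a textbook example, not necessarily one of the paper's constructions): the absolute-value function can be written exactly as a one-hidden-layer ReLU network whose weights are chosen by hand rather than trained, since |x| = relu(x) + relu(-x).

```python
# Hand-programmed ReLU network computing |x| with fixed weights.
import jax.numpy as jnp

W1 = jnp.array([[1.0], [-1.0]])   # hidden pre-activations: x and -x
w2 = jnp.array([1.0, 1.0])        # output weights: sum the rectified units

def constructed_abs(x):
    h = jnp.maximum(W1 @ jnp.array([x]), 0.0)   # ReLU hidden layer
    return w2 @ h

print(constructed_abs(-3.0), constructed_abs(2.5))   # 3.0 2.5
```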
arXiv Detail & Related papers (2025-06-23T20:35:52Z)
- Mind The Gap: Deep Learning Doesn't Learn Deeply [16.284360949127723]
This paper aims to understand how neural networks learn algorithmic reasoning by addressing two questions: how faithful are learned algorithms when they are effective, and why do neural networks fail to learn effective algorithms otherwise?
arXiv Detail & Related papers (2025-05-24T10:11:36Z)
- LinSATNet: The Positive Linear Satisfiability Neural Networks [116.65291739666303]
This paper studies how to introduce positive linear satisfiability constraints into neural networks.
We propose the first differentiable satisfiability layer based on an extension of the classic Sinkhorn algorithm for jointly encoding multiple sets of marginal distributions.
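The layer builds on the classic Sinkhorn iteration; a bare-bones version (the paper's extension to multiple sets of marginals is not reproduced here) looks roughly like this:

```python
# Classic Sinkhorn iteration: alternately normalize rows and columns of a
# positive matrix so that it becomes approximately doubly stochastic.
import jax.numpy as jnp

def sinkhorn(scores, n_iters=20):
    P = jnp.exp(scores)                          # positive matrix from scores
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)     # row normalization
        P = P / P.sum(axis=0, keepdims=True)     # column normalization
    return P

P = sinkhorn(jnp.array([[2.0, 0.1], [0.3, 1.5]]))
print(P.sum(axis=0), P.sum(axis=1))              # both approach [1, 1]
```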
arXiv Detail & Related papers (2024-07-18T22:05:21Z)
- Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks [69.38572074372392]
We present the first results proving that feature learning occurs during training with a nonlinear model on multiple tasks.
Our key insight is that multi-task pretraining induces a pseudo-contrastive loss that favors representations that align points that typically have the same label across tasks.
arXiv Detail & Related papers (2023-07-13T16:39:08Z)
- The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks [59.26515696183751]
We show that algorithm discovery in neural networks is sometimes more complex.
We show that even simple learning problems can admit a surprising diversity of solutions.
arXiv Detail & Related papers (2023-06-30T17:59:13Z)
- Towards a Neural Lambda Calculus: Neurosymbolic AI Applied to the Foundations of Functional Programming [0.0]
We will analyze the ability of neural networks to learn how to execute programs as a whole. We will introduce the use of integrated neural learning and calculi formalization.
arXiv Detail & Related papers (2023-04-18T20:30:16Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
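A typical ingredient of such vector-quantized representations is a nearest-codebook lookup trained with a straight-through gradient; the sketch below shows that ingredient in isolation (the paper's auto-decoder training loop and feature-grid layout are not modeled).

```python
# Nearest-codebook lookup with a straight-through estimator: the forward pass
# uses the discrete code, while gradients pass through as if it were identity.
import jax
import jax.numpy as jnp

def vq_straight_through(z, codebook):
    d = jnp.sum((codebook - z) ** 2, axis=1)     # distance to each code
    z_q = codebook[jnp.argmin(d)]                # discrete nearest-code choice
    return z + jax.lax.stop_gradient(z_q - z)

codebook = jnp.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(vq_straight_through(jnp.array([0.9, 0.8]), codebook))   # -> [1. 1.]
```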
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that 2 hidden layers are necessary and sufficient to represent any function representable in the class.
We also give precise bounds on the sizes of the neural networks required to represent any function in the class.
We propose a new class of neural networks that we call shortcut linear threshold networks.
arXiv Detail & Related papers (2021-11-15T22:33:52Z)
- The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU-network with standard Gaussian weights and uniformly distributed biases can solve this separation problem with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
- On the distance between two neural networks and the stability of learning [59.62047284234815]
This paper relates parameter distance to gradient breakdown for a broad class of nonlinear compositional functions.
The analysis leads to a new distance function called deep relative trust and a descent lemma for neural networks.
arXiv Detail & Related papers (2020-02-09T19:18:39Z)
- Backward Feature Correction: How Deep Learning Performs Deep (Hierarchical) Learning [66.05472746340142]
This paper analyzes how multi-layer neural networks can perform hierarchical learning _efficiently_ and _automatically_ by SGD on the training objective.
We establish a new principle called "backward feature correction", where the errors in the lower-level features can be automatically corrected when training together with the higher-level layers.
arXiv Detail & Related papers (2020-01-13T17:28:29Z)