A projection-based framework for gradient-free and parallel learning
- URL: http://arxiv.org/abs/2506.05878v1
- Date: Fri, 06 Jun 2025 08:44:56 GMT
- Title: A projection-based framework for gradient-free and parallel learning
- Authors: Andreas Bergmeister, Manish Krishan Lal, Stefanie Jegelka, Suvrit Sra
- Abstract summary: We introduce PJAX, a JAX-based software framework that enables this paradigm. PJAX composes projection operators for elementary operations, automatically deriving the solution operators for the feasibility problems. We train diverse architectures (MLPs, CNNs, RNNs) on standard benchmarks using PJAX, demonstrating its generality.
- Score: 50.96641619247761
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a feasibility-seeking approach to neural network training. This mathematical optimization framework is distinct from conventional gradient-based loss minimization and uses projection operators and iterative projection algorithms. We reformulate training as a large-scale feasibility problem: finding network parameters and states that satisfy local constraints derived from its elementary operations. Training then involves projecting onto these constraints, a local operation that can be parallelized across the network. We introduce PJAX, a JAX-based software framework that enables this paradigm. PJAX composes projection operators for elementary operations, automatically deriving the solution operators for the feasibility problems (akin to autodiff for derivatives). It inherently supports GPU/TPU acceleration, provides a familiar NumPy-like API, and is extensible. We train diverse architectures (MLPs, CNNs, RNNs) on standard benchmarks using PJAX, demonstrating its functionality and generality. Our results show that this approach is a compelling alternative to gradient-based training, with clear advantages in parallelism and the ability to handle non-differentiable operations.
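The projection idea can be made concrete with a generic alternating-projection loop in plain JAX. The sketch below is not PJAX's API (the abstract does not expose it); the constraint sets and the POCS sweep are illustrative assumptions showing the primitive such a framework composes: finding a point that satisfies several local constraints by repeatedly projecting onto each of them.

```python
import jax
import jax.numpy as jnp

# Hypothetical constraint sets (illustrative, not from the paper):
#   C1 = {x : <a, x> = b}    -- a hyperplane
#   C2 = {x : lo <= x <= hi} -- a box
a, b = jnp.array([1.0, 2.0, -1.0]), 3.0
lo, hi = -1.0, 2.0

def proj_hyperplane(x):
    # Orthogonal projection onto {x : <a, x> = b}.
    return x - (jnp.dot(a, x) - b) / jnp.dot(a, a) * a

def proj_box(x):
    # Projection onto a box is element-wise clipping.
    return jnp.clip(x, lo, hi)

def sweep(x, _):
    # One round of alternating projections (POCS): project onto each set in turn.
    return proj_box(proj_hyperplane(x)), None

x_star, _ = jax.lax.scan(sweep, jnp.zeros(3), None, length=200)
print(x_star, jnp.dot(a, x_star))  # x_star lies in the box and (approximately) on the hyperplane
```

Each projection touches only its own constraint, which is what makes constraint-level updates local and parallelizable in the feasibility-seeking view of training.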
Related papers
- Self-Contrastive Forward-Forward Algorithm [3.1361717406527667]
The Forward-Forward (FF) algorithm relies on feedforward operations to optimize layer-wise objectives. FF has failed to reach state-of-the-art performance on most standard benchmark tasks. We propose the Self-Contrastive Forward-Forward (SCFF) algorithm, a competitive training method aimed at closing this performance gap.
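For context, a minimal sketch of the layer-wise objective that standard FF optimizes (Hinton's "goodness", the sum of squared activations pushed above a threshold for positive data and below it for negative data); SCFF's self-contrastive sample construction is not shown here, and the layer parameters are hypothetical.

```python
import jax
import jax.numpy as jnp

def goodness(params, x):
    # One layer's "goodness": sum of squared activations.
    h = jax.nn.relu(x @ params["W"] + params["b"])
    return jnp.sum(h ** 2, axis=-1)

def ff_layer_loss(params, x_pos, x_neg, theta=2.0):
    # Logistic loss pushing goodness above theta for positive samples and
    # below theta for negative samples; each layer is trained on its own loss.
    g_pos, g_neg = goodness(params, x_pos), goodness(params, x_neg)
    return jnp.mean(jax.nn.softplus(theta - g_pos) + jax.nn.softplus(g_neg - theta))
```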
arXiv Detail & Related papers (2024-09-17T22:58:20Z)
- Slax: A Composable JAX Library for Rapid and Flexible Prototyping of Spiking Neural Networks [0.19427883580687189]
We introduce Slax, a JAX-based library designed to accelerate SNN algorithm design.
Slax provides optimized implementations of diverse training algorithms, allowing direct performance comparison.
arXiv Detail & Related papers (2024-04-08T18:15:13Z)
- A foundation for exact binarized morphological neural networks [2.8925699537310137]
Training and running deep neural networks (NNs) often demands a lot of computation and energy-intensive specialized hardware.
One way to reduce the computation and power cost is to use binary weight NNs, but these are hard to train because the sign function has a non-smooth gradient.
We present a model based on Mathematical Morphology (MM), which can binarize ConvNets without losing performance under certain conditions.
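The non-smooth-gradient issue can be seen directly in JAX: the gradient of sign is zero almost everywhere, so a common workaround is the straight-through estimator. The sketch below shows that standard trick for comparison only; it is not the Mathematical Morphology binarization the paper proposes.

```python
import jax
import jax.numpy as jnp

print(jax.grad(lambda w: jnp.sign(w))(0.7))  # 0.0 -- sign provides no learning signal

@jax.custom_vjp
def binarize(w):
    return jnp.sign(w)

def binarize_fwd(w):
    return jnp.sign(w), w

def binarize_bwd(w, g):
    # Straight-through estimator: pass the incoming gradient through where |w| <= 1.
    return (g * (jnp.abs(w) <= 1.0).astype(g.dtype),)

binarize.defvjp(binarize_fwd, binarize_bwd)
```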
arXiv Detail & Related papers (2024-01-08T11:37:44Z)
- Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions to solving a given task. Our method then implements a greedy approach reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
- When Computing Power Network Meets Distributed Machine Learning: An Efficient Federated Split Learning Framework [6.871107511111629]
CPN-FedSL is a Federated Split Learning (FedSL) framework over a Computing Power Network (CPN).
We build a dedicated model to capture the basic settings and learning characteristics (e.g., latency, flow, convergence).
arXiv Detail & Related papers (2023-05-22T12:36:52Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, the Cascaded Forward (CaFo) algorithm, which, like FF, does not rely on backpropagation (BP).
Unlike FF, our framework directly outputs label distributions at each cascaded block and does not require generating additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
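A minimal sketch of the block-wise idea described above, with hypothetical layer shapes and names: each block has its own label head and local loss, and stop_gradient between blocks keeps training local so blocks can be updated independently or in parallel. This illustrates the stated design, not the authors' code.

```python
import jax
import jax.numpy as jnp

def block_apply(p, x):
    # One block: a dense feature layer plus its own label head.
    h = jax.nn.relu(x @ p["W"] + p["b"])
    logits = h @ p["head"]
    return h, logits

def blockwise_losses(blocks, x, y_onehot):
    # Each block predicts a label distribution from its own features;
    # stop_gradient decouples the blocks, so every loss is purely local.
    losses = []
    for p in blocks:
        h, logits = block_apply(p, x)
        log_probs = jax.nn.log_softmax(logits)
        losses.append(-jnp.mean(jnp.sum(y_onehot * log_probs, axis=-1)))
        x = jax.lax.stop_gradient(h)  # next block sees features, not gradients
    return losses
```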
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution to take advantage of off-the-shelf network designs and reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights on-the-fly by a small amount proportional to their magnitude.
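A rough sketch of the soft-shrinkage step described above, assuming a per-iteration sparsity target p and shrink factor alpha (both hypothetical names): weights below the magnitude percentile are shrunk in proportion to their own scale rather than zeroed outright.

```python
import jax.numpy as jnp

def iterative_soft_shrink(w, p=0.5, alpha=0.01):
    # Threshold at the p-th magnitude percentile of this layer's weights.
    thresh = jnp.quantile(jnp.abs(w), p)
    unimportant = jnp.abs(w) < thresh
    # Shrink unimportant weights by a small amount proportional to their magnitude,
    # leaving the remaining weights untouched (soft, reversible pruning).
    return jnp.where(unimportant, w * (1.0 - alpha), w)
```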
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent [95.94432031144716]
We propose a unified non-convex optimization framework for the analysis of neural network training.
We show that existing guarantees for networks trained by gradient descent can be unified within this framework.
arXiv Detail & Related papers (2021-06-25T17:45:00Z)
- Relative gradient optimization of the Jacobian term in unsupervised deep learning [9.385902422987677]
Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning.
Deep density models have been widely used for this task, but their maximum likelihood based training requires estimating the log-determinant of the Jacobian.
We propose a new approach for exact training of such neural networks.
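To make the Jacobian issue concrete, here is the standard calculation for a single invertible linear layer $W$ (a simplification of the paper's setting): a relative, i.e. multiplicative, gradient update removes the matrix inverse that the log-determinant term would otherwise require.

```latex
% Log-likelihood contribution of one invertible linear layer W:
%   L(W) = \log \lvert \det W \rvert + \text{(data terms)}
% Its Euclidean gradient needs an inverse:
\nabla_W \log \lvert \det W \rvert = W^{-\top}.
% A relative-gradient update right-multiplies by W^{\top} W, so the
% log-determinant contribution becomes inversion-free:
\Delta W = \big(\nabla_W L\big)\, W^{\top} W
\quad\Longrightarrow\quad
W^{-\top} W^{\top} W = W .
```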
arXiv Detail & Related papers (2020-06-26T16:41:08Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
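In its simplest form, the "computation then aggregation" pattern reduces to the FedAvg weighted average sketched below (a textbook illustration, not FedPD's algorithm): each client trains locally, then the server averages the parameters weighted by client data size.

```python
import jax
import jax.numpy as jnp

def fedavg_aggregate(client_params, client_sizes):
    # Weighted average of client parameter pytrees (the "aggregation" step of CTA).
    weights = jnp.asarray(client_sizes, dtype=jnp.float32)
    weights = weights / weights.sum()
    return jax.tree_util.tree_map(
        lambda *leaves: sum(w * leaf for w, leaf in zip(weights, leaves)),
        *client_params,
    )

# Usage (hypothetical clients): fedavg_aggregate([p_client0, p_client1], client_sizes=[120, 80])
```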
arXiv Detail & Related papers (2020-05-22T23:07:42Z)