Related papers: Exploiting Subgradient Sparsity in Max-Plus Neural Networks

URL: http://arxiv.org/abs/2603.04133v1
Date: Wed, 04 Mar 2026 14:46:35 GMT
Title: Exploiting Subgradient Sparsity in Max-Plus Neural Networks
Authors: Ikhlas Enaieh, Olivier Fercoq
Abstract summary: We use a novel Max-Plus neural architecture in which classical addition and multiplication are replaced with maximum and summation operations respectively. Standard backpropagation fails to exploit this sparsity, leading to unnecessary computations. We propose a sparse subgradient algorithm that explicitly exploits the algebraic sparsity.
Score: 1.7045044665125362
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep Neural Networks are powerful tools for solving machine learning problems, but their training often involves dense and costly parameter updates. In this work, we use a novel Max-Plus neural architecture in which classical addition and multiplication are replaced with maximum and summation operations respectively. This is a promising architecture in terms of interpretability, but its training is challenging. A particular feature is that this algebraic structure naturally induces sparsity in the subgradients, as only neurons that contribute to the maximum affect the loss. However, standard backpropagation fails to exploit this sparsity, leading to unnecessary computations. In this work, we focus on the minimization of the worst sample loss which transfers this sparsity to the optimization loss. To address this, we propose a sparse subgradient algorithm that explicitly exploits the algebraic sparsity. By tailoring the optimization procedure to the non-smooth nature of Max-Plus models, our method achieves more efficient updates while retaining theoretical guarantees. This highlights a principled path toward bridging algebraic structure and scalable learning.
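The sparsity described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the function names, the squared-error loss, and the toy data are all assumptions made for illustration. A max-plus layer computes y[i] = max_j (W[i][j] + x[j]), so a subgradient with respect to W has at most one nonzero entry per row, and under a worst-sample objective only the sample with the largest loss contributes to the update.

```python
# Illustrative sketch only: all names and the squared-error loss are
# assumptions for this example, not taken from the paper.

def maxplus_layer(W, x):
    """Max-plus 'matrix-vector product': y[i] = max_j (W[i][j] + x[j]).

    Returns the outputs and, for each row, the index j attaining the maximum
    (the first such index if there are ties).
    """
    ys, args = [], []
    for row in W:
        vals = [w + xj for w, xj in zip(row, x)]
        j_star = max(range(len(vals)), key=vals.__getitem__)
        ys.append(vals[j_star])
        args.append(j_star)
    return ys, args


def worst_sample_subgradient(W, samples):
    """One subgradient of max_s 0.5 * ||y(x_s) - t_s||^2 with respect to W.

    Only the worst sample is active, and within it only the entries
    (i, argmax_j) are nonzero, so the result is a sparse dict {(i, j): value}.
    """
    worst_loss, worst_grad = None, {}
    for x, t in samples:
        ys, args = maxplus_layer(W, x)
        residual = [y - ti for y, ti in zip(ys, t)]
        loss = 0.5 * sum(r * r for r in residual)
        if worst_loss is None or loss > worst_loss:
            worst_loss = loss
            worst_grad = {(i, j): r
                          for i, (j, r) in enumerate(zip(args, residual))
                          if r != 0}
    return worst_grad


def sparse_step(W, grad, lr):
    """Update only the entries of W that appear in the sparse subgradient."""
    for (i, j), g in grad.items():
        W[i][j] -= lr * g
    return W
```

With a 2x2 weight matrix and two samples, the subgradient touches only two of the four weights, which is the kind of saving a dense backpropagation pass would not exploit.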

Related papers

BLUR: A Bi-Level Optimization Approach for LLM Unlearning [100.90394814817965]
We argue that it is important to model the hierarchical structure of the unlearning problem. We propose a novel algorithm, termed Bi-Level UnleaRning (BLUR), which delivers superior performance.
arXiv Detail & Related papers (2025-06-09T19:23:05Z)
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training [67.45211108321203]
We introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer. We show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs.
arXiv Detail & Related papers (2025-06-05T16:50:23Z)
Newton Losses: Using Curvature Information for Learning with Differentiable Algorithms [80.37846867546517]
We show how to train eight different neural networks with custom objectives. We exploit their second-order information via their empirical Fisher and Hessian matrices. We apply Newton Losses to achieve significant improvements for less-optimized differentiable algorithms.
arXiv Detail & Related papers (2024-10-24T18:02:11Z)
Gradient-free neural topology optimization [0.0]
Gradient-free algorithms require many more iterations to converge than gradient-based algorithms. This has made them unviable for topology optimization due to the high computational cost per iteration and the high dimensionality of these problems. We propose a pre-trained neural reparameterization strategy that leads to at least one order of magnitude decrease in iteration count when optimizing the designs in latent space.
arXiv Detail & Related papers (2024-03-07T23:00:49Z)
Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback Propagation (LFP) is a novel training principle for neural network-like predictors. LFP decomposes a reward to individual neurons based on their respective contributions. Our method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
Regularization-based Pruning of Irrelevant Weights in Deep Neural Architectures [0.0]
We propose a method for learning sparse neural topologies via a regularization technique which identifies non-relevant weights and selectively shrinks their norm. We tested the proposed technique on different image classification and natural language generation tasks, obtaining results on par with or better than competitors in terms of sparsity and metrics.
arXiv Detail & Related papers (2022-04-11T09:44:16Z)
GradMax: Growing Neural Networks using Gradient Information [22.986063120002353]
We present a method that adds new neurons during training without impacting what is already learned, while improving the training dynamics. We call this technique Gradient Maximizing Growth (GradMax) and demonstrate its effectiveness in a variety of vision tasks and architectures.
arXiv Detail & Related papers (2022-01-13T18:30:18Z)
Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer. We show that there is a natural synergy between these two settings. We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training, and gradient-based meta-learning.
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models [15.541264326378366]
In recent years, implicit deep learning has emerged as a method to increase the depth of deep neural networks. The training is performed as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix. We propose a novel strategy to tackle this computational bottleneck from which many bi-level problems suffer.
arXiv Detail & Related papers (2021-06-01T15:07:34Z)
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks. It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value. It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.