Related papers: Energy-Efficient Deep Learning Without Backpropagation: A Rigorous Evaluation of Forward-Only Algorithms

Energy-Efficient Deep Learning Without Backpropagation: A Rigorous Evaluation of Forward-Only Algorithms

URL: http://arxiv.org/abs/2511.01061v1
Date: Sun, 02 Nov 2025 19:48:44 GMT
Title: Energy-Efficient Deep Learning Without Backpropagation: A Rigorous Evaluation of Forward-Only Algorithms
Authors: Przemysław Spyra, Witold Dzwinel,
Abstract summary: We present evidence that the Mono-Forward algorithm consistently surpasses optimally tuned BP baseline in classification accuracy.<n>This superior generalization is achieved with profound efficiency gains, including up to 41% less energy consumption and up to 34% faster training.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The long-held assumption that backpropagation (BP) is essential for state-of-the-art performance is challenged by this work. We present rigorous, hardware-validated evidence that the Mono-Forward (MF) algorithm, a backpropagation-free method, consistently surpasses an optimally tuned BP baseline in classification accuracy on its native Multi-Layer Perceptron (MLP) architectures. This superior generalization is achieved with profound efficiency gains, including up to 41% less energy consumption and up to 34% faster training. Our analysis, which charts an evolutionary path from Geoffrey Hinton's Forward-Forward (FF) to the Cascaded Forward (CaFo) and finally to MF, is grounded in a fair comparative framework using identical architectures and universal hyperparameter optimization. We further provide a critical re-evaluation of memory efficiency in BP-free methods, empirically demonstrating that practical overhead can offset theoretical gains. Ultimately, this work establishes MF as a practical, high-performance, and sustainable alternative to BP for MLPs.

Related papers

Unbiased Dynamic Pruning for Efficient Group-Based Policy Optimization [60.87651283510059]
Group Relative Policy Optimization (GRPO) effectively scales LLM reasoning but incurs prohibitive computational costs.<n>We propose Dynamic Pruning Policy Optimization (DPPO), a framework that enables dynamic pruning while preserving unbiased gradient estimation.<n>To mitigate the data sparsity induced by pruning, we introduce Dense Prompt Packing, a window-based greedy strategy.
arXiv Detail & Related papers (2026-03-04T14:48:53Z)
On the Limits of Test-Time Compute: Sequential Reward Filtering for Better Inference [71.09125259964684]
Test-time compute (TTC) has become an increasingly prominent paradigm for enhancing large language models (LLMs)<n>We study reward-filtered sequential inference, a simple procedure that selectively incorporates only high-reward generations into the context.<n>On the theoretical side, we show that reward-filtered sequential inference yields strictly stronger guarantees than standard TTC paradigms.
arXiv Detail & Related papers (2025-12-04T08:21:33Z)
Optimizing the Unknown: Black Box Bayesian Optimization with Energy-Based Model and Reinforcement Learning [42.508822373669936]
Black-Box Optimization (BBO) has achieved success across various scientific and engineering domains.<n>We propose the Reinforced Energy-Based Model for Bayesian Optimization (REBMBO), which integrates Gaussian Processes (GP) for local guidance with an Energy-Based Model (EBM) to capture global structural information.<n>We conduct extensive experiments on synthetic and real-world benchmarks, confirming the superior performance of REBMBO.
arXiv Detail & Related papers (2025-10-22T12:36:49Z)
Beyond Backpropagation: Exploring Innovative Algorithms for Energy-Efficient Deep Neural Network Training [0.0]
The paper rigorously investigates three BP-free training methods: the Forward-Forward (FF), Cascaded-Forward (CaFo), and Mono-Forward (MF)<n>MF reduces energy consumption by up to 41% and shortens training time by up to 34%, translating to a measurably smaller carbon footprint as estimated by CodeCarbon.
arXiv Detail & Related papers (2025-09-23T14:27:44Z)
BAPE: Learning an Explicit Bayes Classifier for Long-tailed Visual Recognition [78.70453964041718]
Current deep learning algorithms usually solve for the optimal classifier by emphimplicitly estimating the posterior probabilities.<n>This simple methodology has been proven effective for meticulously balanced academic benchmark datasets.<n>However, it is not applicable to the long-tailed data distributions in the real world.<n>This paper presents a novel approach (BAPE) that provides a more precise theoretical estimation of the data distributions.
arXiv Detail & Related papers (2025-06-29T15:12:50Z)
Minimax Optimal Reinforcement Learning with Quasi-Optimism [9.410437324336275]
We introduce EQO (Exploration via Quasi-Optimism) as a new type of reinforcement learning algorithm.<n>It avoids reliance on empirical variances and employs a simple bonus term proportional to the inverse of the state-action visit count.<n>It consistently outperforms existing algorithms in both regret performance and computational efficiency.
arXiv Detail & Related papers (2025-03-02T09:32:06Z)
Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models [14.68920095399595]
sparsity-based PEFT (SPEFT) introduces trainable sparse adaptations to the weight matrices in the model.<n>We conduct the first systematic evaluation of salience metrics for SPEFT, inspired by zero-cost NAS proxies.<n>We compare static and dynamic masking strategies, finding that static masking, which predetermines non-zero entries before training, delivers efficiency without sacrificing performance.
arXiv Detail & Related papers (2024-12-18T04:14:35Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.<n>Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
See Further for Parameter Efficient Fine-tuning by Standing on the Shoulders of Decomposition [56.87609859444084]
parameter-efficient fine-tuning (PEFT) focuses on optimizing a select subset of parameters while keeping the rest fixed, significantly lowering computational and storage overheads.<n>We take the first step to unify all approaches by dissecting them from a decomposition perspective.<n>We introduce two novel PEFT methods alongside a simple yet effective framework designed to enhance the performance of PEFT techniques across various applications.
arXiv Detail & Related papers (2024-07-07T15:44:42Z)
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.<n>To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.<n>Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
Provably Efficient Bayesian Optimization with Unknown Gaussian Process Hyperparameter Estimation [44.53678257757108]
We propose a new BO method that can sub-linearly converge to the objective function's global optimum. Our method uses a multi-armed bandit technique (EXP3) to add random data points to the BO process. We demonstrate empirically that our method outperforms existing approaches on various synthetic and real-world problems.
arXiv Detail & Related papers (2023-06-12T03:35:45Z)
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaption (LoRA) has emerged to fine-tune large language models (LLMs) LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner. LoRAPrune achieves a reduction in perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
arXiv Detail & Related papers (2023-05-28T15:15:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.