GLAI: GreenLightningAI for Accelerated Training through Knowledge Decoupling
- URL: http://arxiv.org/abs/2510.00883v1
- Date: Wed, 01 Oct 2025 13:31:34 GMT
- Title: GLAI: GreenLightningAI for Accelerated Training through Knowledge Decoupling
- Authors: Jose I. Mestre, Alberto Fernández-Hernández, Cristian Pérez-Corral, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí
- Abstract summary: We introduce GreenLightningAI (GLAI), a new architectural block designed as an alternative to conventional MLPs. The central idea is to separate two types of knowledge that are usually entangled during training: (i) *structural knowledge*, encoded by the stable activation patterns induced by ReLU activations; and (ii) *quantitative knowledge*, carried by the numerical weights and biases. By fixing the structure once stabilized, GLAI reformulates the MLP as a combination of paths, where only the quantitative component is optimized.
- Score: 1.0518862318418603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we introduce GreenLightningAI (GLAI), a new architectural block designed as an alternative to conventional MLPs. The central idea is to separate two types of knowledge that are usually entangled during training: (i) *structural knowledge*, encoded by the stable activation patterns induced by ReLU activations; and (ii) *quantitative knowledge*, carried by the numerical weights and biases. By fixing the structure once stabilized, GLAI reformulates the MLP as a combination of paths, where only the quantitative component is optimized. This reformulation retains the universal approximation capabilities of MLPs, yet achieves a more efficient training process, reducing training time by ~40% on average across the cases examined in this study. Crucially, GLAI is not just another classifier, but a generic block that can replace MLPs wherever they are used, from supervised heads with frozen backbones to projection layers in self-supervised learning or few-shot classifiers. Across diverse experimental setups, GLAI consistently matches or exceeds the accuracy of MLPs with an equivalent number of parameters, while converging faster. Overall, GLAI establishes a new design principle that opens a direction for future integration into large-scale architectures such as Transformers, where MLP blocks dominate the computational footprint.
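The separation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the function names (`mlp_forward`, `activation_pattern`, `gated_forward`), and the one-hidden-layer setup are assumptions made purely for illustration. It shows only that a ReLU MLP can be rewritten with an explicit 0/1 activation pattern (the "structural" part) applied to the weights and biases (the "quantitative" part), and that once the pattern is frozen the block becomes affine in its input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer ReLU MLP; sizes are illustrative only.
d_in, d_hidden, d_out = 4, 8, 3
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)


def mlp_forward(x, W1, b1, W2, b2):
    """Standard ReLU MLP: activation pattern and weights are entangled."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2


def activation_pattern(x, W1, b1):
    """Structural part: 0/1 vector marking which hidden units are active."""
    return (W1 @ x + b1 > 0).astype(x.dtype)


def gated_forward(x, gates, W1, b1, W2, b2):
    """Quantitative part: same weights, with ReLU replaced by fixed gates."""
    return W2 @ (gates * (W1 @ x + b1)) + b2


x = rng.normal(size=d_in)
gates = activation_pattern(x, W1, b1)

# With gates taken from the same weights, both forms give the same output.
same_output = np.allclose(
    mlp_forward(x, W1, b1, W2, b2),
    gated_forward(x, gates, W1, b1, W2, b2),
)

# With the gates frozen, the block is affine in its input: an affine
# combination of inputs maps to the same affine combination of outputs.
x2 = rng.normal(size=d_in)
a = 0.3
mixed = gated_forward(a * x + (1 - a) * x2, gates, W1, b1, W2, b2)
combo = (a * gated_forward(x, gates, W1, b1, W2, b2)
         + (1 - a) * gated_forward(x2, gates, W1, b1, W2, b2))
affine_ok = np.allclose(mixed, combo)

print(same_output, affine_ok)
```

How GLAI decides when the activation patterns have stabilized, and how it expresses the network as a combination of paths, is specified in the paper itself and is not reproduced here.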
Related papers
- ProGMLP: A Progressive Framework for GNN-to-MLP Knowledge Distillation with Efficient Trade-offs [42.37669895235534]
We introduce a progressive framework designed to offer flexible and on-demand trade-offs between inference cost and accuracy for GNN-to-MLP knowledge distillation. Our approach is validated through comprehensive experiments on eight real-world graph datasets.
arXiv Detail & Related papers (2025-07-25T07:35:09Z) - TreeLoRA: Efficient Continual Learning via Layer-Wise LoRAs Guided by a Hierarchical Gradient-Similarity Tree [52.44403214958304]
In this paper, we introduce TreeLoRA, a novel approach that constructs layer-wise adapters by leveraging hierarchical gradient similarity. To reduce the computational burden of task similarity estimation, we employ bandit techniques to develop an algorithm based on lower confidence bounds. Experiments on both vision transformers (ViTs) and large language models (LLMs) demonstrate the effectiveness and efficiency of our approach.
arXiv Detail & Related papers (2025-06-12T05:25:35Z) - Leveraging KANs for Expedient Training of Multichannel MLPs via Preconditioning and Geometric Refinement [2.249916681499244]
Multilayer perceptrons (MLPs) are a workhorse machine learning architecture, used in a variety of modern deep learning frameworks. Recently, Kolmogorov-Arnold Networks (KANs) have become increasingly popular due to their success on a range of problems. In this paper, we exploit the relationship between KANs and multichannel MLPs to gain structural insight into how to train MLPs faster.
arXiv Detail & Related papers (2025-05-23T17:41:18Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts. Thanks to this design, ALoRE adds negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - MLPs Learn In-Context on Regression and Classification Tasks [28.13046236900491]
In-context learning (ICL) is often assumed to be a unique hallmark of Transformer models. We demonstrate that multi-layer perceptrons (MLPs) can also learn in-context. Results highlight the unexpected competence of MLPs in a synthetic setting.
arXiv Detail & Related papers (2024-05-24T15:04:36Z) - PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation [65.268245109828]
We introduce PRILoRA, which linearly allocates a different rank for each layer, in an increasing manner, and performs pruning throughout the training process.
We validate the effectiveness of PRILoRA through extensive experiments on eight GLUE benchmarks, setting a new state of the art.
arXiv Detail & Related papers (2024-01-20T20:25:17Z) - MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models [33.86069537521178]
Fine-tuning a pre-trained language model (PLM) has emerged as the predominant strategy in many natural language processing applications. General approaches (e.g. quantization and distillation) have been widely studied to reduce the compute/memory of PLM fine-tuning. We propose one-shot compression techniques specifically designed for fine-tuning.
arXiv Detail & Related papers (2023-07-18T03:12:51Z) - Lifelong Machine Learning Potentials [0.0]
We introduce element-embracing atom-centered functions (eeACSFs) which combine structural properties and element information from the periodic table.
We apply continual learning strategies to enable autonomous and on-the-fly training on a continuous stream of new data.
arXiv Detail & Related papers (2023-03-10T13:38:36Z) - Efficient Language Modeling with Sparse all-MLP [53.81435968051093]
All-MLPs can match Transformers in language modeling, but still lag behind in downstream tasks.
We propose sparse all-MLPs with mixture-of-experts (MoEs) in both feature and input (token) dimensions.
We evaluate its zero-shot in-context learning performance on six downstream tasks, and find that it surpasses Transformer-based MoEs and dense Transformers.
arXiv Detail & Related papers (2022-03-14T04:32:19Z) - Learning with Multiclass AUC: Theory and Algorithms [141.63211412386283]
Area under the ROC curve (AUC) is a well-known ranking metric for problems such as imbalanced learning and recommender systems.
In this paper, we start an early trial to consider the problem of learning multiclass scoring functions via optimizing multiclass AUC metrics.
arXiv Detail & Related papers (2021-07-28T05:18:10Z) - GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training [59.160154997555956]
We present GradInit, an automated and architecture-agnostic method for initializing neural networks.
It is based on a simple heuristic: the variance of each network layer is adjusted so that a single step of SGD or Adam results in the smallest possible loss value.
It also enables training the original Post-LN Transformer for machine translation without learning rate warmup.
arXiv Detail & Related papers (2021-02-16T11:45:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.