QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation
- URL: http://arxiv.org/abs/2405.03192v2
- Date: Thu, 9 May 2024 02:20:42 GMT
- Title: QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation
- Authors: Chenhui Xu, Xinyao Wang, Fuxun Yu, Jinjun Xiong, Xiang Chen
- Abstract summary: We introduce a novel framework, QuadraNet V2, which leverages quadratic neural networks to create efficient high-order learning models.
Our method initializes the primary term of the quadratic neuron using a standard neural network, while the quadratic term is employed to adaptively enhance the learning of data non-linearity or shifts.
By utilizing existing pre-trained weights, QuadraNet V2 reduces the required GPU hours for training by 90% to 98.4% compared to training from scratch, demonstrating both efficiency and effectiveness.
- Score: 25.003305443114296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning is evolving towards high-order models that necessitate pre-training on extensive datasets, a process associated with significant overheads. Traditional models, despite having pre-trained weights, are becoming obsolete due to architectural differences that obstruct the effective transfer and initialization of these weights. To address these challenges, we introduce a novel framework, QuadraNet V2, which leverages quadratic neural networks to create efficient and sustainable high-order learning models. Our method initializes the primary term of the quadratic neuron using a standard neural network, while the quadratic term is employed to adaptively enhance the learning of data non-linearity or shifts. This integration of pre-trained primary terms with quadratic terms, which possess advanced modeling capabilities, significantly augments the information characterization capacity of the high-order network. By utilizing existing pre-trained weights, QuadraNet V2 reduces the required GPU hours for training by 90% to 98.4% compared to training from scratch, demonstrating both efficiency and effectiveness.
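As a concrete illustration of the idea described in the abstract, the sketch below builds a quadratic layer whose primary (first-order) term is copied from a pretrained linear layer, while a near-zero quadratic term is left free to adapt. The Hadamard-product quadratic form, the layer names, and the freezing choice are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class QuadraticAdapterLinear(nn.Module):
    """A quadratic layer whose primary term reuses pretrained linear weights."""
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        d_in, d_out = pretrained.in_features, pretrained.out_features
        # Primary (first-order) term: initialized directly from the pretrained layer.
        self.primary = nn.Linear(d_in, d_out)
        self.primary.load_state_dict(pretrained.state_dict())
        # Quadratic term (W_a x) * (W_b x): starts near zero so the layer
        # initially behaves exactly like the pretrained linear layer.
        self.wa = nn.Linear(d_in, d_out, bias=False)
        self.wb = nn.Linear(d_in, d_out, bias=False)
        nn.init.zeros_(self.wa.weight)
        nn.init.normal_(self.wb.weight, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pretrained first-order response plus an adaptive second-order term.
        return self.primary(x) + self.wa(x) * self.wb(x)

# Usage: wrap an existing layer and adapt only the quadratic term.
pretrained = nn.Linear(128, 64)           # stand-in for a layer with pretrained weights
layer = QuadraticAdapterLinear(pretrained)
for p in layer.primary.parameters():      # optionally keep the primary term frozen
    p.requires_grad = False
print(layer(torch.randn(8, 128)).shape)   # torch.Size([8, 64])
```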
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
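A minimal sketch of the logit-level composition this summary describes: a frozen pretrained model's logits are combined at inference with the deltas produced by a separately trained value network. The class and the stand-in models are hypothetical, not the paper's implementation.

```python
import torch
import torch.nn as nn

class LogitComposer(nn.Module):
    """Adds a value network's logit deltas to a frozen base model's logits."""
    def __init__(self, base_model: nn.Module, value_network: nn.Module):
        super().__init__()
        self.base = base_model        # pretrained model producing logits
        self.value = value_network    # small network modeling post-training changes

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.value(x)   # composed logits used for prediction

# Toy usage: both models share an output space of 100 classes/tokens.
base = nn.Linear(32, 100)                               # stand-in pretrained model
value = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 100))
composed = LogitComposer(base, value)
print(torch.softmax(composed(torch.randn(4, 32)), dim=-1).shape)  # (4, 100)
```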
arXiv Detail & Related papers (2024-10-28T13:48:43Z)
- An exactly solvable model for emergence and scaling laws in the multitask sparse parity problem [2.598133279943607]
We present a framework where each new ability (a skill) is represented as a basis function.
We find analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute.
Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as the training time, data size, or model size of the neural network increases.
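For context, a small generator for the multitask sparse parity problem, as it is commonly formulated: each task owns a fixed subset of data bits, the task is indicated by one-hot control bits, and the label is the parity of the task's subset. The power-law task frequencies and constants below are assumptions, not the paper's settings.

```python
import numpy as np

def multitask_sparse_parity(n_samples, n_tasks=8, n_bits=32, k=3, alpha=1.5, seed=0):
    rng = np.random.default_rng(seed)
    # Each task owns a fixed subset of k bit positions.
    subsets = [rng.choice(n_bits, size=k, replace=False) for _ in range(n_tasks)]
    # Tasks are sampled with Zipf-like (power-law) frequencies.
    freqs = np.arange(1, n_tasks + 1, dtype=float) ** -alpha
    freqs /= freqs.sum()
    tasks = rng.choice(n_tasks, size=n_samples, p=freqs)
    bits = rng.integers(0, 2, size=(n_samples, n_bits))
    control = np.eye(n_tasks, dtype=int)[tasks]          # one-hot task indicator
    x = np.concatenate([control, bits], axis=1)          # input = control + data bits
    y = np.array([bits[i, subsets[t]].sum() % 2          # label = parity of subset
                  for i, t in enumerate(tasks)])
    return x, y

x, y = multitask_sparse_parity(1000)
print(x.shape, y.mean())   # (1000, 40), label mean near 0.5
```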
arXiv Detail & Related papers (2024-04-26T17:45:32Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
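A hedged, generic sketch of the underlying idea: train a denoising diffusion model directly on flattened weight vectors so that new weights can later be sampled. This is a plain DDPM-style training step, not D2NWG's latent-diffusion or conditioning pipeline.

```python
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def diffusion_training_step(denoiser: nn.Module, w0: torch.Tensor) -> torch.Tensor:
    """One training step on a batch of flattened weight vectors w0."""
    b = w0.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(w0)
    a_bar = alphas_bar[t].unsqueeze(-1)
    # Forward process: noise the clean weight vectors up to timestep t.
    wt = a_bar.sqrt() * w0 + (1.0 - a_bar).sqrt() * noise
    # The denoiser predicts the injected noise, conditioned on (normalized) t.
    t_embed = t.float().unsqueeze(-1) / T
    pred = denoiser(torch.cat([wt, t_embed], dim=-1))
    return nn.functional.mse_loss(pred, noise)

# Toy usage: "checkpoints" are 256-dimensional flattened weight vectors.
denoiser = nn.Sequential(nn.Linear(257, 512), nn.SiLU(), nn.Linear(512, 256))
print(float(diffusion_training_step(denoiser, torch.randn(16, 256))))
```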
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Dr$^2$Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning [81.0108753452546]
We propose Dynamic Reversible Dual-Residual Networks, or Dr$^2$Net, to finetune a pretrained model with substantially reduced memory consumption.
Dr$^2$Net contains two types of residual connections, one maintaining the residual structure in the pretrained models, and the other making the network reversible.
We show that Dr$^2$Net can reach comparable performance to conventional finetuning but with significantly less memory usage.
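The sketch below shows a reversible coupling block with two weighted residual paths, loosely following the description above; the coefficients and coupling form are assumptions, not the exact Dr$^2$Net formulation. Because inputs can be recomputed from outputs, intermediate activations need not be stored, which is where the memory savings come from.

```python
import torch
import torch.nn as nn

class ReversibleDualResidual(nn.Module):
    """Coupling block with two residual paths, invertible whenever alpha != 0."""
    def __init__(self, f: nn.Module, g: nn.Module, alpha: float = 1.0, beta: float = 1.0):
        super().__init__()
        self.f, self.g = f, g              # e.g. sub-blocks of a pretrained network
        self.alpha, self.beta = alpha, beta

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        y1 = self.alpha * x1 + self.beta * self.f(x2)
        y2 = self.alpha * x2 + self.beta * self.g(y1)
        return y1, y2

    def inverse(self, y1: torch.Tensor, y2: torch.Tensor):
        # Recover the inputs exactly from the outputs (no stored activations).
        x2 = (y2 - self.beta * self.g(y1)) / self.alpha
        x1 = (y1 - self.beta * self.f(x2)) / self.alpha
        return x1, x2

# Sanity check that the block is numerically reversible.
block = ReversibleDualResidual(nn.Linear(16, 16), nn.Linear(16, 16))
a, b = torch.randn(2, 16), torch.randn(2, 16)
ra, rb = block.inverse(*block.forward(a, b))
print(torch.allclose(a, ra, atol=1e-5), torch.allclose(b, rb, atol=1e-5))
```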
arXiv Detail & Related papers (2024-01-08T18:59:31Z)
- Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models [17.34908967455907]
Machine unlearning proposes the selective removal of unwanted data without the need for retraining from scratch.
Fast-NTK is a novel NTK-based unlearning algorithm that significantly reduces the computational complexity.
arXiv Detail & Related papers (2023-12-22T18:55:45Z)
- PILOT: A Pre-Trained Model-Based Continual Learning Toolbox [71.63186089279218]
This paper introduces a pre-trained model-based continual learning toolbox known as PILOT.
On the one hand, PILOT implements some state-of-the-art class-incremental learning algorithms based on pre-trained models, such as L2P, DualPrompt, and CODA-Prompt.
On the other hand, PILOT fits typical class-incremental learning algorithms within the context of pre-trained models to evaluate their effectiveness.
arXiv Detail & Related papers (2023-09-13T17:55:11Z)
- Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes [100.69714600180895]
Offline Q-learning algorithms exhibit strong performance that scales with model capacity.
We train a single policy on 40 games with near-human performance using networks of up to 80 million parameters.
Compared to return-conditioned supervised approaches, offline Q-learning scales similarly with model capacity and has better performance, especially when the dataset is suboptimal.
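As a point of reference, a generic offline Q-learning update on a fixed transition dataset is sketched below; the paper's actual recipe (large ResNet Q-networks, conservative regularization, multi-game training) is not reproduced here.

```python
import torch
import torch.nn as nn

def offline_q_step(q_net, target_net, batch, optimizer, gamma=0.99):
    """One TD update on a minibatch sampled from a fixed offline dataset."""
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

# Toy usage with random transitions (4-dim states, 3 discrete actions).
q = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
target = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
target.load_state_dict(q.state_dict())
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
batch = (torch.randn(32, 4), torch.randint(0, 3, (32,)),
         torch.randn(32), torch.randn(32, 4), torch.zeros(32))
print(offline_q_step(q, target, batch, opt))
```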
arXiv Detail & Related papers (2022-11-28T08:56:42Z)
- Knowledge Distillation as Efficient Pre-training: Faster Convergence, Higher Data-efficiency, and Better Transferability [53.27240222619834]
Knowledge Distillation as Efficient Pre-training aims to efficiently transfer the learned feature representation from pre-trained models to new student models for future downstream tasks.
Our method performs comparably with supervised pre-training counterparts on 3 downstream tasks and 9 downstream datasets while requiring 10x less data and 5x less pre-training time.
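A minimal sketch of feature-level distillation used as pre-training, under assumptions: the student is trained to match the teacher's feature representation through a small projection head. The projector and the MSE objective are illustrative choices, not the paper's exact design.

```python
import torch
import torch.nn as nn

def feature_distill_loss(student_feat, teacher_feat, projector):
    # Project student features into the teacher's feature space and match them.
    return nn.functional.mse_loss(projector(student_feat), teacher_feat)

# Toy usage: frozen teacher with 512-dim features, smaller student with 256-dim.
teacher = nn.Linear(128, 512).eval()      # stand-in for a pretrained teacher backbone
student = nn.Linear(128, 256)             # stand-in for the student backbone
projector = nn.Linear(256, 512)

x = torch.randn(16, 128)
with torch.no_grad():
    t_feat = teacher(x)
loss = feature_distill_loss(student(x), t_feat, projector)
loss.backward()                           # gradients reach the student and projector only
print(float(loss))
```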
arXiv Detail & Related papers (2022-03-10T06:23:41Z)
- On Expressivity and Trainability of Quadratic Networks [12.878230964137014]
Quadratic artificial neurons can play an important role in deep learning models.
We note that the superior expressivity of a quadratic network over either a conventional network or a conventional network with quadratic activation has not been fully elucidated.
We propose an effective training strategy referred to as ReLinear to stabilize the training process of a quadratic network.
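A hedged sketch of the ReLinear idea as summarized above: initialize the quadratic-specific terms so the quadratic neuron starts out computing exactly a conventional first-order response, then train those terms with a much smaller learning rate. The particular quadratic-neuron parameterization and learning rates are assumptions for illustration.

```python
import torch
import torch.nn as nn

class QuadraticNeuronLayer(nn.Module):
    """Quadratic layer that degenerates to a plain linear layer at initialization."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.wr = nn.Linear(d_in, d_out)               # conventional first-order term
        self.wg = nn.Linear(d_in, d_out)               # interaction factor
        self.wb = nn.Linear(d_in, d_out, bias=False)   # power term on x * x
        # ReLinear-style start: wg(x) == 1 and wb == 0, so the layer initially
        # computes exactly wr(x), i.e. behaves as a conventional linear layer.
        nn.init.zeros_(self.wg.weight)
        nn.init.ones_(self.wg.bias)
        nn.init.zeros_(self.wb.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.wr(x) * self.wg(x) + self.wb(x * x)

layer = QuadraticNeuronLayer(32, 10)
# Quadratic-specific parameters get a much smaller learning rate than the linear term.
optimizer = torch.optim.SGD([
    {"params": layer.wr.parameters(), "lr": 1e-2},
    {"params": list(layer.wg.parameters()) + list(layer.wb.parameters()), "lr": 1e-4},
])
print(layer(torch.randn(8, 32)).shape)    # torch.Size([8, 10])
```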
arXiv Detail & Related papers (2021-10-12T15:33:32Z)
- Performance of Transfer Learning Model vs. Traditional Neural Network in Low System Resource Environment [0.0]
We compare the performance and cost of a lighter transfer learning model and a purpose-built neural network for the NLP tasks of text classification and NER in a low system resource environment.
The rise of state-of-the-art models such as BERT, XLNet and GPT boosts accuracy and makes them attractive base models for transfer learning.
arXiv Detail & Related papers (2020-10-20T08:12:56Z)