The Trifecta: Three simple techniques for training deeper Forward-Forward networks
- URL: http://arxiv.org/abs/2311.18130v2
- Date: Tue, 12 Dec 2023 13:09:46 GMT
- Title: The Trifecta: Three simple techniques for training deeper Forward-Forward networks
- Authors: Thomas Dooms, Ing Jyh Tsang, Jose Oramas
- Abstract summary: We propose a collection of three techniques that synergize exceptionally well and drastically improve the Forward-Forward algorithm on deeper networks.
Our experiments demonstrate that our models are on par with similarly structured, backpropagation-based models in both training speed and test accuracy on simple datasets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern machine learning models are able to outperform humans on a variety of
non-trivial tasks. However, as the complexity of the models increases, they
consume significant amounts of power and still struggle to generalize
effectively to unseen data. Local learning, which focuses on updating subsets
of a model's parameters at a time, has emerged as a promising technique to
address these issues. Recently, a novel local learning algorithm, called
Forward-Forward, has received widespread attention due to its innovative
approach to learning. Unfortunately, its application has been limited to
smaller datasets due to scalability issues. To this end, we propose The
Trifecta, a collection of three simple techniques that synergize exceptionally
well and drastically improve the Forward-Forward algorithm on deeper networks.
Our experiments demonstrate that our models are on par with similarly
structured, backpropagation-based models in both training speed and test
accuracy on simple datasets. This is achieved by the ability to learn
representations that are informative locally, on a layer-by-layer basis, and
retain their informativeness when propagated to deeper layers in the
architecture. This leads to around 84% accuracy on CIFAR-10, a notable
improvement (25%) over the original FF algorithm. These results highlight the
potential of Forward-Forward as a genuine competitor to backpropagation and as
a promising research avenue.
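
The abstract describes the Forward-Forward algorithm only at a high level. As a point of reference, below is a minimal PyTorch sketch of one layer-local Forward-Forward update in the spirit of Hinton's original formulation, where the "goodness" (mean squared activation) of positive samples is pushed above a threshold and that of negative samples below it. The `FFLayer` class, threshold value, and optimizer choice are illustrative assumptions; the three Trifecta techniques proposed in the paper are not reproduced here.

```python
# Minimal sketch of one layer-local Forward-Forward update (after Hinton, 2022).
# Assumptions: goodness = mean squared activation, softplus loss against a fixed
# threshold, and positive/negative batches prepared by the caller. The Trifecta
# techniques from the paper are NOT included.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    def __init__(self, in_dim, out_dim, threshold=2.0, lr=1e-3):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize inputs so only their direction carries information forward.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return F.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        # Goodness: mean squared activation per sample.
        g_pos = self.forward(x_pos).pow(2).mean(dim=1)
        g_neg = self.forward(x_neg).pow(2).mean(dim=1)
        # Push positive goodness above the threshold, negative goodness below it.
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()  # gradients stay inside this layer
        self.opt.step()
        # Detach outputs so no gradient flows between layers (local learning).
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```

Stacking several such layers and calling `train_step` on each, feeding the detached outputs to the next layer, trains the network without end-to-end backpropagation.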
Related papers
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Optimizing Dense Feed-Forward Neural Networks [0.0]
We propose a novel feed-forward neural network constructing method based on pruning and transfer learning.
Our approach can compress the number of parameters by more than 70%.
We also evaluate the degree of transfer learning by comparing the refined model against the same network trained from scratch.
arXiv Detail & Related papers (2023-12-16T23:23:16Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - Efficiently Robustify Pre-trained Models [18.392732966487582]
The robustness of large-scale models in real-world settings remains a less-explored topic.
We first benchmark the performance of these models under different perturbations and datasets.
We then discuss how existing robustification schemes based on complete model fine-tuning may not be a scalable option for very large networks.
arXiv Detail & Related papers (2023-09-14T08:07:49Z) - Boosting Low-Data Instance Segmentation by Unsupervised Pre-training
with Saliency Prompt [103.58323875748427]
This work offers a novel unsupervised pre-training solution for low-data regimes.
Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models.
Experimental results show that our method significantly boosts several QEIS models on three datasets.
arXiv Detail & Related papers (2023-02-02T15:49:03Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - Benchmarking and Analyzing 3D Human Pose and Shape Estimation Beyond
Algorithms [31.2529724533643]
This work presents the first comprehensive benchmarking study from three under-explored perspectives beyond algorithms.
An analysis on 31 datasets reveals the distinct impacts of data samples.
We achieve a PA-MPJPE of 47.3 mm on the 3DPW test set with a relatively simple model.
arXiv Detail & Related papers (2022-09-21T17:39:53Z) - Learning Deep Representation with Energy-Based Self-Expressiveness for
Subspace Clustering [24.311754971064303]
We propose a new deep subspace clustering framework, motivated by the energy-based models.
Motivated by the powerful representation ability of recently popular self-supervised learning, we leverage self-supervised representation learning to learn the dictionary.
arXiv Detail & Related papers (2021-10-28T11:51:08Z) - Transformer-Based Behavioral Representation Learning Enables Transfer
Learning for Mobile Sensing in Small Datasets [4.276883061502341]
We provide a neural architecture framework for mobile sensing data that can learn generalizable feature representations from time series.
This architecture combines benefits from CNN and Transformer architectures to enable better prediction performance.
arXiv Detail & Related papers (2021-07-09T22:26:50Z) - RIFLE: Backpropagation in Depth for Deep Transfer Learning through
Re-Initializing the Fully-connected LayEr [60.07531696857743]
Fine-tuning a deep convolutional neural network (CNN) from a pre-trained model helps transfer knowledge learned from larger datasets to the target task.
We propose RIFLE - a strategy that deepens backpropagation in transfer learning settings.
RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning.
arXiv Detail & Related papers (2020-07-07T11:27:43Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)