Feed-Forward On-Edge Fine-tuning Using Static Synthetic Gradient Modules
- URL: http://arxiv.org/abs/2009.09675v1
- Date: Mon, 21 Sep 2020 08:27:01 GMT
- Title: Feed-Forward On-Edge Fine-tuning Using Static Synthetic Gradient Modules
- Authors: Robby Neven, Marian Verhelst, Tinne Tuytelaars and Toon Goedemé
- Abstract summary: Training deep learning models on embedded devices is typically avoided since this requires more memory, computation and power than inference.
In this work, we focus on lowering the amount of memory needed for storing all activations, which are required during the backward pass to compute the gradients.
We show that our method achieves results comparable to standard backpropagation.
- Score: 35.92284329679786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep learning models on embedded devices is typically avoided since
it requires more memory, computation and power than inference. In this work,
we focus on lowering the amount of memory needed for storing all activations,
which are required during the backward pass to compute the gradients. Instead,
during the forward pass, static Synthetic Gradient Modules (SGMs) predict
gradients for each layer. This allows training the model in a feed-forward
manner without having to store all activations. We tested our method on a robot
grasping scenario where a robot needs to learn to grasp new objects given only
a single demonstration. By first training the SGMs in a meta-learning manner on
a set of common objects, during fine-tuning, the SGMs provided the model with
accurate gradients to successfully learn to grasp new objects. We show that our
method achieves results comparable to standard backpropagation.
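To make the mechanism concrete, here is a minimal, hypothetical PyTorch-style sketch of the fine-tuning phase: each layer is paired with a small frozen module that maps the layer's output and a conditioning signal (e.g. an embedding of the demonstrated target) to a predicted gradient, so every layer can be updated during the forward pass without keeping activations for a global backward pass. The class and function names, layer widths, and the form of the conditioning signal are illustrative assumptions, not the authors' implementation; in the paper the SGMs are first meta-trained on a set of common objects.

```python
import torch
import torch.nn as nn

class SyntheticGradientModule(nn.Module):
    """Predicts dL/d(activation) from a layer's output and a conditioning
    signal (e.g. a target embedding). Kept frozen during fine-tuning."""
    def __init__(self, feat_dim, cond_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + cond_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )

    def forward(self, h, cond):
        return self.net(torch.cat([h, cond], dim=-1))

def feed_forward_finetune_step(layers, sgms, opts, x, cond):
    """One step: each layer is updated from its SGM's predicted gradient as
    soon as its output is computed, so no activation stack is stored."""
    h = x
    for layer, sgm, opt in zip(layers, sgms, opts):
        h = layer(h.detach())            # break the graph between layers
        with torch.no_grad():            # SGMs are static during fine-tuning
            g = sgm(h, cond)             # predicted dL/dh for this layer
        opt.zero_grad()
        h.backward(gradient=g)           # local backward through this layer only
        opt.step()
    return h.detach()

# Example usage (layer widths and conditioning size are assumptions):
layers = nn.ModuleList([nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(3)])
sgms = nn.ModuleList([SyntheticGradientModule(64, 16) for _ in range(3)])
opts = [torch.optim.SGD(l.parameters(), lr=1e-2) for l in layers]
out = feed_forward_finetune_step(layers, sgms, opts, torch.randn(8, 64), torch.randn(8, 16))
```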
Related papers
- Can Gradient Descent Simulate Prompting? [56.60154660021178]
The paper asks whether gradient updates can reproduce the effects of conditioning on new information. Gradient descent training recovers some (and occasionally all) of the prompted model's performance. The results suggest new avenues for long-context modeling.
arXiv Detail & Related papers (2025-06-26T04:06:20Z) - Machine Unlearning under Overparameterization [35.031020618251965]
Machine unlearning algorithms aim to remove the influence of specific samples, ideally recovering the model that would have resulted from the remaining data alone. We study unlearning in an overparameterized setting, where many models interpolate the retained data. We provide both exact and approximate unlearning methods, and we demonstrate our framework across various unlearning experiments.
arXiv Detail & Related papers (2025-05-28T17:14:57Z) - Stepping Forward on the Last Mile [8.756033984943178]
We propose a series of algorithmic enhancements that further reduce the memory footprint and narrow the accuracy gap compared to backpropagation.
Our results demonstrate that, on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach (a minimal forward-gradient sketch appears after this list).
arXiv Detail & Related papers (2024-11-06T16:33:21Z) - Unified Gradient-Based Machine Unlearning with Remain Geometry Enhancement [29.675650285351768]
Machine unlearning (MU) has emerged to enhance the privacy and trustworthiness of deep neural networks.
Approximate MU is a practical method for large-scale models.
We propose a fast-slow parameter update strategy to implicitly approximate the up-to-date salient unlearning direction.
arXiv Detail & Related papers (2024-09-29T15:17:33Z) - Targeted Unlearning with Single Layer Unlearning Gradient [15.374381635334897]
Machine unlearning methods aim to remove sensitive or unwanted content from trained models. We propose Single Layer Unlearning Gradient computation (SLUG) as an efficient method to unlearn targeted information.
arXiv Detail & Related papers (2024-07-16T15:52:36Z) - Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach [87.8330887605381]
We show how to adapt a pre-trained Vision Transformer to downstream recognition tasks with only a few learnable parameters.
We synthesize a task-specific query with a learnable and lightweight module, which is independent of the pre-trained model.
Our method achieves state-of-the-art performance under memory constraints, showcasing its applicability in real-world situations.
arXiv Detail & Related papers (2024-07-09T15:45:04Z) - An Information Theoretic Approach to Machine Unlearning [43.423418819707784]
To comply with AI and data regulations, it is increasingly important to be able to remove private or copyrighted information from trained machine learning models.
In this work, we address the zero-shot unlearning scenario, whereby an unlearning algorithm must be able to remove data given only a trained model and the data to be forgotten.
We derive a simple but principled zero-shot unlearning method based on the geometry of the model.
arXiv Detail & Related papers (2024-02-02T13:33:30Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset.
We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence demonstrating that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - EMO: Episodic Memory Optimization for Few-Shot Meta-Learning [69.50380510879697]
Episodic memory optimization for meta-learning, which we call EMO, is inspired by the human ability to recall past learning experiences from the brain's memory.
EMO nudges parameter updates in the right direction, even when the gradients provided by a limited number of examples are uninformative.
EMO scales well with most few-shot classification benchmarks and improves the performance of optimization-based meta-learning methods.
arXiv Detail & Related papers (2023-06-08T13:39:08Z) - Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators, called WTA-CRS, for matrix multiplication with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones (a sketch of the underlying column-row sampling estimator appears after this list).
arXiv Detail & Related papers (2023-05-24T15:52:08Z) - Selectively Hard Negative Mining for Alleviating Gradient Vanishing in
Image-Text Matching [15.565068934153983]
Most existing Image-Text Matching (ITM) models suffer from vanishing gradients at the beginning of training.
We propose a Selectively Hard Negative Mining (SelHN) strategy, which chooses whether to mine hard negative samples.
SelHN can be applied to existing ITM models in a plug-and-play manner to give them better training behavior.
arXiv Detail & Related papers (2023-03-01T02:15:07Z) - Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
arXiv Detail & Related papers (2020-10-11T01:40:03Z)
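As a pointer for the forward-gradient entry above ("Stepping Forward on the Last Mile"), the sketch below shows the basic forward-gradient update: sample a random direction over the parameters, obtain the directional derivative of the loss with one forward-mode pass, and scale the direction by that derivative to get an unbiased gradient estimate, so no activations are stored for a backward pass. It uses floating point and `torch.func` for brevity and omits the paper's fixed-point quantization and other enhancements; the model, shapes, and learning rate are assumptions.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, jvp

# Toy regression model; architecture and sizes are illustrative assumptions.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
loss_fn = nn.MSELoss()
param_names = [n for n, _ in model.named_parameters()]

def forward_gradient_step(param_values, x, y, lr=1e-2):
    """One update from a single forward-mode directional derivative:
    grad_estimate = (dL/dtheta . v) * v for a random direction v."""
    def loss_of(*values):
        params = dict(zip(param_names, values))
        return loss_fn(functional_call(model, params, (x,)), y)

    v = tuple(torch.randn_like(p) for p in param_values)      # random tangent direction
    loss, dir_deriv = jvp(loss_of, tuple(param_values), v)    # one forward-mode pass
    with torch.no_grad():
        new_values = tuple(p - lr * dir_deriv * t
                           for p, t in zip(param_values, v))  # SGD-style update
    return new_values, loss.item()

# Example usage with random data (illustrative only):
values = tuple(p.detach().clone() for p in model.parameters())
x, y = torch.randn(8, 32), torch.randn(8, 10)
values, loss = forward_gradient_step(values, x, y)
```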
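For the WTA-CRS entry, here is a sketch of the classic unbiased column-row sampling (CRS) estimator that such methods build on: sample k column-row pairs with probability proportional to the product of their norms and rescale, giving an unbiased, lower-memory approximation of a matrix product. This is the standard estimator, not the paper's WTA-CRS variant; names and sizes are assumptions.

```python
import torch

def crs_matmul(A: torch.Tensor, B: torch.Tensor, k: int) -> torch.Tensor:
    """Unbiased column-row sampling estimate of A @ B: sample k index pairs i
    with probability p_i proportional to ||A[:, i]|| * ||B[i, :]|| and rescale
    each sampled term by 1 / (k * p_i)."""
    norms = A.norm(dim=0) * B.norm(dim=1)     # importance of each column-row pair
    p = norms / norms.sum()
    idx = torch.multinomial(p, k, replacement=True)
    scale = 1.0 / (k * p[idx])                # correction that makes the estimate unbiased
    return (A[:, idx] * scale) @ B[idx, :]

# Example: compare the estimate against the exact product (illustrative only).
A, B = torch.randn(128, 512), torch.randn(512, 64)
print((crs_matmul(A, B, k=128) - A @ B).norm() / (A @ B).norm())
```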
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.