TrAct: Making First-layer Pre-Activations Trainable
- URL: http://arxiv.org/abs/2410.23970v1
- Date: Thu, 31 Oct 2024 14:25:55 GMT
- Title: TrAct: Making First-layer Pre-Activations Trainable
- Authors: Felix Petersen, Christian Borgelt, Stefano Ermon
- Abstract summary: We consider the training of the first layer of vision models and notice the clear relationship between pixel values and update magnitudes.
An image with low contrast has a smaller impact on learning than an image with higher contrast.
A very bright or very dark image has a stronger impact on the weights than an image with moderate brightness.
- Score: 65.40281259525578
- Abstract: We consider the training of the first layer of vision models and notice the clear relationship between pixel values and gradient update magnitudes: the gradients arriving at the weights of a first layer are by definition directly proportional to (normalized) input pixel values. Thus, an image with low contrast has a smaller impact on learning than an image with higher contrast, and a very bright or very dark image has a stronger impact on the weights than an image with moderate brightness. In this work, we propose performing gradient descent on the embeddings produced by the first layer of the model. However, switching to discrete inputs with an embedding layer is not a reasonable option for vision models. Thus, we propose the conceptual procedure of (i) a gradient descent step on first layer activations to construct an activation proposal, and (ii) finding the optimal weights of the first layer, i.e., those weights which minimize the squared distance to the activation proposal. We provide a closed form solution of the procedure and adjust it for robust stochastic training while computing everything efficiently. Empirically, we find that TrAct (Training Activations) speeds up training by factors between 1.25x and 4x while requiring only a small computational overhead. We demonstrate the utility of TrAct with different optimizers for a range of different vision models including convolutional and transformer architectures.
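The two-step procedure described in the abstract can be sketched directly in code. The NumPy snippet below is a minimal illustration under stated assumptions, not the paper's efficient implementation: the learning rate, the ridge term `lam` (added here purely for numerical stability), and the batch least-squares formulation are assumptions on my part; the paper derives a closed-form solution and further adapts it for robust stochastic training.

```python
import numpy as np

def tract_step(W, X, grad_Z, lr=0.1, lam=1e-3):
    """One conceptual TrAct-style update for a linear first layer (sketch).

    W      : (d_out, d_in) first-layer weight matrix
    X      : (d_in, n)     batch of (normalized) inputs, one column per example
    grad_Z : (d_out, n)    gradient of the loss w.r.t. the pre-activations Z = W @ X
    """
    Z = W @ X                    # current first-layer pre-activations
    Z_prop = Z - lr * grad_Z     # (i) gradient step on activations -> activation proposal
    # (ii) weights minimizing ||W' X - Z_prop||^2, ridge-regularized for stability:
    #      W' = Z_prop X^T (X X^T + lam I)^{-1}
    d_in = X.shape[0]
    gram = X @ X.T + lam * np.eye(d_in)
    return np.linalg.solve(gram, X @ Z_prop.T).T

# Toy usage with hypothetical shapes: 64 filters over flattened 3x3 RGB patches.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 27))
X = rng.normal(size=(27, 128))
grad_Z = rng.normal(size=(64, 128))  # would normally come from backpropagation
W = tract_step(W, X, grad_Z)
```

The key design point visible even in this sketch is that the gradient step is taken in activation space, so the resulting weight update no longer scales directly with the raw pixel magnitudes of the batch.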
Related papers
- Efficient Training with Denoised Neural Weights [65.14892033932895]
This work takes a novel step towards building a weight generator to synthesize the neural weights for initialization.
We use the image-to-image translation task with generative adversarial networks (GANs) as an example due to the ease of collecting model weights.
By initializing the image translation model with the denoised weights predicted by our diffusion model, the training requires only 43.3 seconds.
arXiv Detail & Related papers (2024-07-16T17:59:42Z) - FastMIM: Expediting Masked Image Modeling Pre-training for Vision [65.47756720190155]
FastMIM is a framework for pre-training vision backbones with low-resolution input images.
It reconstructs Histograms of Oriented Gradients (HOG) features instead of the original RGB values of the input images.
It can achieve 83.8%/84.1% top-1 accuracy on ImageNet-1K with ViT-B/Swin-B as backbones.
arXiv Detail & Related papers (2022-12-13T14:09:32Z) - Compact Model Training by Low-Rank Projection with Energy Transfer [13.446719541044663]
Low-rankness plays an important role in traditional machine learning, but is not so popular in deep learning.
Previous low-rank network compression methods compress networks by approximating pre-trained models and re-training.
We devise a new training method, low-rank projection with energy transfer, that trains low-rank compressed networks from scratch and achieves competitive performance.
arXiv Detail & Related papers (2022-04-12T06:53:25Z) - Decoupled Low-light Image Enhancement [21.111831640136835]
We propose to decouple the enhancement model into two sequential stages.
The first stage focuses on improving the scene visibility based on a pixel-wise non-linear mapping.
The second stage focuses on improving the appearance fidelity by suppressing the rest degeneration factors.
arXiv Detail & Related papers (2021-11-29T11:15:38Z) - On Training Implicit Models [75.20173180996501]
We propose a novel gradient estimate for implicit models, named phantom gradient, that forgoes the costly computation of the exact gradient.
Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models by roughly 1.7 times.
arXiv Detail & Related papers (2021-11-09T14:40:24Z) - Powerpropagation: A sparsity inducing weight reparameterisation [65.85142037667065]
We introduce Powerpropagation, a new weight reparameterisation for neural networks that leads to inherently sparse models.
Models trained in this manner exhibit similar performance, but have a distribution with markedly higher density at zero, allowing more parameters to be pruned safely.
Here, we combine Powerpropagation with a traditional weight-pruning technique as well as recent state-of-the-art sparse-to-sparse algorithms, showing superior performance on the ImageNet benchmark.
arXiv Detail & Related papers (2021-10-01T10:03:57Z) - Partial transfusion: on the expressive influence of trainable batch norm parameters for transfer learning [0.0]
Transfer learning from ImageNet is the go-to approach when applying deep learning to medical images.
Most modern architectures contain batch normalisation layers, and fine-tuning a model with such layers requires taking a few precautions.
We find that fine-tuning only the trainable weights of the batch normalisation layers leads to performance similar to fine-tuning all of the weights.
arXiv Detail & Related papers (2021-02-10T16:29:03Z) - Powers of layers for image-to-image translation [60.5529622990682]
We propose a simple architecture to address unpaired image-to-image translation tasks.
We start from an image autoencoder architecture with fixed weights.
For each task we learn a residual block operating in the latent space, which is iteratively called until the target domain is reached.
arXiv Detail & Related papers (2020-08-13T09:02:17Z) - A Two-step-training Deep Learning Framework for Real-time Computational Imaging without Physics Priors [0.0]
We propose a two-step-training DL (TST-DL) framework for real-time computational imaging without physics priors.
First, a single fully-connected layer (FCL) is trained to directly learn the model.
Then, this FCL is fixed and concatenated with an un-trained U-Net architecture for a second-step training to improve the output image fidelity.
arXiv Detail & Related papers (2020-01-10T15:05:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.