Variance-Aware Weight Initialization for Point Convolutional Neural
Networks
- URL: http://arxiv.org/abs/2112.03777v1
- Date: Tue, 7 Dec 2021 15:47:14 GMT
- Title: Variance-Aware Weight Initialization for Point Convolutional Neural
Networks
- Authors: Pedro Hermosilla and Michael Schelling and Tobias Ritschel and Timo
Ropinski
- Abstract summary: We propose a framework to unify the multitude of continuous convolutions.
We show that this framework can avoid batch normalization while achieving similar and, in some cases, better performance.
- Score: 23.46612653627991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Appropriate weight initialization has been of key importance to successfully
train neural networks. Recently, batch normalization has diminished the role of
weight initialization by simply normalizing each layer based on batch
statistics. Unfortunately, batch normalization has several drawbacks when
applied to the small batch sizes that are often required to cope with memory
limitations when learning on point clouds. While well-founded weight
initialization strategies can render batch normalization unnecessary and thus
avoid these drawbacks, no such approaches have been proposed for point
convolutional networks. To fill this gap, we propose a framework to unify the
multitude of continuous convolutions. This enables our main contribution,
variance-aware weight initialization. We show that this initialization can
avoid batch normalization while achieving similar and, in some cases, better
performance.
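The abstract does not spell out the derivation, but the core idea can be illustrated with a He-style rule adapted to neighbourhood aggregation. Below is a minimal sketch, assuming the kernel weights are drawn from a zero-mean normal whose variance is shrunk by both the feature fan-in and the expected number of neighbours summed per output point; the function name, the `avg_neighbors` parameter, and the gain of 2 are assumptions of this sketch, not the paper's exact scheme.

```python
import numpy as np

def variance_aware_normal(fan_in, fan_out, avg_neighbors, gain=2.0, rng=None):
    """He-style weight initializer for a point convolution (illustrative sketch).

    Assumes each output point sums contributions over `fan_in` input features
    and roughly `avg_neighbors` neighbouring points, so the initial standard
    deviation is shrunk by sqrt(gain / (fan_in * avg_neighbors)) to keep the
    output variance comparable to the input variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(gain / (fan_in * avg_neighbors))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

# Example: 64 input features, 128 output features, ~16 neighbours per point.
W = variance_aware_normal(fan_in=64, fan_out=128, avg_neighbors=16)
print(W.std())  # close to sqrt(2 / (64 * 16)) ~= 0.044
```

The point of the extra `avg_neighbors` factor is that a point convolution sums over a neighbourhood as well as over input channels, so an initializer that only accounts for fan-in would inflate the output variance by roughly the neighbourhood size.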
Related papers
- Post-Training Quantization for Re-parameterization via Coarse & Fine
Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights.
We develop an improved KL metric to determine optimal quantization scales for activation.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z) - Batchless Normalization: How to Normalize Activations Across Instances with Minimal Memory Requirements [0.0]
In training neural networks, batch normalization has many benefits, not all of them entirely understood, but it also comes at a memory cost.
In this paper I show a simple and straightforward way to address these issues.
Among other benefits, this will hopefully contribute to the democratization of AI research by means of lowering the hardware requirements for training larger models.
arXiv Detail & Related papers (2022-12-30T14:15:54Z) - BiTAT: Neural Network Binarization with Task-dependent Aggregated
Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z) - ZerO Initialization: Initializing Residual Networks with only Zeros and
Ones [44.66636787050788]
Deep neural networks are usually initialized with random weights, with an adequately selected initial variance to ensure stable signal propagation during training.
There is no consensus on how to select the variance, and this becomes challenging as the number of layers grows.
In this work, we replace the widely used random weight initialization with a fully deterministic initialization scheme, ZerO, which initializes residual networks with only zeros and ones (a minimal sketch appears after this list).
Surprisingly, we find that ZerO achieves state-of-the-art performance over various image classification datasets, including ImageNet.
arXiv Detail & Related papers (2021-10-25T06:17:33Z) - Cluster-Promoting Quantization with Bit-Drop for Minimizing Network
Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z) - Fractional moment-preserving initialization schemes for training deep
neural networks [1.14219428942199]
A traditional approach to initializing deep neural networks (DNNs) is to sample the network weights randomly so as to preserve the variance of pre-activations.
In this paper, we show that weights and therefore pre-activations can be modeled with a heavy-tailed distribution.
We show through numerical experiments that our schemes can improve the training and test performance.
arXiv Detail & Related papers (2020-05-25T01:10:01Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z) - Gradient $\ell_1$ Regularization for Quantization Robustness [70.39776106458858]
We derive a simple regularization scheme that improves robustness against post-training quantization.
By training quantization-ready networks, our approach enables storing a single set of weights that can be quantized on-demand to different bit-widths.
arXiv Detail & Related papers (2020-02-18T12:31:34Z) - Cross-Iteration Batch Normalization [67.83430009388678]
We present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality.
CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique.
arXiv Detail & Related papers (2020-02-13T18:52:57Z)
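As referenced in the ZerO entry above, here is a minimal zeros-and-ones style sketch, assuming a two-layer residual block whose first weight matrix is the identity and whose second starts at zero, so the network is an identity mapping at initialization; this illustrates the spirit of a deterministic zeros-and-ones initialization but is not the paper's exact scheme.

```python
import numpy as np

def zeros_and_ones_init(dim):
    """Deterministic residual-block init using only zeros and ones (illustrative).

    The first weight matrix is the identity (ones on the diagonal) and the
    second is all zeros, so the block computes f(x) = x at initialization
    and signals propagate unchanged regardless of depth.
    """
    w1 = np.eye(dim)            # identity: ones on the diagonal
    w2 = np.zeros((dim, dim))   # residual branch contributes nothing at step 0
    return w1, w2

def residual_block(x, w1, w2):
    h = np.maximum(x @ w1, 0.0)  # linear layer followed by ReLU
    return x + h @ w2            # skip connection plus residual branch

x = np.random.randn(4, 8)
w1, w2 = zeros_and_ones_init(8)
assert np.allclose(residual_block(x, w1, w2), x)  # identity mapping at init
```

Because every block is the identity at step 0, depth does not distort the signal, which is the property such deterministic schemes exploit in place of carefully tuned random variances.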