Spectrum Extraction and Clipping for Implicitly Linear Layers
- URL: http://arxiv.org/abs/2402.16017v2
- Date: Mon, 07 Oct 2024 12:52:19 GMT
- Title: Spectrum Extraction and Clipping for Implicitly Linear Layers
- Authors: Ali Ebrahimpour Boroojeny, Matus Telgarsky, Hari Sundaram
- Abstract summary: We show the effectiveness of automatic differentiation in efficiently and correctly computing and controlling the spectrum of implicitly linear operators.
We provide the first clipping method which is correct for general convolution layers.
- Score: 20.277446818410997
- Abstract: We show the effectiveness of automatic differentiation in efficiently and correctly computing and controlling the spectrum of implicitly linear operators, a rich family of layer types including all standard convolutional and dense layers. We provide the first clipping method which is correct for general convolution layers, and illuminate the representational limitation that caused correctness issues in prior work. We study the effect of the batch normalization layers when concatenated with convolutional layers and show how our clipping method can be applied to their composition. By comparing the accuracy and performance of our algorithms to the state-of-the-art methods, using various experiments, we show they are more precise and efficient and lead to better generalization and adversarial robustness. We provide the code for using our methods at https://github.com/Ali-E/FastClip.
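To make the core mechanism concrete, here is a minimal sketch (not the authors' FastClip implementation; function name and shapes are illustrative) of extracting the top singular value of an implicitly linear layer with automatic differentiation: the layer itself computes J v, a vector-Jacobian product supplies J^T u, and power iteration on J^T J converges to the squared spectral norm.
```python
import torch

def top_singular_value(layer, x_shape, n_iters=100):
    # Power iteration on J^T J, where J is the linear map computed by
    # `layer` (assumed bias-free, so that layer(v) == J v).
    v = torch.randn(x_shape)
    v = v / v.norm()
    for _ in range(n_iters):
        v = v.detach().requires_grad_(True)
        u = layer(v)                                       # u = J v
        (w,) = torch.autograd.grad(u, v, grad_outputs=u)   # w = J^T J v
        v = w / w.norm()
    with torch.no_grad():
        return layer(v).norm()                             # ~ sigma_max(J)

# Example: spectral norm of an ordinary convolutional layer.
conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False)
sigma = top_singular_value(conv, (1, 3, 32, 32))
```
Clipping then amounts to shrinking the extracted singular values and pushing the change back into the layer's parameters; the linked repository contains the authors' full method.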
Related papers
- Spectral Norm of Convolutional Layers with Circular and Zero Paddings [55.233197272316275]
We generalize the use of the Gram iteration to zero padding convolutional layers and prove its quadratic convergence.
We also provide theorems for bridging the gap between circular and zero padding convolution's spectral norm.
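For intuition on the circular case: with circular padding and a single input and output channel, the convolution operator is circulant and diagonalized by the DFT, so its singular values are the magnitudes of the kernel's 2-D FFT at the input resolution. The sketch below is illustrative only (multi-channel layers need a small SVD per frequency), not this paper's algorithm.
```python
import torch

def circular_conv_spectral_norm(kernel, input_hw):
    # The circulant operator's spectrum is the kernel's 2-D DFT on an
    # input-sized grid; the spectral norm is the largest magnitude.
    transfer = torch.fft.fft2(kernel, s=input_hw)  # zero-pads the kernel
    return transfer.abs().max()

sigma = circular_conv_spectral_norm(torch.randn(3, 3), (32, 32))
```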
arXiv Detail & Related papers (2024-01-31T23:48:48Z)
- GloptiNets: Scalable Non-Convex Optimization with Certificates [61.50835040805378]
We present a novel approach to non-convex optimization with certificates, which handles smooth functions on the hypercube or on the torus.
By exploiting the regularity of the target function intrinsic in the decay of its spectrum, we obtain precise certificates while leveraging the advanced and powerful computational techniques developed to optimize neural networks.
arXiv Detail & Related papers (2023-06-26T09:42:59Z)
- Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration [122.51142131506639]
We introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory.
We show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability.
It proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches.
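The mechanics of the Gram iteration are easy to show on a dense matrix (the paper applies it to convolutions through their circulant representation; this sketch is illustrative): the step G <- G G^T squares the singular values, so the Frobenius norm yields an upper bound on the spectral norm that tightens quadratically, and the whole computation stays differentiable.
```python
import torch

def gram_iteration_bound(W, n_iters=6):
    # After t Gram steps, sigma_max(W) <= ||W_t||_F ** (1 / 2**t).
    # Rescaling by the Frobenius norm each step avoids overflow; the
    # correction is tracked in log space.
    log_scale = torch.zeros(())
    for _ in range(n_iters):
        f = W.norm()
        W = W / f
        log_scale = 2.0 * (log_scale + torch.log(f))
        W = W @ W.T                          # squares the singular values
    return torch.exp((torch.log(W.norm()) + log_scale) / 2 ** n_iters)

bound = gram_iteration_bound(torch.randn(64, 64))
```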
arXiv Detail & Related papers (2023-05-25T15:32:21Z)
- Linearization Algorithms for Fully Composite Optimization [61.20539085730636]
This paper studies first-order algorithms for solving fully composite optimization problems over convex compact sets.
We leverage the structure of the objective by handling its differentiable and non-differentiable parts separately, linearizing only the smooth components.
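As a generic illustration of linearizing only the smooth part (a conditional-gradient-style sketch, not this paper's exact scheme): for minimizing f(x) + g(x) over x in a compact convex set C, each step linearizes the differentiable f and solves the resulting subproblem over C with g kept exact.
```python
import numpy as np

def linearized_composite_step(x, grad_f, subproblem, step):
    # s = argmin_{s in C} <grad_f(x), s> + g(s): only the smooth term f
    # is linearized; the non-smooth g stays inside the subproblem.
    s = subproblem(grad_f(x))
    return (1.0 - step) * x + step * s       # convex combination stays in C

# Hypothetical usage: C = unit l2 ball, f = 0.5 * ||Ax - b||^2, g = 0.
A, b = np.random.randn(20, 10), np.random.randn(20)
grad_f = lambda x: A.T @ (A @ x - b)
lmo = lambda g: -g / np.linalg.norm(g)       # linear minimization oracle
x = np.zeros(10)
for t in range(100):
    x = linearized_composite_step(x, grad_f, lmo, 2.0 / (t + 2.0))
```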
arXiv Detail & Related papers (2023-02-24T18:41:48Z)
- Improving Generalization of Batch Whitening by Convolutional Unit Optimization [24.102442375834084]
Batch Whitening is a technique that accelerates and stabilizes training by transforming input features to have a zero mean (Centering) and a unit variance (Scaling).
In commonly used structures, which are empirically optimized with Batch Normalization, the normalization layer appears between the convolution and the activation function.
We propose a new Convolutional Unit that is in line with the theory, and our method generally improves the performance of Batch Whitening.
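For reference, the Centering and Scaling steps amount to per-channel standardization over the batch and spatial dimensions; full batch whitening additionally decorrelates channels (e.g. via ZCA), which this minimal sketch omits.
```python
import torch

def batch_standardize(x, eps=1e-5):
    # x: (N, C, H, W). Centering: subtract the per-channel batch mean;
    # Scaling: divide by the per-channel batch standard deviation.
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)
```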
arXiv Detail & Related papers (2021-08-24T10:27:57Z)
- Orthogonalizing Convolutional Layers with the Cayley Transform [83.73855414030646]
We propose and evaluate an alternative approach to parameterize convolutional layers that are constrained to be orthogonal.
We show that our method indeed preserves orthogonality to a high degree even for large convolutions.
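The transform itself is simple to state for a dense square matrix; the paper's contribution is applying it efficiently to convolutions in the Fourier domain. A minimal dense sketch:
```python
import torch

def cayley_orthogonalize(W):
    # Skew-symmetrize, then apply A -> (I + A)^{-1} (I - A); the result
    # is exactly orthogonal for any skew-symmetric A.
    A = W - W.T
    I = torch.eye(W.shape[0], dtype=W.dtype)
    return torch.linalg.solve(I + A, I - A)

Q = cayley_orthogonalize(torch.randn(8, 8))
assert torch.allclose(Q @ Q.T, torch.eye(8), atol=1e-4)
```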
arXiv Detail & Related papers (2021-04-14T23:54:55Z)
- Data-efficient Alignment of Multimodal Sequences by Aligning Gradient Updates and Internal Feature Distributions [36.82512331179322]
Recent research suggests that network components dealing with different modalities may overfit and generalize at different speeds, creating difficulties for training.
We propose layer-wise adaptive rate scaling (LARS) to align the magnitudes of gradient updates in different layers and balance the pace of learning.
We also use sequence-wise batch normalization (SBN) to align the internal feature distributions from different modalities.
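A minimal sketch of the LARS scaling rule summarized above (omitting the momentum and weight-decay terms of the full optimizer): each layer's learning rate is rescaled by the ratio of its weight norm to its gradient norm, keeping update magnitudes proportional to weight magnitudes across layers.
```python
import torch

def lars_local_lr(param, base_lr=0.1, trust_coef=1e-3, eps=1e-9):
    # Trust ratio ||w|| / ||grad w|| scales the layer's step size.
    w_norm = param.data.norm()
    g_norm = param.grad.norm()
    return base_lr * trust_coef * w_norm / (g_norm + eps)

# Per-layer SGD update with the scaled rate:
#   param.data -= lars_local_lr(param) * param.grad
```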
arXiv Detail & Related papers (2020-11-15T13:04:25Z)
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
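As a purely hypothetical rendering of a gradient defined along the output-channel direction (the paper's actual construction differs), one could rescale each output channel's slice of a convolutional weight gradient to unit norm:
```python
import torch

def channel_normalized_grad(grad, eps=1e-12):
    # grad: (out_channels, in_channels, kH, kW). Normalize per output
    # channel so every channel moves at the same rate.
    flat = grad.flatten(start_dim=1)
    norms = flat.norm(dim=1, keepdim=True)
    return (flat / (norms + eps)).view_as(grad)
```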
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
- A block coordinate descent optimizer for classification problems exploiting convexity [0.0]
We introduce a coordinate descent method for deep linear networks on classification tasks that exploits the convexity of the cross-entropy loss in the weights of the linear layer.
By alternating between a second-order method that finds globally optimal parameters for the linear layer and gradient descent for the hidden layers, we ensure an optimal fit of the adaptive basis to the data throughout training.
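A minimal sketch of one outer iteration of this alternating scheme, using L-BFGS as a stand-in for the paper's second-order solve of the convex last-layer problem (names and hyperparameters are illustrative):
```python
import torch
import torch.nn.functional as F

def block_cd_step(x, y, hidden, linear, hidden_opt):
    feats = hidden(x).detach()               # freeze the hidden layers
    lin_opt = torch.optim.LBFGS(linear.parameters(), max_iter=20)

    def closure():
        # Convex block: cross-entropy is convex in the linear weights.
        lin_opt.zero_grad()
        loss = F.cross_entropy(linear(feats), y)
        loss.backward()
        return loss

    lin_opt.step(closure)                    # near-global fit of the linear layer

    hidden_opt.zero_grad()                   # non-convex block: one gradient step
    loss = F.cross_entropy(linear(hidden(x)), y)
    loss.backward()
    hidden_opt.step()
    return loss.item()
```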
arXiv Detail & Related papers (2020-06-17T19:49:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.