AutoClip: Adaptive Gradient Clipping for Source Separation Networks
- URL: http://arxiv.org/abs/2007.14469v1
- Date: Sat, 25 Jul 2020 20:59:39 GMT
- Title: AutoClip: Adaptive Gradient Clipping for Source Separation Networks
- Authors: Prem Seetharaman, Gordon Wichern, Bryan Pardo, Jonathan Le Roux
- Abstract summary: AutoClip is a method for automatically and adaptively choosing a gradient clipping threshold.
Experiments show that applying AutoClip results in improved performance for audio source separation networks.
- Score: 45.58157519349822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clipping the gradient is a known approach to improving gradient descent, but
requires hand selection of a clipping threshold hyperparameter. We present
AutoClip, a simple method for automatically and adaptively choosing a gradient
clipping threshold, based on the history of gradient norms observed during
training. Experimental results show that applying AutoClip results in improved
generalization performance for audio source separation networks. Observations of
the training dynamics of a separation network trained with and without AutoClip
show that AutoClip guides optimization into smoother parts of the loss
landscape. AutoClip is very simple to implement and can be integrated readily
into a variety of applications across multiple domains.
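The adaptive rule described in the abstract lends itself to a compact implementation: record the norm of every gradient seen so far and clip the current gradient to a chosen percentile of that history. Below is a minimal PyTorch-style sketch under that reading; the class name, the grad_norm helper, and the default percentile of 10 are illustrative assumptions rather than the authors' reference implementation.

```python
import numpy as np
import torch


def grad_norm(model):
    # Total L2 norm over all parameter gradients (illustrative helper).
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().norm(2).item() ** 2
    return total ** 0.5


class AutoClip:
    # Keeps a running history of observed gradient norms and clips each
    # step's gradient to a chosen percentile of that history.
    def __init__(self, percentile=10.0):
        self.percentile = percentile  # assumed default; a tunable hyperparameter
        self.history = []

    def __call__(self, model):
        self.history.append(grad_norm(model))
        clip_value = float(np.percentile(self.history, self.percentile))
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_value)


# Illustrative training-loop usage:
#   autoclip = AutoClip(percentile=10)
#   loss.backward()
#   autoclip(model)      # clip before the optimizer step
#   optimizer.step()
```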
Related papers
- Stepping Forward on the Last Mile [8.756033984943178]
We propose a series of algorithm enhancements that further reduce the memory footprint and the accuracy gap compared to backpropagation.
Our results demonstrate that on the last mile of model customization on edge devices, training with fixed-point forward gradients is a feasible and practical approach.
arXiv Detail & Related papers (2024-11-06T16:33:21Z)
- Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide the applications for fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z)
- Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping [27.547461769425855]
Per-core clipping (PCC) can effectively mitigate unintended memorization in ASR models.
PCC positively influences ASR performance metrics, leading to improved convergence rates and reduced word error rates.
arXiv Detail & Related papers (2024-06-04T06:34:33Z)
- Exploring the Limits of Differentially Private Deep Learning with Group-wise Clipping [91.60608388479645]
We show that per-layer clipping allows clipping to be performed in conjunction with backpropagation in differentially private optimization.
This results in private learning that is as memory-efficient and almost as fast per training update as non-private learning in many settings of interest.
arXiv Detail & Related papers (2022-12-03T05:20:15Z)
- Spatio-Temporal Crop Aggregation for Video Representation Learning [33.296154476701055]
Our model builds long-range video features by learning from sets of video clip-level features extracted with a pre-trained backbone.
We demonstrate that our video representation yields state-of-the-art performance with linear, non-linear, and $k$-NN probing on common action classification datasets.
arXiv Detail & Related papers (2022-11-30T14:43:35Z)
- Beyond Short Clips: End-to-End Video-Level Learning with Collaborative Memories [56.91664227337115]
We introduce a collaborative memory mechanism that encodes information across multiple sampled clips of a video at each training iteration.
This enables the learning of long-range dependencies beyond a single clip.
Our proposed framework is end-to-end trainable and significantly improves the accuracy of video classification with negligible computational overhead.
arXiv Detail & Related papers (2021-04-02T18:59:09Z)
- Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation [95.31590177308482]
We propose an automated multi-loss adaptation (named Ada-Segment) to flexibly adjust multiple training losses over the course of training.
With an end-to-end architecture, Ada-Segment generalizes to different datasets without re-tuning hyperparameters.
Ada-Segment brings 2.7% panoptic quality (PQ) improvement on COCO val split from the vanilla baseline, achieving the state-of-the-art 48.5% PQ on COCO test-dev split and 32.9% PQ on ADE20K dataset.
arXiv Detail & Related papers (2020-12-07T11:43:10Z)
- Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step toward constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z)
- Dynamic Sampling Networks for Efficient Action Recognition in Videos [43.51012099839094]
We propose a new framework for action recognition in videos, called Dynamic Sampling Networks (DSN).
DSN is composed of a sampling module and a classification module: the former learns a sampling policy to select on the fly which clips to keep, and the latter trains a clip-level classifier that performs action recognition based on the selected clips.
We study different aspects of the DSN framework on four action recognition datasets: UCF101, HMDB51, THUMOS14, and ActivityNet v1.3.
arXiv Detail & Related papers (2020-06-28T09:48:29Z)