SDGMNet: Statistic-based Dynamic Gradient Modulation for Local
Descriptor Learning
- URL: http://arxiv.org/abs/2106.04434v2
- Date: Wed, 9 Jun 2021 12:45:28 GMT
- Title: SDGMNet: Statistic-based Dynamic Gradient Modulation for Local
Descriptor Learning
- Authors: Jiayi Ma and Yuxin Deng
- Abstract summary: We propose a dynamic gradient modulation, named SDGMNet, to improve triplet loss for local descriptor learning.
In this paper, we perform a deep analysis of the back-propagation of general triplet-based losses and introduce the included angle as the distance measure.
Our novel descriptor surpasses previous state-of-the-art methods on standard benchmarks, including patch verification, matching and retrieval tasks.
- Score: 44.69439245287881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modifications of the triplet loss that rescale the back-propagated gradients of
special pairs have made significant progress on local descriptor learning.
However, current gradient modulation strategies are mainly static, so they
suffer from changes in training phase or dataset. In this paper, we
propose a dynamic gradient modulation, named SDGMNet, to improve the triplet loss
for local descriptor learning. The core of our method is to formulate modulation
functions with statistical characteristics that are estimated dynamically.
First, we perform a deep analysis of the back-propagation of general triplet-based
losses and introduce the included angle as the distance measure. On this basis,
auto-focus modulation is employed to moderate the impact of statistically
uncommon individual pairs in stochastic gradient descent optimization; a
probabilistic margin cuts off the gradients of the proportion of Siamese pairs
that are believed to have reached the optimum; and power adjustment balances the
total weights of negative pairs and positive pairs. Extensive experiments
demonstrate that our novel descriptor surpasses previous state-of-the-art methods
on standard benchmarks, including patch verification, matching and retrieval tasks.
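As a rough illustration of the ideas in the abstract, the sketch below computes a triplet-style loss on the included angle between L2-normalized descriptors and applies batch-statistic-based weights. The auto-focus, probabilistic-margin and power-adjustment terms here are hypothetical placeholders that only indicate where dynamically estimated statistics would enter; the paper's actual modulation functions differ.

```python
import numpy as np

def angular_triplet_loss(anchor, positive, negative, cutoff_quantile=0.1):
    """Hedged sketch: triplet-style loss on the included angle between
    L2-normalized descriptors, with placeholder statistic-based weights
    standing in for SDGMNet's auto-focus modulation, probabilistic margin
    and power adjustment (the exact formulations are in the paper)."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    a, p, n = normalize(anchor), normalize(positive), normalize(negative)

    # Included-angle distances for positive (Siamese) and negative pairs.
    d_pos = np.arccos(np.clip(np.sum(a * p, axis=1), -1.0, 1.0))
    d_neg = np.arccos(np.clip(np.sum(a * n, axis=1), -1.0, 1.0))

    # Auto-focus modulation (placeholder): down-weight statistically
    # uncommon pairs, i.e. those far from the batch mean distance.
    def auto_focus(d):
        mu, sigma = d.mean(), d.std() + 1e-8
        return np.exp(-0.5 * ((d - mu) / sigma) ** 2)
    w_pos, w_neg = auto_focus(d_pos), auto_focus(d_neg)

    # Probabilistic margin (placeholder): cut off the contribution of the
    # proportion of positive pairs already below a batch quantile,
    # i.e. pairs believed to have reached the optimum.
    margin = np.quantile(d_pos, cutoff_quantile)
    w_pos = np.where(d_pos <= margin, 0.0, w_pos)

    # Power adjustment (placeholder): rescale so negative pairs contribute
    # the same total weight as positive pairs.
    w_neg = w_neg * (w_pos.sum() + 1e-8) / (w_neg.sum() + 1e-8)

    # Weighted triplet-style objective: pull positives together, push negatives apart.
    return np.mean(w_pos * d_pos - w_neg * d_neg)

# Toy usage with random 128-D descriptors.
rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=(32, 128)) for _ in range(3))
print(angular_triplet_loss(a, p, n))
```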
Related papers
- PreAdaptFWI: Pretrained-Based Adaptive Residual Learning for Full-Waveform Inversion Without Dataset Dependency [8.719356558714246]
Full-waveform inversion (FWI) is a method that uses seismic data to invert for the physical parameters of subsurface media.
Due to its ill-posed nature, FWI is susceptible to getting trapped in local minima.
Various research efforts have attempted to combine neural networks with FWI to stabilize the inversion process.
arXiv Detail & Related papers (2025-02-17T15:30:17Z)
- Signal Processing Meets SGD: From Momentum to Filter [6.751292200515355]
In deep learning, stochastic gradient descent (SGD) and its momentum-based variants are widely used for optimization.
In this paper, we analyze gradient behavior through a signal processing lens, isolating key factors that influence updates.
We introduce a novel method, SGDF, based on Wiener filter principles, which derives an optimal time-varying gain to refine updates.
arXiv Detail & Related papers (2023-11-06T01:41:46Z)
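The summary above states the idea only at a high level. The snippet below is a hypothetical illustration, not the paper's actual SGDF algorithm: it applies a Wiener-style time-varying gain that blends the current stochastic gradient with a running gradient estimate, weighting by online signal and noise estimates.

```python
import numpy as np

def wiener_gain_step(w, grad, state, lr=0.1, beta=0.9):
    """Hypothetical Wiener-filter-style update (illustration only):
    blend the fresh stochastic gradient with a running estimate using
    a gain derived from estimated signal and noise power."""
    m, v = state["m"], state["v"]
    m = beta * m + (1 - beta) * grad              # running mean of gradients
    v = beta * v + (1 - beta) * (grad - m) ** 2   # running variance (noise proxy)
    signal = m ** 2                               # squared mean as signal-power proxy
    gain = signal / (signal + v + 1e-12)          # Wiener-style gain in [0, 1]
    filtered = gain * grad + (1 - gain) * m       # trust the raw gradient only where SNR is high
    state["m"], state["v"] = m, v
    return w - lr * filtered, state

# Toy usage: noisy gradients of f(w) = 0.5 * ||w||^2, whose minimum is w = 0.
rng = np.random.default_rng(1)
w = np.ones(5)
state = {"m": np.zeros(5), "v": np.zeros(5)}
for _ in range(200):
    noisy_grad = w + 0.5 * rng.normal(size=5)
    w, state = wiener_gain_step(w, noisy_grad, state)
print(w)  # should be close to zero
```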
- Domain Generalization Guided by Gradient Signal to Noise Ratio of Parameters [69.24377241408851]
Overfitting to the source domain is a common issue in gradient-based training of deep neural networks.
We propose to base the selection on the gradient-signal-to-noise ratio (GSNR) of the network's parameters.
arXiv Detail & Related papers (2023-10-11T10:21:34Z)
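For context on the entry above: the gradient signal-to-noise ratio is commonly defined per parameter as the squared mean of its per-sample gradients divided by their variance. The sketch below computes that quantity from a stack of per-sample gradients; how the resulting scores drive parameter selection is the paper's contribution and is not reproduced here.

```python
import numpy as np

def gsnr(per_sample_grads, eps=1e-12):
    """Gradient signal-to-noise ratio per parameter:
    GSNR_j = E[g_j]^2 / Var[g_j], estimated over the sample axis.
    `per_sample_grads` has shape (num_samples, num_parameters)."""
    mean = per_sample_grads.mean(axis=0)
    var = per_sample_grads.var(axis=0)
    return mean ** 2 / (var + eps)

# Toy usage: parameter 0 receives a consistent gradient (high GSNR),
# parameter 1 receives noisy, near-zero-mean gradients (low GSNR).
rng = np.random.default_rng(2)
g = np.stack([
    1.0 + 0.1 * rng.normal(size=256),  # consistent signal
    1.0 * rng.normal(size=256),        # mostly noise
], axis=1)
print(gsnr(g))  # the first value should dwarf the second
```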
- Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods [11.895321856533934]
Stochastic gradient descent (SGD) and adaptive gradient methods have been widely used in training deep neural networks.
We empirically show that while the difference in standard generalization performance between models trained using these methods is small, those trained using SGD exhibit far greater robustness under input perturbations.
arXiv Detail & Related papers (2023-08-13T07:03:22Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
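As background for the forward-gradient entry above, the sketch below shows the basic weight-perturbation forward gradient estimator on a toy quadratic where the directional derivative is available in closed form (in practice it comes from a forward-mode JVP, with no backward pass). The paper's contribution, reducing this estimator's variance by perturbing activations and using local losses, is not reproduced here.

```python
import numpy as np

def forward_gradient(w, H, rng):
    """Weight-perturbation forward gradient for f(w) = 0.5 * w^T H w.
    Sample a random direction v, take the directional derivative
    (here in closed form; normally via a forward-mode JVP), and
    return (grad(f) . v) * v, an unbiased estimate of grad(f)."""
    v = rng.normal(size=w.shape)   # random tangent direction
    true_grad = H @ w              # gradient of the toy quadratic
    directional = true_grad @ v    # what a JVP would return
    return directional * v

# Averaging many estimates recovers the true gradient, but the per-sample
# variance grows with dimension, which motivates the activity-perturbation
# scheme summarized above.
rng = np.random.default_rng(3)
dim = 16
H = np.eye(dim)
w = rng.normal(size=dim)
estimates = np.stack([forward_gradient(w, H, rng) for _ in range(5000)])
print(np.max(np.abs(estimates.mean(axis=0) - H @ w)))  # small
```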
- Robust Learning via Persistency of Excitation [4.674053902991301]
We show that network training using gradient descent is equivalent to a dynamical system parameter estimation problem.
We provide an efficient technique for estimating the corresponding Lipschitz constant using extreme value theory.
Our approach also universally increases the adversarial accuracy by 0.1 to 0.3 percentage points in various state-of-the-art adversarially trained models.
arXiv Detail & Related papers (2021-06-03T18:49:05Z)
- A Random Matrix Theory Approach to Damping in Deep Learning [0.7614628596146599]
We conjecture that the inherent difference in generalisation between adaptive and non-adaptive gradient methods in deep learning stems from the increased estimation noise.
We develop a novel random-matrix-theory-based damping learner for second-order optimisers, inspired by linear shrinkage estimation.
arXiv Detail & Related papers (2020-11-15T18:19:42Z)
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation [77.7420231319632]
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv Detail & Related papers (2019-12-31T03:01:55Z)