Decoupled Kullback-Leibler Divergence Loss
- URL: http://arxiv.org/abs/2305.13948v1
- Date: Tue, 23 May 2023 11:17:45 GMT
- Title: Decoupled Kullback-Leibler Divergence Loss
- Authors: Jiequan Cui, Zhuotao Tian, Zhisheng Zhong, Xiaojuan Qi, Bei Yu,
Hanwang Zhang
- Abstract summary: Kullback-Leibler (KL) Divergence loss is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss.
We introduce global information into DKL for intra-class consistency regularization.
The proposed approach achieves new state-of-the-art performance on both tasks, demonstrating its substantial practical merits.
- Score: 75.31157286595517
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss
and observe that it is equivalent to the Decoupled Kullback-Leibler (DKL)
Divergence loss that consists of 1) a weighted Mean Square Error (wMSE) loss
and 2) a Cross-Entropy loss incorporating soft labels. From our analysis of the
DKL loss, we have identified two areas for improvement. Firstly, we address the
limitation of DKL in scenarios like knowledge distillation by breaking its
asymmetry property in training optimization. This modification ensures that the
wMSE component is always effective during training, providing extra
constructive cues. Secondly, we introduce global information into DKL for
intra-class consistency regularization. With these two enhancements, we derive
the Improved Kullback-Leibler (IKL) Divergence loss and evaluate its
effectiveness by conducting experiments on CIFAR-10/100 and ImageNet datasets,
focusing on adversarial training and knowledge distillation tasks. The proposed
approach achieves new state-of-the-art performance on both tasks, demonstrating
its substantial practical merits. Code and models will be available soon at
https://github.com/jiequancui/DKL.
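For reference, below is a minimal PyTorch-style sketch of the temperature-scaled KL divergence loss as it is typically used in knowledge distillation. The function name and the default temperature `tau` are illustrative assumptions; the sketch shows only the original KL form that the paper starts from, not the DKL/IKL reformulation (the weighted MSE plus soft-label cross-entropy decomposition) derived in the paper.

```python
import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits: torch.Tensor,
               teacher_logits: torch.Tensor,
               tau: float = 4.0) -> torch.Tensor:
    """Temperature-scaled KL divergence loss, KL(p_teacher || q_student).

    Since KL(p || q) = CE(p, q) - H(p) and the teacher entropy H(p) is constant
    with respect to the student, this loss acts like a cross-entropy with soft
    labels. The paper further decouples it into a weighted MSE term plus a
    soft-label cross-entropy term (not reproduced here).
    """
    p_teacher = F.softmax(teacher_logits.detach() / tau, dim=-1)   # soft labels
    log_q_student = F.log_softmax(student_logits / tau, dim=-1)
    # The conventional tau^2 factor keeps gradient magnitudes comparable across tau.
    return F.kl_div(log_q_student, p_teacher, reduction="batchmean") * tau ** 2
```

As a quick sanity check, feeding identical student and teacher logits yields a loss of zero.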
Related papers
- Contrastive Learning with Orthonormal Anchors (CLOA) [0.0]
This study focuses on addressing the instability issues prevalent in contrastive learning, specifically examining the InfoNCE loss function and its derivatives.
We reveal a critical observation that these loss functions exhibit a restrictive behavior, leading to a convergence phenomenon where embeddings tend to merge into a singular point.
This "over-fusion" effect detrimentally affects classification accuracy in subsequent supervised-learning tasks.
arXiv Detail & Related papers (2024-03-27T15:48:16Z) - Mitigating Privacy Risk in Membership Inference by Convex-Concave Loss [16.399746814823025]
Machine learning models are susceptible to membership inference attacks (MIAs), which aim to infer whether a sample is in the training set.
Existing work utilizes gradient ascent to enlarge the loss variance of training data, alleviating the privacy risk.
We propose a novel method -- Convex-Concave Loss, which enables a high variance of the training loss distribution via gradient descent.
arXiv Detail & Related papers (2024-02-08T07:14:17Z) - Class Incremental Learning for Adversarial Robustness [17.06592851567578]
Adversarial training integrates adversarial examples during model training to enhance robustness.
We observe that combining incremental learning with naive adversarial training easily leads to a loss of robustness.
We propose the Flatness Preserving Distillation (FPD) loss that leverages the output difference between adversarial and clean examples.
arXiv Detail & Related papers (2023-12-06T04:38:02Z) - 2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic
Segmentation [92.17700318483745]
We propose an image-guidance network (IGNet) which builds upon the idea of distilling high-level feature information from a domain-adapted, synthetically trained 2D semantic segmentation network.
IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, reaching up to 98% of the relative performance of fully supervised training with only 8% labeled points.
arXiv Detail & Related papers (2023-11-27T07:57:29Z) - Adaptive Cross Batch Normalization for Metric Learning [75.91093210956116]
Metric learning is a fundamental problem in computer vision.
We show that it is equally important to ensure that the accumulated embeddings are up to date.
In particular, it is necessary to circumvent the representational drift between the accumulated embeddings and the feature embeddings at the current training iteration.
arXiv Detail & Related papers (2023-03-30T03:22:52Z) - The KFIoU Loss for Rotated Object Detection [115.334070064346]
In this paper, we argue that one effective alternative is to devise an approximate loss that can achieve trend-level alignment with the SkewIoU loss.
Specifically, we model the objects as Gaussian distributions and adopt a Kalman filter to inherently mimic the mechanism of SkewIoU.
The resulting new loss, called KFIoU, is easier to implement and works better than the exact SkewIoU.
arXiv Detail & Related papers (2022-01-29T10:54:57Z) - EvDistill: Asynchronous Events to End-task Learning via Bidirectional
Reconstruction-guided Cross-modal Knowledge Distillation [61.33010904301476]
Event cameras sense per-pixel intensity changes and produce asynchronous event streams with high dynamic range and less motion blur.
We propose a novel approach, called EvDistill, to learn a student network on the unlabeled and unpaired event data.
We show that EvDistill achieves significantly better results than the prior works and KD with only events and APS frames.
arXiv Detail & Related papers (2021-11-24T08:48:16Z) - Comparing Kullback-Leibler Divergence and Mean Squared Error Loss in
Knowledge Distillation [9.157410884444312]
Knowledge distillation (KD) has been investigated to design efficient neural architectures.
We show that the KL divergence loss focuses on logit matching as tau increases and on label matching as tau goes to 0 (a minimal sketch of this temperature behavior is given after this list).
We show that sequential distillation can improve performance and that KD, particularly when using the KL divergence loss with a small tau, mitigates label noise.
arXiv Detail & Related papers (2021-05-19T04:40:53Z) - Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed Semi-supervised Contrastive Learning (SsCL).
SsCL combines the well-known contrastive loss in self-supervised learning with the cross entropy loss in semi-supervised learning.
We show that SsCL produces more discriminative representations and is beneficial to few-shot learning.
arXiv Detail & Related papers (2021-05-16T09:13:56Z) - Balancing reconstruction error and Kullback-Leibler divergence in
Variational Autoencoders [0.0]
We show that learning can be replaced by a simple deterministic computation, helping to understand the underlying mechanism. (A sketch of the reconstruction/KL trade-off in the standard VAE objective appears after this list.)
On typical datasets such as CIFAR and CelebA, our technique noticeably outperforms all previous VAE architectures.
arXiv Detail & Related papers (2020-02-18T12:22:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.