Self-distillation with Online Diffusion on Batch Manifolds Improves Deep Metric Learning
- URL: http://arxiv.org/abs/2211.07566v1
- Date: Mon, 14 Nov 2022 17:38:07 GMT
- Title: Self-distillation with Online Diffusion on Batch Manifolds Improves Deep Metric Learning
- Authors: Zelong Zeng, Fan Yang, Hong Liu and Shin'ichi Satoh
- Abstract summary: We propose Online Batch Diffusion-based Self-Distillation (OBD-SD) for DML.
We first propose a simple but effective Progressive Self-Distillation (PSD), which distills the knowledge progressively from the model itself during training.
Then, we extend PSD with an Online Batch Diffusion Process (OBDP), which captures the local geometric structure of the manifold within each batch.
- Score: 23.974500845619175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent deep metric learning (DML) methods typically leverage only class
labels to keep positive samples far away from negative ones. However, such
methods normally ignore the crucial knowledge hidden in the data (e.g.,
intra-class variation), which is harmful to the generalization of the trained
model. To alleviate this problem, in this paper we propose Online
Batch Diffusion-based Self-Distillation (OBD-SD) for DML. Specifically, we
first propose a simple but effective Progressive Self-Distillation (PSD), which
distills the knowledge progressively from the model itself during training. The
soft distance targets achieved by PSD can present richer relational information
among samples, which is beneficial for the diversity of embedding
representations. Then, we extend PSD with an Online Batch Diffusion Process
(OBDP), which captures the local geometric structure of the manifolds in each
batch, so that it can reveal the intrinsic relationships among samples in the
batch and produce better soft distance targets. Note that our OBDP is able to
refine the insufficient manifold relationships obtained by the original PSD
alone and achieves a significant performance improvement. Our OBD-SD is a flexible
framework that can be integrated into state-of-the-art (SOTA) DML methods.
Extensive experiments on the CUB200, CARS196, and Stanford Online Products
benchmarks demonstrate that our OBD-SD consistently improves the performance
of existing DML methods with negligible additional training time, achieving
very competitive results. Code:
\url{https://github.com/ZelongZeng/OBD-SD_Pytorch}
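As a rough, PyTorch-style illustration of the two components described above, the sketch below pairs a progressive self-distillation loss with a diffusion step over the batch similarity matrix. The abstract does not specify the exact formulation, so the teacher snapshot, the kNN affinity graph, the diffusion update, the temperatures, and the names batch_diffusion and obd_sd_loss are illustrative assumptions rather than the authors' method; the reference implementation is in the repository linked above.

import torch
import torch.nn.functional as F

def batch_diffusion(sim, alpha=0.9, k=4, n_iter=3):
    # Assumed OBDP-like step: diffuse the batch similarity matrix over a kNN
    # affinity graph so the refined similarities reflect the local manifold
    # structure of the batch.
    b = sim.size(0)
    topk = torch.topk(sim, k=min(k, b), dim=1)
    affinity = torch.zeros_like(sim).scatter_(1, topk.indices, topk.values)
    affinity = 0.5 * (affinity + affinity.t())        # symmetrize the graph
    d = affinity.sum(dim=1).clamp_min(1e-12).rsqrt()
    s = d.unsqueeze(1) * affinity * d.unsqueeze(0)    # normalized transition matrix
    f = sim.clone()
    for _ in range(n_iter):                           # iterative diffusion update
        f = alpha * (s @ f) + (1.0 - alpha) * sim
    return f

def obd_sd_loss(student_emb, teacher_emb, temp_student=0.1, temp_teacher=0.2):
    # Assumed PSD-like step: the current model (student) matches soft distance
    # targets produced by an earlier snapshot of itself (teacher), after the
    # teacher's batch similarities have been refined by batch_diffusion.
    with torch.no_grad():
        t = F.normalize(teacher_emb, dim=1)
        targets = F.softmax(batch_diffusion(t @ t.t()) / temp_teacher, dim=1)
    s = F.normalize(student_emb, dim=1)
    log_preds = F.log_softmax((s @ s.t()) / temp_student, dim=1)
    return F.kl_div(log_preds, targets, reduction="batchmean")

In practice this distillation term would be added to a standard DML loss (e.g., a margin or multi-similarity loss), and the teacher embeddings would come from an earlier training state of the same network; both details are assumptions at the level of this abstract.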
Related papers
- A Bayesian Approach to Data Point Selection [24.98069363998565]
Data point selection (DPS) is becoming a critical topic in deep learning.
Existing approaches to DPS are predominantly based on a bi-level optimisation (BLO) formulation.
We propose a novel Bayesian approach to DPS.
arXiv Detail & Related papers (2024-11-06T09:04:13Z)
- DANCE: Dual-View Distribution Alignment for Dataset Condensation [39.08022095906364]
We propose a new DM-based method named Dual-view distribution AligNment for dataset CondEnsation (DANCE)
Specifically, from the inner-class view, we construct multiple "middle encoders" to perform pseudo long-term distribution alignment.
From the inter-class view, we use expert models to perform distribution calibration.
arXiv Detail & Related papers (2024-06-03T07:22:17Z)
- Bayesian Diffusion Models for 3D Shape Reconstruction [54.69889488052155]
We present a prediction algorithm that performs effective Bayesian inference by tightly coupling the top-down (prior) information with the bottom-up (data-driven) procedure.
We show the effectiveness of BDM on the 3D shape reconstruction task.
arXiv Detail & Related papers (2024-03-11T17:55:53Z)
- Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? [60.50127555651554]
Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features.
This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks.
We introduce a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs.
arXiv Detail & Related papers (2024-03-11T15:48:56Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, which overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- A Penalty Approach for Normalizing Feature Distributions to Build Confounder-Free Models [11.818509522227565]
MetaData Normalization (MDN) estimates the linear relationship between the metadata and each feature based on a non-trainable closed-form solution.
We extend the MDN method by applying a Penalty approach (referred to as PMDN).
We show improvement in model accuracy and greater independence from confounders using PMDN over MDN in a synthetic experiment and a multi-label, multi-site dataset of magnetic resonance images (MRIs).
arXiv Detail & Related papers (2022-07-11T04:02:12Z)
- A Comparative Survey of Deep Active Learning [76.04825433362709]
Active Learning (AL) is a set of techniques for reducing labeling cost by sequentially selecting data samples from a large unlabeled data pool for labeling.
Deep Learning (DL) is data-hungry, and the performance of DL models scales monotonically with more training data.
In recent years, Deep Active Learning (DAL) has risen as a feasible solution for maximizing model performance while minimizing the expensive labeling cost.
arXiv Detail & Related papers (2022-03-25T05:17:24Z)
- Bilevel Online Deep Learning in Non-stationary Environment [4.565872584112864]
Bilevel Online Deep Learning (BODL) framework combines bilevel optimization strategy and online ensemble classifier.
When concept drift is detected, our BODL algorithm can adaptively update the model parameters via bilevel optimization, thereby circumventing large drift and encouraging positive transfer.
arXiv Detail & Related papers (2022-01-25T11:05:51Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost; a minimal sketch of the per-sample weighting idea appears after this list.
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Revisiting Training Strategies and Generalization Performance in Deep Metric Learning [28.54755295856929]
We revisit the most widely used DML objective functions and conduct a study of the crucial parameter choices.
Under consistent comparison, DML objectives show much higher saturation than indicated by the literature.
Exploiting these insights, we propose a simple, yet effective, training regularization to reliably boost the performance of ranking-based DML models.
arXiv Detail & Related papers (2020-02-19T22:16:12Z)
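The Attentional-Biased Stochastic Gradient Descent (ABSGD) entry above describes assigning an individual importance weight to each sample in the mini-batch; a minimal sketch of that idea follows. The concrete weighting rule used here (a softmax over per-sample losses scaled by a hypothetical parameter lam), as well as the names weighted_minibatch_loss, model, x, and y, are illustrative assumptions rather than the paper's exact algorithm.

import torch
import torch.nn.functional as F

def weighted_minibatch_loss(logits, labels, lam=1.0):
    # Per-sample losses for the current mini-batch.
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    with torch.no_grad():
        # Assumed weighting rule: harder (higher-loss) samples receive larger
        # importance weights; lam controls how sharply the weights concentrate.
        weights = F.softmax(per_sample / lam, dim=0) * per_sample.numel()
    # The weighted loss is then minimized with an ordinary momentum-SGD update.
    return (weights * per_sample).mean()

A standard optimizer such as torch.optim.SGD with momentum would then be used unchanged, e.g., loss = weighted_minibatch_loss(model(x), y); loss.backward(); optimizer.step().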
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.