Self-distillation with Online Diffusion on Batch Manifolds Improves Deep Metric Learning
- URL: http://arxiv.org/abs/2211.07566v1
- Date: Mon, 14 Nov 2022 17:38:07 GMT
- Title: Self-distillation with Online Diffusion on Batch Manifolds Improves Deep Metric Learning
- Authors: Zelong Zeng, Fan Yang, Hong Liu and Shin'ichi Satoh
- Abstract summary: We propose Online Batch Diffusion-based Self-Distillation (OBD-SD) for DML.
We first propose a simple but effective Progressive Self-Distillation (PSD), which distills the knowledge progressively from the model itself during training.
Then, we extend PSD with an Online Batch Diffusion Process (OBDP), which captures the local geometric structure of the manifold within each batch.
- Score: 23.974500845619175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent deep metric learning (DML) methods typically leverage only class
labels to keep positive samples far away from negative ones. However, such
methods normally ignore the crucial knowledge hidden in the data (e.g.,
intra-class variation), which is harmful to the generalization of the trained
model. To alleviate this problem, in this paper we propose Online
Batch Diffusion-based Self-Distillation (OBD-SD) for DML. Specifically, we
first propose a simple but effective Progressive Self-Distillation (PSD), which
distills the knowledge progressively from the model itself during training. The
soft distance targets achieved by PSD can present richer relational information
among samples, which is beneficial for the diversity of embedding
representations. Then, we extend PSD with an Online Batch Diffusion Process
(OBDP), which captures the local geometric structure of the manifolds in each
batch, so that it can reveal the intrinsic relationships among samples in the
batch and produce better soft distance targets. Note that our OBDP is able to
refine the insufficient manifold relationships obtained by the original PSD
alone and achieves a significant performance improvement. Our OBD-SD is a flexible
framework that can be integrated into state-of-the-art (SOTA) DML methods.
Extensive experiments on the CUB200, CARS196, and Stanford Online Products
benchmarks demonstrate that our OBD-SD consistently improves the performance
of existing DML methods with negligible additional training time, achieving
very competitive results. Code:
\url{https://github.com/ZelongZeng/OBD-SD_Pytorch}
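As a rough, PyTorch-style illustration of the two components described above, the sketch below pairs a progressive self-distillation loss with a diffusion step over the batch similarity matrix. The abstract does not specify the exact formulation, so the teacher snapshot, the kNN affinity graph, the diffusion update, the temperatures, and the names batch_diffusion and obd_sd_loss are illustrative assumptions rather than the authors' method; the reference implementation is in the repository linked above.

import torch
import torch.nn.functional as F

def batch_diffusion(sim, alpha=0.9, k=4, n_iter=3):
    # Assumed OBDP-like step: diffuse the batch similarity matrix over a kNN
    # affinity graph so the refined similarities reflect the local manifold
    # structure of the batch.
    b = sim.size(0)
    topk = torch.topk(sim, k=min(k, b), dim=1)
    affinity = torch.zeros_like(sim).scatter_(1, topk.indices, topk.values)
    affinity = 0.5 * (affinity + affinity.t())        # symmetrize the graph
    d = affinity.sum(dim=1).clamp_min(1e-12).rsqrt()
    s = d.unsqueeze(1) * affinity * d.unsqueeze(0)    # normalized transition matrix
    f = sim.clone()
    for _ in range(n_iter):                           # iterative diffusion update
        f = alpha * (s @ f) + (1.0 - alpha) * sim
    return f

def obd_sd_loss(student_emb, teacher_emb, temp_student=0.1, temp_teacher=0.2):
    # Assumed PSD-like step: the current model (student) matches soft distance
    # targets produced by an earlier snapshot of itself (teacher), after the
    # teacher's batch similarities have been refined by batch_diffusion.
    with torch.no_grad():
        t = F.normalize(teacher_emb, dim=1)
        targets = F.softmax(batch_diffusion(t @ t.t()) / temp_teacher, dim=1)
    s = F.normalize(student_emb, dim=1)
    log_preds = F.log_softmax((s @ s.t()) / temp_student, dim=1)
    return F.kl_div(log_preds, targets, reduction="batchmean")

In practice this distillation term would be added to a standard DML loss (e.g., a margin or multi-similarity loss), and the teacher embeddings would come from an earlier training state of the same network; both details are assumptions at the level of this abstract.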
Related papers
- A Bayesian Approach to Data Point Selection [24.98069363998565]
Data point selection (DPS) is becoming a critical topic in deep learning.
Existing approaches to DPS are predominantly based on a bi-level optimisation (BLO) formulation.
We propose a novel Bayesian approach to DPS.
arXiv Detail & Related papers (2024-11-06T09:04:13Z)
- DANCE: Dual-View Distribution Alignment for Dataset Condensation [39.08022095906364]
We propose a new DM-based method named Dual-view distribution AligNment for dataset CondEnsation (DANCE)
Specifically, from the inner-class view, we construct multiple "middle encoders" to perform pseudo long-term distribution alignment.
From the inter-class view, we use expert models to perform distribution calibration.
arXiv Detail & Related papers (2024-06-03T07:22:17Z)
- Bayesian Diffusion Models for 3D Shape Reconstruction [54.69889488052155]
We present a prediction algorithm that performs effective Bayesian inference by tightly coupling the top-down (prior) information with the bottom-up (data-driven) procedure.
We show the effectiveness of BDM on the 3D shape reconstruction task.
arXiv Detail & Related papers (2024-03-11T17:55:53Z)
- Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? [60.50127555651554]
Large Language Models (LLMs) show impressive results in numerous practical applications, but they lack essential safety features.
This makes them vulnerable to manipulations such as indirect prompt injections and generally unsuitable for safety-critical tasks.
We introduce a formal measure for instruction-data separation and an empirical variant that is calculable from a model's outputs.
arXiv Detail & Related papers (2024-03-11T15:48:56Z)
- From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets.
Our key innovation, the Instruction-Following Difficulty (IFD) metric, identifies discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
- BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images.
Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few.
We present a novel technique called BOOT, which overcomes these limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
- A Penalty Approach for Normalizing Feature Distributions to Build Confounder-Free Models [11.818509522227565]
MetaData Normalization (MDN) estimates the linear relationship between the metadata and each feature based on a non-trainable closed-form solution.
We extend the MDN method by applying a Penalty approach (referred to as PMDN).
We show improvement in model accuracy and greater independence from confounders using PMDN over MDN in a synthetic experiment and a multi-label, multi-site dataset of magnetic resonance images (MRIs).
arXiv Detail & Related papers (2022-07-11T04:02:12Z)
- A Comparative Survey of Deep Active Learning [76.04825433362709]
Active Learning (AL) is a set of techniques for reducing labeling cost by sequentially selecting data samples from a large unlabeled data pool for labeling.
Deep Learning (DL) is data-hungry, and the performance of DL models scales monotonically with more training data.
In recent years, Deep Active Learning (DAL) has risen as a feasible solution for maximizing model performance while minimizing the expensive labeling cost.
arXiv Detail & Related papers (2022-03-25T05:17:24Z)
- Bilevel Online Deep Learning in Non-stationary Environment [4.565872584112864]
Bilevel Online Deep Learning (BODL) framework combines bilevel optimization strategy and online ensemble classifier.
When concept drift is detected, our BODL algorithm can adaptively update the model parameters via bilevel optimization, thereby circumventing large drift and encouraging positive transfer.
arXiv Detail & Related papers (2022-01-25T11:05:51Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost; a minimal sketch of the per-sample weighting idea appears after this list.
arXiv Detail & Related papers (2020-12-13T03:41:52Z)
- Revisiting Training Strategies and Generalization Performance in Deep Metric Learning [28.54755295856929]
We revisit the most widely used DML objective functions and conduct a study of the crucial parameter choices.
Under consistent comparison, DML objectives show much higher saturation than indicated by the literature.
Exploiting these insights, we propose a simple, yet effective, training regularization to reliably boost the performance of ranking-based DML models.
arXiv Detail & Related papers (2020-02-19T22:16:12Z)
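The Attentional-Biased Stochastic Gradient Descent (ABSGD) entry above describes assigning an individual importance weight to each sample in the mini-batch; a minimal sketch of that idea follows. The concrete weighting rule used here (a softmax over per-sample losses scaled by a hypothetical parameter lam), as well as the names weighted_minibatch_loss, model, x, and y, are illustrative assumptions rather than the paper's exact algorithm.

import torch
import torch.nn.functional as F

def weighted_minibatch_loss(logits, labels, lam=1.0):
    # Per-sample losses for the current mini-batch.
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    with torch.no_grad():
        # Assumed weighting rule: harder (higher-loss) samples receive larger
        # importance weights; lam controls how sharply the weights concentrate.
        weights = F.softmax(per_sample / lam, dim=0) * per_sample.numel()
    # The weighted loss is then minimized with an ordinary momentum-SGD update.
    return (weights * per_sample).mean()

A standard optimizer such as torch.optim.SGD with momentum would then be used unchanged, e.g., loss = weighted_minibatch_loss(model(x), y); loss.backward(); optimizer.step().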
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.