Towards Efficient Deep Hashing Retrieval: Condensing Your Data via
Feature-Embedding Matching
- URL: http://arxiv.org/abs/2305.18076v1
- Date: Mon, 29 May 2023 13:23:55 GMT
- Title: Towards Efficient Deep Hashing Retrieval: Condensing Your Data via
Feature-Embedding Matching
- Authors: Tao Feng, Jie Zhang, Peizheng Wang, Zhijie Wang
- Abstract summary: The expenses involved in training state-of-the-art deep hashing retrieval models have increased.
State-of-the-art dataset distillation methods cannot be extended to all deep hashing retrieval methods.
We propose an efficient condensation framework that addresses these limitations by matching the feature embeddings of the synthetic set and the real set.
- Score: 7.908244841289913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The expenses involved in training state-of-the-art deep hashing
retrieval models have increased due to the adoption of more sophisticated
models and large-scale datasets. Dataset Distillation (DD) or Dataset
Condensation (DC) focuses on generating a smaller synthetic dataset that
retains the original information. Nevertheless, existing DD methods struggle
to maintain a trade-off between accuracy and efficiency, and state-of-the-art
dataset distillation methods cannot be extended to all deep hashing retrieval
methods. In this paper, we propose an efficient condensation framework that
addresses these limitations by matching the feature embeddings of the
synthetic set and the real set. Furthermore, we enhance the diversity of
features by incorporating the strategies of early-stage augmented models and
multi-formation. Extensive experiments provide compelling evidence of the
remarkable superiority of our approach, in terms of both performance and
efficiency, compared to state-of-the-art baseline methods.
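The core idea, condensing a dataset by optimizing synthetic images so that their feature embeddings match those of real data, can be sketched roughly as follows. This is a minimal, hypothetical PyTorch illustration: the toy `Encoder`, the mean-embedding MSE loss, the image size, and the freshly re-initialized encoders (a crude stand-in for the early-stage augmented models mentioned above) are all assumptions for illustration, not the authors' implementation.

```python
# Minimal, hypothetical sketch of dataset condensation via feature-embedding
# matching (PyTorch). Everything here is an illustrative assumption, not the
# authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Toy convolutional feature extractor standing in for the hashing backbone."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def embedding_matching_loss(encoder: nn.Module,
                            syn_x: torch.Tensor,
                            real_x: torch.Tensor) -> torch.Tensor:
    """MSE between the mean feature embeddings of synthetic and real samples
    of the same class; gradients flow only into the synthetic pixels."""
    with torch.no_grad():
        real_emb = encoder(real_x).mean(dim=0)
    syn_emb = encoder(syn_x).mean(dim=0)
    return F.mse_loss(syn_emb, real_emb)

# Condensation loop for a single class, for brevity.
torch.manual_seed(0)
ipc = 10                                                   # synthetic images per class
syn_x = torch.randn(ipc, 3, 32, 32, requires_grad=True)    # learnable synthetic images
opt = torch.optim.SGD([syn_x], lr=0.1, momentum=0.9)

for step in range(100):
    # A freshly initialized encoder per step loosely mimics the diversity
    # provided by early-stage augmented models.
    encoder = Encoder().eval()
    real_x = torch.randn(64, 3, 32, 32)   # placeholder for a real-data batch of this class
    loss = embedding_matching_loss(encoder, syn_x, real_x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In practice the real batch would be drawn from the target retrieval dataset, and the optimized synthetic images would then be used to train the deep hashing model at much lower cost.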
Related papers
- Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on a distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z)
- Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
The development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
arXiv Detail & Related papers (2024-01-29T03:29:39Z)
- Dataset Distillation via the Wasserstein Metric [35.32856617593164]
We introduce the Wasserstein distance, a metric grounded in optimal transport theory, to enhance distribution matching in dataset distillation.
Our method achieves new state-of-the-art performance across a range of high-resolution datasets.
arXiv Detail & Related papers (2023-11-30T13:15:28Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- Accelerating Dataset Distillation via Model Augmentation [41.3027484667024]
We propose two model augmentation techniques, i.e., using early-stage models and parameter perturbation, to learn an informative synthetic set with significantly reduced training cost.
Our method achieves up to a 20x speedup with performance on par with state-of-the-art methods.
arXiv Detail & Related papers (2022-12-12T07:36:05Z)
- DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z)
- CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)