Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching
- URL: http://arxiv.org/abs/2305.18076v2
- Date: Mon, 03 Mar 2025 09:26:18 GMT
- Title: Towards Efficient Deep Hashing Retrieval: Condensing Your Data via Feature-Embedding Matching
- Authors: Tao Feng, Jie Zhang, Huashan Liu, Zhijie Wang, Shengyuan Pang,
- Abstract summary: Training advanced deep hashing models has become more expensive due to complex optimizations and large datasets. We propose IEM (Information-intensive feature Embedding Matching), which is centered on distribution matching and incorporates model and data augmentation techniques to further enhance the features of the hashing space.
- Score: 5.2193774924981176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep hashing retrieval has gained widespread use in big data retrieval due to its robust feature extraction and efficient hashing process. However, training advanced deep hashing models has become more expensive due to complex optimizations and large datasets. Coreset selection and dataset condensation lower overall training costs by reducing the volume of training data without significantly compromising model accuracy on classification tasks. In this paper, we examine the effect of mainstream dataset condensation methods on deep hashing retrieval and propose IEM (Information-intensive feature Embedding Matching), which is centered on distribution matching and incorporates model and data augmentation techniques to further enhance the features of the hashing space. Extensive experiments demonstrate the superior performance and efficiency of our approach.
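The abstract does not spell out the matching objective, so the following is a minimal sketch of the generic distribution-matching recipe such condensation methods build on: optimize a small synthetic set so that its feature embeddings match those of real data under an encoder. The encoder, loss, and training loop are illustrative assumptions, not the authors' exact IEM procedure.

```python
# A minimal sketch, not the authors' exact IEM objective: generic distribution
# matching for dataset condensation, optimizing a small synthetic set so its
# feature embeddings match those of real images under a (here random) encoder.
import torch
import torch.nn as nn

def embedding_matching_loss(encoder, real, synthetic):
    """MMD-style matching of mean feature embeddings of real vs. synthetic data."""
    with torch.no_grad():
        mu_real = encoder(real).mean(dim=0)      # real statistics, no gradient needed
    mu_syn = encoder(synthetic).mean(dim=0)      # gradient flows back to the synthetic images
    return ((mu_real - mu_syn) ** 2).sum()

# Illustrative stand-in for the hashing backbone (assumption, not the paper's network).
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 64),
)
real = torch.randn(64, 3, 32, 32)                           # a batch of real images
synthetic = torch.randn(8, 3, 32, 32, requires_grad=True)   # the condensed set (learnable)

opt = torch.optim.SGD([synthetic], lr=0.1)
for _ in range(10):                                         # a few condensation steps
    opt.zero_grad()
    embedding_matching_loss(encoder, real, synthetic).backward()
    opt.step()
```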
Related papers
- KALAHash: Knowledge-Anchored Low-Resource Adaptation for Deep Hashing [19.667480064079083]
Existing deep hashing methods rely on abundant training data, leaving the more challenging scenario of low-resource adaptation relatively underexplored.
We introduce Class-Calibration LoRA, a novel plug-and-play approach that dynamically constructs low-rank adaptation by leveraging class-level textual knowledge embeddings.
Our proposed method, Knowledge-Anchored Low-Resource Adaptation Hashing (KALAHash), significantly boosts retrieval performance and achieves 4x data efficiency in low-resource scenarios.
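As a rough illustration of the low-rank adaptation building block mentioned above, the sketch below shows a plain LoRA-style linear layer; the paper's Class-Calibration LoRA additionally constructs the low-rank factors from class-level textual embeddings, which is not reproduced here, and all dimensions are assumptions.

```python
# A minimal sketch of a plain LoRA-style layer; KALAHash's Class-Calibration LoRA
# further derives the low-rank factors from class-level text embeddings, which is
# not reproduced here. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = W x + (alpha/r) B A x."""
    def __init__(self, in_dim, out_dim, rank=4, alpha=4.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)               # pretrained weights stay frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))    # zero init: the update is zero at the start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(512, 64)                                  # e.g. features -> 64-bit hash logits
codes = torch.sign(layer(torch.randn(2, 512)))               # binarize to hash codes
```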
arXiv Detail & Related papers (2024-12-27T03:04:54Z) - Deep learning-based shot-domain seismic deblending [1.6411821807321063]
We make use of unblended shot gathers acquired at the end of each sail line.
By manually blending these data we obtain training data with good control of the ground truth.
We train a deep neural network using multi-channel inputs that include adjacent blended shot gathers.
arXiv Detail & Related papers (2024-09-13T07:32:31Z) - RREH: Reconstruction Relations Embedded Hashing for Semi-Paired Cross-Modal Retrieval [32.06421737874828]
Reconstruction Relations Embedded Hashing (RREH) is designed for semi-paired cross-modal retrieval tasks.
RREH assumes that multi-modal data share a common subspace.
Anchors are sampled from paired data, which improves the efficiency of hash learning.
arXiv Detail & Related papers (2024-05-28T03:12:54Z) - Distribution-Aware Data Expansion with Diffusion Models [55.979857976023695]
We propose DistDiff, a training-free data expansion framework based on the distribution-aware diffusion model.
DistDiff consistently enhances accuracy across a diverse range of datasets compared to models trained solely on original data.
arXiv Detail & Related papers (2024-03-11T14:07:53Z) - Importance-Aware Adaptive Dataset Distillation [53.79746115426363]
Development of deep learning models is enabled by the availability of large-scale datasets.
Dataset distillation aims to synthesize a compact dataset that retains the essential information from the large original dataset.
We propose an importance-aware adaptive dataset distillation (IADD) method that can improve distillation performance.
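A minimal sketch of the importance-aware idea follows, under the assumption that the distillation objective compares parameters of networks trained on real versus synthetic data; IADD's adaptively learned importance scores are replaced by fixed placeholders here.

```python
# A minimal sketch of importance-aware matching: weight each parameter tensor by an
# importance score instead of treating all parameters equally. Scores are fixed here;
# the paper learns them adaptively.
import copy
import torch
import torch.nn as nn

model_real = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64))
model_syn = copy.deepcopy(model_real)   # stand-ins for networks trained on real / synthetic data

# One importance score per parameter tensor (assumption: uniform placeholders).
importance = {name: 1.0 for name, _ in model_real.named_parameters()}

def importance_weighted_matching(m_real, m_syn):
    """Importance-weighted squared distance between two networks' parameters."""
    real_params = dict(m_real.named_parameters())
    loss = torch.zeros(())
    for name, p_syn in m_syn.named_parameters():
        loss = loss + importance[name] * ((real_params[name] - p_syn) ** 2).sum()
    return loss

print(importance_weighted_matching(model_real, model_syn))   # zero for identical copies
```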
arXiv Detail & Related papers (2024-01-29T03:29:39Z) - Dataset Distillation via the Wasserstein Metric [35.32856617593164]
We introduce the Wasserstein distance, a metric grounded in optimal transport theory, to enhance distribution matching in dataset distillation.
Our method achieves new state-of-the-art performance across a range of high-resolution datasets.
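One simple way to instantiate an optimal-transport matching objective like the one described above is a sliced Wasserstein distance, sketched below; the paper's exact formulation may differ, and the feature dimensions are illustrative assumptions.

```python
# A minimal sketch, assuming a sliced-Wasserstein instantiation of optimal-transport
# matching between real and synthetic feature sets; not the paper's exact objective.
import torch

def sliced_wasserstein(x, y, n_projections=64):
    """Average 1-D Wasserstein distance over random projection directions
    (requires x and y to contain the same number of samples)."""
    d = x.shape[1]
    dirs = torch.randn(d, n_projections)
    dirs = dirs / dirs.norm(dim=0, keepdim=True)   # unit-norm projection directions
    proj_x = torch.sort(x @ dirs, dim=0).values    # project, then sort each 1-D slice
    proj_y = torch.sort(y @ dirs, dim=0).values
    return (proj_x - proj_y).abs().mean()

real_feats = torch.randn(128, 256)
syn_feats = torch.randn(128, 256, requires_grad=True)
loss = sliced_wasserstein(real_feats, syn_feats)   # differentiable w.r.t. syn_feats
loss.backward()
```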
arXiv Detail & Related papers (2023-11-30T13:15:28Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
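For context, the sketch below shows one augmentation commonly studied in visual RL (random shift, i.e. pad-and-crop); the paper analyses which attributes of such augmentations matter rather than proposing this particular one, and the image sizes are illustrative.

```python
# A minimal sketch of a random-shift augmentation often used in visual RL; an
# illustrative example, not the paper's contribution.
import torch
import torch.nn.functional as F

def random_shift(imgs, pad=4):
    """Replication-pad a batch of images and take a random crop of the original size."""
    n, c, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(imgs)
    for i in range(n):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out

batch = torch.rand(8, 3, 84, 84)
print(random_shift(batch).shape)   # torch.Size([8, 3, 84, 84])
```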
arXiv Detail & Related papers (2023-05-25T15:46:20Z) - Dataset Distillation: A Comprehensive Review [76.26276286545284]
Dataset distillation (DD) aims to derive a much smaller dataset of synthetic samples on which trained models yield performance comparable to models trained on the original dataset.
This paper gives a comprehensive review and summary of recent advances in DD and its application.
arXiv Detail & Related papers (2023-01-17T17:03:28Z) - Accelerating Dataset Distillation via Model Augmentation [41.3027484667024]
We propose two model augmentation techniques, i.e., using early-stage models and parameter perturbation, to learn an informative synthetic set with significantly reduced training cost.
Our method achieves up to a 20x speedup with performance on par with state-of-the-art methods.
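A minimal sketch of the two model-augmentation ideas named above follows: reuse lightly trained ("early-stage") encoders and perturb their parameters to obtain a cheap, diverse pool of models for matching. The noise scale and architecture are illustrative assumptions, not the paper's exact procedure.

```python
# A minimal sketch of model augmentation via parameter perturbation; illustrative only.
import copy
import torch
import torch.nn as nn

def perturb_parameters(model, std=0.02):
    """Return a copy of the model with Gaussian noise added to every parameter."""
    noisy = copy.deepcopy(model)
    with torch.no_grad():
        for p in noisy.parameters():
            p.add_(torch.randn_like(p) * std)
    return noisy

# In practice this encoder would be trained for only a few epochs ("early-stage");
# it is left untrained here for brevity.
early_stage = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64))
augmented_pool = [perturb_parameters(early_stage) for _ in range(4)]
features = [m(torch.randn(32, 256)) for m in augmented_pool]   # features used for matching
```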
arXiv Detail & Related papers (2022-12-12T07:36:05Z) - Segmentation-guided Domain Adaptation for Efficient Depth Completion [3.441021278275805]
We propose an efficient depth completion model based on a vgg05-like CNN architecture and a semi-supervised domain adaptation approach.
In order to boost spatial coherence, we guide the learning process using segmentations as additional source of information.
Our approach improves on previous efficient, low-parameter state-of-the-art approaches while having a noticeably lower computational footprint.
arXiv Detail & Related papers (2022-10-14T13:01:25Z) - DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z) - CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE).
At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales.
We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
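The sketch below illustrates layer-wise feature alignment in the spirit of the strategy described above: match mean features of real and synthetic data at every intermediate layer ("scale"). The real method has further components not shown here, and the toy network is an assumption.

```python
# A minimal sketch of multi-scale feature alignment; not the full CAFE method.
import torch
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(256, 128), nn.Linear(128, 64), nn.Linear(64, 32)])

def multi_scale_features(x):
    feats = []
    for layer in layers:
        x = torch.relu(layer(x))
        feats.append(x)                    # keep the features of every layer ("scale")
    return feats

def alignment_loss(real, synthetic):
    loss = torch.zeros(())
    for f_real, f_syn in zip(multi_scale_features(real), multi_scale_features(synthetic)):
        loss = loss + ((f_real.mean(dim=0) - f_syn.mean(dim=0)) ** 2).sum()
    return loss

real = torch.randn(64, 256)
synthetic = torch.randn(8, 256, requires_grad=True)
alignment_loss(real, synthetic).backward()
```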
arXiv Detail & Related papers (2022-03-03T05:58:49Z) - DANCE: DAta-Network Co-optimization for Efficient Segmentation Model Training and Inference [86.03382625531951]
DANCE is an automated simultaneous data-network co-optimization for efficient segmentation model training and inference.
It integrates automated data slimming which adaptively downsamples/drops input images and controls their corresponding contribution to the training loss guided by the images' spatial complexity.
Experiments and ablation studies demonstrate that DANCE can achieve "all-win" towards efficient segmentation.
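A minimal sketch of complexity-guided data slimming in the spirit of the adaptive downsampling described above follows: estimate an image's spatial complexity from local intensity differences and downsample low-complexity images more aggressively. The complexity measure, threshold, and scale factor are illustrative assumptions.

```python
# A minimal sketch of complexity-guided downsampling; illustrative, not DANCE itself.
import torch
import torch.nn.functional as F

def spatial_complexity(img):
    """Mean absolute difference between horizontally and vertically adjacent pixels."""
    dx = (img[..., :, 1:] - img[..., :, :-1]).abs().mean()
    dy = (img[..., 1:, :] - img[..., :-1, :]).abs().mean()
    return (dx + dy).item()

def slim(img, threshold=0.3, low_res_scale=0.5):
    """Keep complex images at full resolution, downsample simple ones."""
    if spatial_complexity(img) > threshold:
        return img
    return F.interpolate(img.unsqueeze(0), scale_factor=low_res_scale,
                         mode="bilinear", align_corners=False).squeeze(0)

image = torch.rand(3, 64, 64)
print(slim(image).shape)
```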
arXiv Detail & Related papers (2021-07-16T04:58:58Z) - Making Online Sketching Hashing Even Faster [63.16042585506435]
We present a FasteR Online Sketching Hashing (FROSH) algorithm to sketch the data in a more compact form via an independent transformation.
We provide theoretical justification to guarantee that our proposed FROSH consumes less time and achieves a comparable sketching precision.
We also extend FROSH to its distributed implementation, namely DFROSH, to further reduce the training time cost of FROSH.
arXiv Detail & Related papers (2020-10-10T08:50:53Z)
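The sketch below illustrates the data-sketching idea behind online sketching hashing: maintain a small matrix that approximates the streamed data (a simplified per-sample frequent-directions update), then derive hash projections from it. FROSH's faster independent transformation is not reproduced here, and all sizes are illustrative assumptions.

```python
# A minimal sketch of sketching-based hashing; a simplified illustration, not FROSH.
import torch

def frequent_directions(stream, sketch_rows=16):
    """Maintain a sketch_rows x d sketch of the row stream (simplified per-sample variant)."""
    d = stream.shape[1]
    sketch = torch.zeros(sketch_rows, d)
    for row in stream:
        sketch[-1] = row                                   # insert the new sample
        u, s, vh = torch.linalg.svd(sketch, full_matrices=False)
        s = torch.clamp(s**2 - s[sketch_rows // 2]**2, min=0.0).sqrt()  # shrink small directions
        sketch = torch.diag(s) @ vh                        # rebuild the sketch
    return sketch

data = torch.randn(200, 64)                                  # streamed data rows
sketch = frequent_directions(data)
proj = torch.linalg.svd(sketch, full_matrices=False).Vh[:8]  # top directions as hash projections
codes = torch.sign(data @ proj.t())                          # 8-bit binary codes
```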