AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
- URL: http://arxiv.org/abs/2504.03587v1
- Date: Fri, 04 Apr 2025 16:56:17 GMT
- Title: AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
- Authors: Niu Lian, Jun Li, Jinpeng Wang, Ruisheng Luo, Yaowei Wang, Shu-Tao Xia, Bin Chen
- Abstract summary: Self-Supervised Video Hashing (SSVH) compresses videos into hash codes for efficient indexing and retrieval using unlabeled training videos. Existing approaches rely on random frame sampling to learn video features and treat all frames equally. We propose a new framework, termed AutoSSVH, that employs adversarial frame sampling with hash-based contrastive learning.
- Score: 72.10024026634976
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-Supervised Video Hashing (SSVH) compresses videos into hash codes for efficient indexing and retrieval using unlabeled training videos. Existing approaches rely on random frame sampling to learn video features and treat all frames equally. This results in suboptimal hash codes, as it ignores frame-specific information density and reconstruction difficulty. To address this limitation, we propose a new framework, termed AutoSSVH, that employs adversarial frame sampling with hash-based contrastive learning. Our adversarial sampling strategy automatically identifies and selects challenging frames with richer information for reconstruction, enhancing encoding capability. Additionally, we introduce a hash component voting strategy and a point-to-set (P2Set) hash-based contrastive objective, which help capture complex inter-video semantic relationships in the Hamming space and improve the discriminability of learned hash codes. Extensive experiments demonstrate that AutoSSVH achieves superior retrieval efficacy and efficiency compared to state-of-the-art approaches. Code is available at https://github.com/EliSpectre/CVPR25-AutoSSVH.
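The two key ingredients of the abstract lend themselves to a compact illustration. Below is a minimal, hedged sketch, not the paper's implementation (which is in the linked repository): `sample_hard_frames` and `p2set_loss` are hypothetical names. Adversarial sampling keeps the frames whose reconstruction error is highest, and the P2Set objective pulls a video's relaxed code toward the centroid of a positive code set while pushing it away from negatives, using cosine similarity as a differentiable stand-in for Hamming affinity.

```python
# Hedged sketch of hardness-based ("adversarial") frame sampling and a
# point-to-set contrastive objective; all names here are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_hard_frames(frames: torch.Tensor, autoencoder: nn.Module, k: int) -> torch.Tensor:
    """Select the k frames of one video that are hardest to reconstruct.

    frames: (T, D) per-frame features. Returns the (k, D) hardest subset.
    """
    with torch.no_grad():
        recon = autoencoder(frames)                                       # (T, D)
        errors = F.mse_loss(recon, frames, reduction="none").mean(dim=1)  # (T,)
    hard_idx = torch.topk(errors, k).indices          # most informative frames
    return frames[hard_idx]

def p2set_loss(code, pos_codes, neg_codes, tau=0.5):
    """InfoNCE-style point-to-set loss: one relaxed code vs. a positive set
    centroid and a set of negatives (cosine proxy for Hamming similarity)."""
    code = F.normalize(code, dim=0)
    pos = F.normalize(pos_codes.mean(dim=0), dim=0)   # centroid of the set
    negs = F.normalize(neg_codes, dim=1)
    logits = torch.cat([(code @ pos).view(1), negs @ code]) / tau
    return -F.log_softmax(logits, dim=0)[0]           # positive is entry 0

# Toy usage with a stand-in autoencoder and relaxed (tanh) codes.
toy_ae = nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 512))
video = torch.randn(32, 512)                          # 32 frames, 512-d features
hard = sample_hard_frames(video, toy_ae, k=8)         # (8, 512)
q = torch.tanh(torch.randn(64))                       # query video's code
pos = torch.tanh(torch.randn(5, 64))                  # codes of semantic neighbors
neg = torch.tanh(torch.randn(20, 64))                 # codes of other videos
print(hard.shape, p2set_loss(q, pos, neg).item())
```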
Related papers
- VaQuitA: Enhancing Alignment in LLM-Assisted Video Understanding [63.075626670943116]
We introduce a cutting-edge framework, VaQuitA, designed to refine the synergy between video and textual information.
At the data level, instead of sampling frames uniformly, we implement a sampling method guided by CLIP-score rankings.
At the feature level, we integrate a trainable Video Perceiver alongside a Visual-Query Transformer.
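The CLIP-score-guided sampling mentioned above can be sketched in a few lines. This assumes frame and text embeddings were precomputed with a CLIP-style encoder; the real VaQuitA pipeline may differ in detail.

```python
# Hedged sketch: rank frames by CLIP similarity to the text query and keep
# the top-k, instead of sampling uniformly. Assumes precomputed embeddings.
import torch
import torch.nn.functional as F

def clip_guided_sample(frame_emb: torch.Tensor, text_emb: torch.Tensor, k: int) -> torch.Tensor:
    """frame_emb: (T, D), text_emb: (D,). Returns k frame indices in temporal order."""
    sims = F.normalize(frame_emb, dim=1) @ F.normalize(text_emb, dim=0)  # (T,)
    return torch.topk(sims, k).indices.sort().values   # keep temporal order

frames = torch.randn(100, 512)   # 100 frames embedded with a CLIP image encoder
query = torch.randn(512)         # one text-query embedding
print(clip_guided_sample(frames, query, k=16))
```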
arXiv Detail & Related papers (2023-12-04T19:48:02Z)
- CHAIN: Exploring Global-Local Spatio-Temporal Information for Improved Self-Supervised Video Hashing [45.216750448864275]
Learning accurate hash codes for video retrieval can be challenging due to high local redundancy and complex global dependencies among video frames.
Our proposed method, CHAIN, outperforms state-of-the-art self-supervised video hashing methods on four video benchmark datasets.
arXiv Detail & Related papers (2023-10-29T07:36:11Z)
- Dual-Stream Knowledge-Preserving Hashing for Unsupervised Video Retrieval [67.52910255064762]
We design a simple dual-stream structure, including a temporal layer and a hash layer.
With the help of semantic similarity knowledge obtained from self-supervision, the hash layer learns to capture information for semantic retrieval.
In this way, the model naturally preserves the disentangled semantics into binary codes.
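The dual-stream structure described above can be sketched as a temporal stream feeding a hash stream. The module choices below (a GRU, a linear hash head, tanh relaxation) are assumptions for illustration, not the paper's exact architecture.

```python
# Hedged sketch of a temporal layer + hash layer; sizes and modules assumed.
import torch
import torch.nn as nn

class DualStreamHasher(nn.Module):
    def __init__(self, feat_dim=512, hidden=256, bits=64):
        super().__init__()
        self.temporal = nn.GRU(feat_dim, hidden, batch_first=True)  # temporal stream
        self.hash = nn.Linear(hidden, bits)                         # hash stream

    def forward(self, frames):              # frames: (B, T, feat_dim)
        _, h = self.temporal(frames)        # h: (1, B, hidden), last hidden state
        return torch.tanh(self.hash(h.squeeze(0)))  # relaxed codes in (-1, 1)

model = DualStreamHasher()
codes = model(torch.randn(4, 30, 512))      # 4 videos, 30 frames each
binary = torch.sign(codes)                  # {-1, +1} codes used at test time
print(binary.shape)                         # torch.Size([4, 64])
```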
arXiv Detail & Related papers (2023-10-12T03:21:12Z)
- Contrastive Masked Autoencoders for Self-Supervised Video Hashing [54.636976693527636]
Self-Supervised Video Hashing (SSVH) models learn to generate short binary representations for videos without ground-truth supervision.
We propose a simple yet effective one-stage SSVH method called ConMH, which incorporates video semantic information and video similarity relationship understanding.
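The combination of masked reconstruction and view agreement can be sketched as below. This is intuition only: ConMH itself is transformer-based, and a full contrastive loss would use other videos in the batch as negatives; the names here are assumptions.

```python
# Hedged sketch: reconstruct masked frames while pulling two masked views'
# video-level codes together (ConMH-flavored, not the actual model).
import torch
import torch.nn.functional as F

def mask_frames(frames, ratio=0.5):
    """Zero out a random subset of frames; return masked input and the mask."""
    mask = torch.rand(frames.size(0)) < ratio    # True = masked
    masked = frames.clone()
    masked[mask] = 0.0
    return masked, mask

def conmh_style_loss(encoder, decoder, frames, tau=0.2):
    v1, m1 = mask_frames(frames)                 # two independently masked views
    v2, _ = mask_frames(frames)
    z1 = encoder(v1).mean(0)                     # video-level code, view 1
    z2 = encoder(v2).mean(0)                     # video-level code, view 2
    rec = decoder(encoder(v1))                   # reconstruct all frames
    rec_loss = F.mse_loss(rec[m1], frames[m1])   # penalize only masked frames
    agree = F.cosine_similarity(z1, z2, dim=0) / tau
    return rec_loss - agree                      # maximize agreement of the views

enc = torch.nn.Linear(512, 64)                   # stand-in encoder/decoder
dec = torch.nn.Linear(64, 512)
print(conmh_style_loss(enc, dec, torch.randn(30, 512)).item())
```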
arXiv Detail & Related papers (2022-11-21T06:48:14Z)
- Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval [55.088635195893325]
We propose the first quantized representation learning method for cross-view video retrieval, namely Hybrid Contrastive Quantization (HCQ).
HCQ learns both coarse-grained and fine-grained quantizations with transformers, which provide complementary understandings for texts and videos.
Experiments on three Web video benchmark datasets demonstrate that HCQ achieves competitive performance with state-of-the-art non-compressed retrieval methods.
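The coarse-plus-fine quantization idea can be illustrated with a residual codebook sketch. HCQ's transformer-based formulation is considerably more elaborate, so treat this purely as intuition; the codebooks and sizes below are made up.

```python
# Hedged sketch: quantize an embedding coarsely, then quantize the residual
# with a fine codebook (not HCQ's actual mechanism).
import torch

def quantize(x, codebook):
    """Snap x to its nearest codeword; return the codeword and its index."""
    dists = torch.cdist(x.unsqueeze(0), codebook).squeeze(0)  # (K,)
    idx = dists.argmin()
    return codebook[idx], idx

coarse = torch.randn(256, 128)   # coarse codebook: 256 codewords of dim 128
fine = torch.randn(256, 128)     # fine codebook for the residual
x = torch.randn(128)             # a video (or text) embedding

c, ci = quantize(x, coarse)      # coarse-grained approximation
r, ri = quantize(x - c, fine)    # fine-grained residual correction
x_hat = c + r                    # compact two-index representation
print(ci.item(), ri.item(), torch.norm(x - x_hat).item())
```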
arXiv Detail & Related papers (2022-02-07T18:04:10Z)
- Self-Distilled Hashing for Deep Image Retrieval [25.645550298697938]
In hash-based image retrieval systems, an input transformed from the original usually generates different hash codes, degrading retrieval accuracy.
We propose a novel self-distilled hashing scheme to minimize the discrepancy while exploiting the potential of augmented data.
We also introduce hash proxy-based similarity learning and binary cross entropy-based quantization loss to provide fine quality hash codes.
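Those two losses can be sketched together: an agreement term between the codes of an image and its augmentation, plus a BCE-style quantization term pushing relaxed codes toward binary values. Both functions below are hedged stand-ins, not the paper's exact objectives.

```python
# Hedged sketch of self-distilled hashing: view agreement + quantization loss.
import torch
import torch.nn.functional as F

def self_distill_loss(h_orig, h_aug):
    """Cosine agreement between codes of two views of the same image."""
    return 1 - F.cosine_similarity(h_orig, h_aug, dim=-1).mean()

def quantization_loss(h):
    """BCE-style push of tanh outputs in (-1, 1) toward the corners {-1, +1}."""
    p = (h + 1) / 2                              # map to [0, 1]
    target = (h > 0).float()                     # nearest binary corner
    return F.binary_cross_entropy(p.clamp(1e-6, 1 - 1e-6), target)

h1 = torch.tanh(torch.randn(8, 64))              # codes for original images
h2 = torch.tanh(torch.randn(8, 64))              # codes for augmented views
loss = self_distill_loss(h1, h2) + 0.1 * quantization_loss(h1)
print(loss.item())
```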
arXiv Detail & Related papers (2021-12-16T12:01:50Z)
- DVHN: A Deep Hashing Framework for Large-scale Vehicle Re-identification [5.407157027628579]
We propose a deep hash-based vehicle re-identification framework, dubbed DVHN, which substantially reduces memory usage and promotes retrieval efficiency.
DVHN directly learns discrete compact binary hash codes for each image by jointly optimizing the feature learning network and the hash code generating module.
DVHN with 2048-bit codes achieves 13.94% and 10.21% accuracy improvements in terms of mAP and Rank@1 on the VehicleID (800) dataset.
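The joint optimization described above amounts to backpropagating one loss through both the feature network and the hash module. Below is a minimal sketch under assumed names and a simple sign-quantization penalty; DVHN's actual losses and backbone differ.

```python
# Hedged sketch of jointly optimizing a feature network and a hash head.
import torch
import torch.nn as nn

feat_net = nn.Sequential(nn.Linear(2048, 512), nn.ReLU())  # stand-in backbone
hash_head = nn.Linear(512, 2048)                           # 2048-bit hash head
params = list(feat_net.parameters()) + list(hash_head.parameters())
opt = torch.optim.SGD(params, lr=0.01)                     # one optimizer for both

x = torch.randn(16, 2048)                    # a batch of image features
h = torch.tanh(hash_head(feat_net(x)))       # relaxed codes in (-1, 1)
quant = (h - torch.sign(h)).pow(2).mean()    # pull codes toward {-1, +1}
quant.backward()                             # gradients reach both modules
opt.step()
print(torch.sign(h).shape)                   # discrete codes used at test time
```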
arXiv Detail & Related papers (2021-12-09T14:11:27Z)
- Auto-Encoding Twin-Bottleneck Hashing [141.5378966676885]
This paper proposes an efficient and adaptive code-driven graph, which is updated by decoding in the context of an auto-encoder.
Experiments on benchmarked datasets clearly show the superiority of our framework over the state-of-the-art hashing methods.
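The "code-driven graph" idea can be illustrated by building an adjacency matrix from the Hamming similarity of the current binary codes, which the decoder can then exploit and refresh. This is intuition only, not TBH's exact mechanism.

```python
# Hedged sketch: adjacency from Hamming similarity of {-1, +1} codes.
import torch

def code_graph(codes: torch.Tensor) -> torch.Tensor:
    """codes: (N, B) in {-1, +1}. Returns (N, N) similarities in [0, 1]."""
    B = codes.size(1)
    # For +/-1 codes, the inner product equals B - 2 * Hamming distance,
    # so (dot + B) / (2B) = 1 - hamming/B.
    return (codes @ codes.t() + B) / (2 * B)

codes = torch.sign(torch.randn(5, 16))   # 5 items, 16-bit codes
adj = code_graph(codes)                  # re-derived as codes are re-decoded
print(adj)                               # 1.0 on the diagonal
```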
arXiv Detail & Related papers (2020-02-27T05:58:12Z)