Online Enhanced Semantic Hashing: Towards Effective and Efficient
Retrieval for Streaming Multi-Modal Data
- URL: http://arxiv.org/abs/2109.04260v1
- Date: Thu, 9 Sep 2021 13:30:31 GMT
- Title: Online Enhanced Semantic Hashing: Towards Effective and Efficient
Retrieval for Streaming Multi-Modal Data
- Authors: Xiao-Ming Wu, Xin Luo, Yu-Wei Zhan, Chen-Lu Ding, Zhen-Duo Chen,
Xin-Shun Xu
- Abstract summary: We propose a new model, termed Online enhAnced SemantIc haShing (OASIS).
We design a novel semantic-enhanced representation for data, which could help handle newly arriving classes.
Our method can exceed the state-of-the-art models.
- Score: 21.157717777481572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid development of multimedia devices and applications,
efficient retrieval of large-scale multi-modal data has become an active
research topic. For this task, hashing has become a prevalent choice due to its
retrieval efficiency and low storage cost. Although multi-modal hashing has
drawn much attention in recent years, some problems remain. First, existing
methods are mainly designed for batch mode and cannot efficiently handle
streaming multi-modal data. Second, all existing online multi-modal hashing
methods fail to effectively handle unseen new classes, which arrive
continuously with streaming data chunks. In this paper, we propose a new model,
termed Online enhAnced SemantIc haShing (OASIS). We design a novel
semantic-enhanced representation for data, which could help handle newly
arriving classes, and use it to construct the enhanced semantic objective
function. An efficient and effective discrete online optimization algorithm is
further proposed for OASIS. Extensive experiments show that our method can
exceed the state-of-the-art models. For reproducibility and to benefit the
community, our code and data are available in the supplementary material and
will be made publicly available.
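The abstract credits hashing with retrieval efficiency and low storage cost, and stresses that data arrives in streaming chunks. The sketch below illustrates only that general idea; it is not the OASIS algorithm. It assumes random sign projections in place of learned hash functions, naive concatenation in place of the paper's semantic-enhanced representation, and hypothetical names (`make_projections`, `fuse`, `StreamingIndex`) chosen for illustration.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (an assumed stand-in for learned hash functions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vector, projections):
    """Pack the sign of each projection into one integer, one bit per hyperplane."""
    code = 0
    for w in projections:
        dot = sum(wi * xi for wi, xi in zip(w, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Count the bits in which two codes differ."""
    return bin(a ^ b).count("1")


def fuse(image_feat, text_feat):
    """Fuse two modalities by concatenation (an assumption, not the paper's representation)."""
    return list(image_feat) + list(text_feat)


class StreamingIndex:
    """Hashes each incoming chunk once; codes of earlier chunks are never recomputed."""

    def __init__(self, projections):
        self.projections = projections
        self.codes = []

    def add_chunk(self, chunk):
        """Append codes for a chunk of (image_feat, text_feat) pairs."""
        for image_feat, text_feat in chunk:
            self.codes.append(hash_code(fuse(image_feat, text_feat), self.projections))

    def query(self, image_feat, text_feat, k=1):
        """Return indices of the k stored items closest in Hamming distance."""
        q = hash_code(fuse(image_feat, text_feat), self.projections)
        ranked = sorted(range(len(self.codes)), key=lambda i: hamming(q, self.codes[i]))
        return ranked[:k]


# Usage: two chunks arrive one after the other, then a query is ranked.
index = StreamingIndex(make_projections(dim=4, n_bits=16))
index.add_chunk([([1.0, 0.0], [0.5, 0.5]), ([-1.0, 0.2], [0.1, -0.9])])
index.add_chunk([([0.9, 0.1], [0.4, 0.6])])
nearest = index.query([1.0, 0.0], [0.5, 0.5], k=2)
```

Each item is stored as a 16-bit code, and ranking needs only XOR and bit counting, which is where the storage and speed advantages come from. Fixed random projections cannot adapt to new classes; handling those is the problem the paper's method is meant to address.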
Related papers
- LLMs Can Evolve Continually on Modality for X-Modal Reasoning [62.2874638875554]
Existing methods rely heavily on modal-specific pretraining and joint-modal tuning, leading to significant computational burdens when expanding to new modalities.
We propose PathWeave, a flexible and scalable framework with modal-Path sWitching and ExpAnsion abilities.
PathWeave performs comparably to state-of-the-art MLLMs while concurrently reducing parameter training burdens by 98.73%.
arXiv Detail & Related papers (2024-10-26T13:19:57Z)
- CLIP Multi-modal Hashing for Multimedia Retrieval [7.2683522480676395]
We propose a novel CLIP Multi-modal Hashing (CLIPMH) method.
Our method employs the CLIP framework to extract both text and vision features and then fuses them to generate hash code.
Compared with state-of-the-art unsupervised and supervised multi-modal hashing methods, experiments reveal that the proposed CLIPMH can significantly improve performance.
arXiv Detail & Related papers (2024-10-10T10:13:48Z)
- CLIP Multi-modal Hashing: A new baseline CLIPMH [4.057431980018267]
We propose a new baseline CLIP Multi-modal Hashing (CLIPMH) method.
It uses the CLIP model to extract text and image features, and then fuses them to generate hash codes.
In comparison to state-of-the-art unsupervised and supervised multi-modal hashing methods, experiments reveal that the proposed CLIPMH can significantly enhance performance.
arXiv Detail & Related papers (2023-08-22T21:29:55Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Asymmetric Scalable Cross-modal Hashing [51.309905690367835]
Cross-modal hashing is a successful method for large-scale multimedia retrieval.
We propose a novel Asymmetric Scalable Cross-Modal Hashing (ASCMH) to address these issues.
Our ASCMH outperforms the state-of-the-art cross-modal hashing methods in terms of accuracy and efficiency.
arXiv Detail & Related papers (2022-07-26T04:38:47Z)
- Dynamic Multimodal Fusion [8.530680502975095]
Dynamic multimodal fusion (DynMM) is a new approach that adaptively fuses multimodal data and generates data-dependent forward paths during inference.
Results on various multimodal tasks demonstrate the efficiency and wide applicability of our approach.
arXiv Detail & Related papers (2022-03-31T21:35:13Z)
- Fast Class-wise Updating for Online Hashing [196.14748396106955]
This paper presents a novel supervised online hashing scheme, termed Fast Class-wise Updating for Online Hashing (FCOH).
A class-wise updating method is developed to decompose the binary code learning and alternately renew the hash functions in a class-wise fashion, which eases the burden of large numbers of training batches.
To further achieve online efficiency, we propose a semi-relaxation optimization, which accelerates the online training by treating different binary constraints independently.
arXiv Detail & Related papers (2020-12-01T07:41:54Z)
- Creating Something from Nothing: Unsupervised Knowledge Distillation for Cross-Modal Hashing [132.22315429623575]
Cross-modal hashing (CMH) can map contents from different modalities, especially in vision and language, into the same space.
There are two main frameworks for CMH, differing from each other in whether semantic supervision is required.
In this paper, we propose a novel approach that enables guiding a supervised method using outputs produced by an unsupervised method.
arXiv Detail & Related papers (2020-04-01T08:32:15Z)
- Deep Multi-View Enhancement Hashing for Image Retrieval [40.974719473643724]
This paper proposes a supervised multi-view hash model which can enhance the multi-view information through neural networks.
The proposed method is systematically evaluated on the CIFAR-10, NUS-WIDE and MS-COCO datasets.
arXiv Detail & Related papers (2020-02-01T08:32:27Z)