Related papers: Compressible and Searchable: AI-native Multi-Modal Retrieval System with Learned Image Compression

Compressible and Searchable: AI-native Multi-Modal Retrieval System with Learned Image Compression

URL: http://arxiv.org/abs/2404.10234v1
Date: Tue, 16 Apr 2024 02:29:00 GMT
Title: Compressible and Searchable: AI-native Multi-Modal Retrieval System with Learned Image Compression
Authors: Jixiang Luo,
Abstract summary: Conventional approaches struggle to cope with the escalating complexity and scale of multimedia data. We proposed framework addresses this challenge by fusing AI-native multi-modal search capabilities with neural image compression. Our work marks a significant advancement towards scalable and efficient multi-modal search systems in the era of big data.
Score: 0.6345523830122168
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The burgeoning volume of digital content across diverse modalities necessitates efficient storage and retrieval methods. Conventional approaches struggle to cope with the escalating complexity and scale of multimedia data. In this paper, we proposed framework addresses this challenge by fusing AI-native multi-modal search capabilities with neural image compression. First we analyze the intricate relationship between compressibility and searchability, recognizing the pivotal role each plays in the efficiency of storage and retrieval systems. Through the usage of simple adapter is to bridge the feature of Learned Image Compression(LIC) and Contrastive Language-Image Pretraining(CLIP) while retaining semantic fidelity and retrieval of multi-modal data. Experimental evaluations on Kodak datasets demonstrate the efficacy of our approach, showcasing significant enhancements in compression efficiency and search accuracy compared to existing methodologies. Our work marks a significant advancement towards scalable and efficient multi-modal search systems in the era of big data.

Related papers

MambaIC: State Space Models for High-Performance Learned Image Compression [53.991726013454695]
A high-performance image compression algorithm is crucial for real-time information transmission across numerous fields. Inspired by the effectiveness of state space models (SSMs) in capturing long-range dependencies, we leverage SSMs to address computational inefficiency in existing methods. We propose an enhanced image compression approach through refined context modeling, which we term MambaIC.
arXiv Detail & Related papers (2025-03-16T11:32:34Z)
MatchAnything: Universal Cross-Modality Image Matching with Large-Scale Pre-Training [62.843316348659165]
Deep learning-based image matching algorithms have dramatically outperformed humans in rapidly and accurately finding large amounts of correspondences. We propose a large-scale pre-training framework that utilizes synthetic cross-modal training signals to train models to recognize and match fundamental structures across images. Our key finding is that the matching model trained with our framework achieves remarkable generalizability across more than eight unseen cross-modality registration tasks.
arXiv Detail & Related papers (2025-01-13T18:37:36Z)
Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search [49.81353382211113]
We address the challenge of integrating multi-head self-attention into high resolution representation CNNs efficiently. We develop a multi-target multi-branch supernet method, which fully utilizes the advantages of high-resolution features. We present a series of model via Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method that searched for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers.
arXiv Detail & Related papers (2024-03-15T15:47:54Z)
PICS: Pipeline for Image Captioning and Search [0.0]
This paper introduces PICS (Pipeline for Image Captioning and Search), a novel approach designed to address the complexities inherent in organizing large-scale image repositories. The approach is rooted in the understanding that meaningful, AI-generated captions can significantly enhance the searchability and accessibility of images in large databases. The significance of PICS lies in its potential to transform image database systems, harnessing the power of machine learning and natural language processing to meet the demands of modern digital asset management.
arXiv Detail & Related papers (2024-02-01T03:08:21Z)
Efficient Neural Representation of Volumetric Data using Coordinate-Based Networks [0.0]
We propose an efficient approach for the compression and representation of volumetric data using coordinate-based networks and hash encoding. Our approach enables effective compression by learning a mapping between spatial coordinates and intensity values.
arXiv Detail & Related papers (2024-01-16T21:33:01Z)
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features [12.14013374452918]
We present a simple yet effective approach to object-centric open-vocabulary image retrieval. Our approach aggregates dense embeddings extracted from CLIP into a compact representation. We show the effectiveness of our scheme to the task by achieving significantly better results than global feature approaches on three datasets.
arXiv Detail & Related papers (2023-09-26T15:13:09Z)
Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks. Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment. We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
Two Approaches to Supervised Image Segmentation [55.616364225463066]
The present work develops comparison experiments between deep learning and multiset neurons approaches. The deep learning approach confirmed its potential for performing image segmentation. The alternative multiset methodology allowed for enhanced accuracy while requiring little computational resources.
arXiv Detail & Related papers (2023-07-19T16:42:52Z)
Machine Perception-Driven Image Compression: A Layered Generative Approach [32.23554195427311]
layered generative image compression model is proposed to achieve high human vision-oriented image reconstructed quality. Task-agnostic learning-based compression model is proposed, which effectively supports various compressed domain-based analytical tasks. Joint optimization schedule is adopted to acquire best balance point among compression ratio, reconstructed image quality, and downstream perception performance.
arXiv Detail & Related papers (2023-04-14T02:12:38Z)
Learning Multimodal Data Augmentation in Feature Space [65.54623807628536]
LeMDA is an easy-to-use method that automatically learns to jointly augment multimodal data in feature space. We show that LeMDA can profoundly improve the performance of multimodal deep learning architectures.
arXiv Detail & Related papers (2022-12-29T20:39:36Z)
Cross-Modality Sub-Image Retrieval using Contrastive Multimodal Image Representations [3.3754780158324564]
Cross-modality image retrieval is challenging, since images of similar (or even the same) content captured by different modalities might share few common structures. We propose a new application-independent content-based image retrieval system for reverse (sub-)image search across modalities.
arXiv Detail & Related papers (2022-01-10T19:04:28Z)
Video Coding for Machine: Compact Visual Representation Compression for Intelligent Collaborative Analytics [101.35754364753409]
Video Coding for Machines (VCM) is committed to bridging to an extent separate research tracks of video/image compression and feature compression. This paper summarizes VCM methodology and philosophy based on existing academia and industrial efforts.
arXiv Detail & Related papers (2021-10-18T12:42:13Z)
Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration task. We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network. Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.