Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation
- URL: http://arxiv.org/abs/2407.09853v1
- Date: Sat, 13 Jul 2024 11:22:41 GMT
- Title: Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation
- Authors: Han Li, Shaohui Li, Shuangrui Ding, Wenrui Dai, Maida Cao, Chenglin Li, Junni Zou, Hongkai Xiong
- Abstract summary: Image compression for machine and human vision (ICMH) has gained increasing attention in recent years.
Existing ICMH methods are limited by high training and storage overheads due to the heavy design of task-specific networks.
We develop a novel lightweight adapter-based tuning framework for ICMH, named Adapt-ICMH.
- Score: 61.22401987355781
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image compression for machine and human vision (ICMH) has gained increasing attention in recent years. Existing ICMH methods are limited by high training and storage overheads due to the heavy design of task-specific networks. To address this issue, in this paper, we develop a novel lightweight adapter-based tuning framework for ICMH, named Adapt-ICMH, that better balances task performance and bitrates with reduced overheads. We propose a spatial-frequency modulation adapter (SFMA) that simultaneously eliminates non-semantic redundancy with a spatial modulation adapter, and enhances task-relevant frequency components and suppresses task-irrelevant frequency components with a frequency modulation adapter. The proposed adapter is plug-and-play and compatible with almost all existing learned image compression models without compromising the performance of pre-trained models. Experiments demonstrate that Adapt-ICMH consistently outperforms existing ICMH frameworks on various machine vision tasks with fewer fine-tuned parameters and reduced computational complexity. Code will be released at https://github.com/qingshi9974/ECCV2024-AdpatICMH.
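The abstract names the two modulation branches but gives no implementation details. As a rough illustration, a minimal SFMA-style adapter might look like the PyTorch sketch below; the layer choices, the FFT-based frequency gating, and the class name are assumptions made for illustration, not the released Adapt-ICMH code.

```python
import torch
import torch.nn as nn

class SFMASketch(nn.Module):
    """Illustrative sketch of a spatial-frequency modulation adapter:
    a spatial branch gates non-semantic regions, a frequency branch
    reweights FFT coefficients. Shapes and layers are assumptions,
    not the authors' implementation."""
    def __init__(self, channels: int, hidden: int = 16):
        super().__init__()
        # Spatial modulation: lightweight convs predict a per-pixel gate.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 3, padding=1), nn.Sigmoid(),
        )
        # Frequency modulation: a per-coefficient gate from FFT magnitudes.
        self.freq_gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.spatial(x)                # suppress non-semantic regions
        f = torch.fft.rfft2(x, norm="ortho")   # to the frequency domain
        f = f * self.freq_gate(f.abs())        # boost/suppress frequency components
        return torch.fft.irfft2(f, s=x.shape[-2:], norm="ortho")
```

In an ICMH pipeline, such an adapter would sit around the frozen transforms of a pretrained codec, and only the adapter parameters would be fine-tuned per machine vision task.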
Related papers
- Frequency Dynamic Convolution for Dense Image Prediction [34.915070244005854]
We introduce Frequency Dynamic Convolution (FDConv), a novel approach that mitigates the limitations of conventional dynamic convolution by learning a fixed parameter budget in the Fourier domain.
FDConv divides this budget into frequency-based groups with disjoint Fourier indices, enabling the construction of frequency-diverse weights without increasing the parameter cost.
We demonstrate that when applied to ResNet-50, FDConv achieves superior performance with a modest increase of +3.6M parameters.
arXiv Detail & Related papers (2025-03-24T15:32:06Z)
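As a loose sketch of the fixed-Fourier-budget idea in FDConv: one set of kernel Fourier coefficients is partitioned into disjoint frequency groups, each group is inverse-transformed into its own spatial kernel, and an input-dependent attention mixes the kernels. The parameterization, grouping scheme, and attention head below are assumptions, not the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FourierBudgetConv(nn.Module):
    """Hypothetical sketch: a single Fourier-domain parameter budget yields
    several frequency-diverse kernels at no extra parameter cost."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, groups: int = 2):
        super().__init__()
        self.k, self.groups = k, groups
        # Fixed budget: Fourier coefficients (real/imag) of one kernel bank.
        self.coef = nn.Parameter(0.1 * torch.randn(out_ch, in_ch, k, k // 2 + 1, 2))
        self.att = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, groups, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c = torch.view_as_complex(self.coef)
        kernels = []
        for g in range(self.groups):           # disjoint Fourier index groups
            m = torch.zeros_like(c)
            m[..., g::self.groups] = c[..., g::self.groups]
            kernels.append(torch.fft.irfft2(m, s=(self.k, self.k)))
        kernels = torch.stack(kernels)         # (G, out, in, k, k)
        b, _, h, w = x.shape
        a = torch.softmax(self.att(x).view(b, self.groups), dim=1)
        wgt = torch.einsum("bg,goihw->boihw", a, kernels)  # per-sample kernel mix
        y = F.conv2d(x.reshape(1, -1, h, w),
                     wgt.reshape(-1, wgt.shape[2], self.k, self.k),
                     padding=self.k // 2, groups=b)
        return y.view(b, -1, h, w)
```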
- CMamba: Learned Image Compression with State Space Models [31.10785880342252]
We propose a hybrid Convolution and State Space Models (SSMs) based image compression framework to achieve superior rate-distortion performance.
Specifically, CMamba introduces two key components: a Content-Adaptive SSM (CA-SSM) module and a Context-Aware Entropy (CAE) module.
Experimental results demonstrate that CMamba achieves superior rate-distortion performance.
arXiv Detail & Related papers (2025-02-07T15:07:04Z)
- FreqMixFormerV2: Lightweight Frequency-aware Mixed Transformer for Human Skeleton Action Recognition [9.963966059349731]
FreqMixFormerV2 is built upon the Frequency-aware Mixed Transformer (FreqMixFormer) for identifying subtle and discriminative actions.
The proposed model achieves a superior balance between efficiency and accuracy, outperforming state-of-the-art methods with only 60% of the parameters.
arXiv Detail & Related papers (2024-12-29T23:52:40Z)
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- CAD: Memory Efficient Convolutional Adapter for Segment Anything [3.760646312664378]
Segment Anything (SAM), a foundation model for image segmentation, has been actively researched in various fields.
Adapter-based fine-tuning approaches have reported parameter efficiency and significant performance improvements.
This paper proposes a memory-efficient parallel convolutional adapter architecture.
arXiv Detail & Related papers (2024-09-24T09:02:23Z)
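A minimal sketch of a parallel convolutional adapter in this spirit: a small bottleneck conv branch runs alongside a frozen pretrained block and its output is added residually. The wrapper and layer sizes are illustrative assumptions, not the CAD architecture.

```python
import torch.nn as nn

class ParallelConvAdapter(nn.Module):
    """Sketch: a small conv branch running in parallel with a frozen block,
    added to the block's output. Hypothetical, not the CAD release."""
    def __init__(self, block: nn.Module, channels: int, hidden: int = 8):
        super().__init__()
        self.block = block.requires_grad_(False)  # frozen pretrained block
        self.adapter = nn.Sequential(
            nn.Conv2d(channels, hidden, 1),                           # down-project
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden),   # depthwise conv
            nn.GELU(),
            nn.Conv2d(hidden, channels, 1),                           # up-project
        )

    def forward(self, x):
        return self.block(x) + self.adapter(x)    # parallel residual branch

# Hypothetical usage: wrap each frozen encoder block, e.g.
# blk = ParallelConvAdapter(blk, channels=256)
```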
- Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, that employs a Spatial-Spectral SSM for global-local balanced context encoding.
Experimental results show that our CS-Mamba achieves state-of-the-art performance, and that the masked training method better reconstructs smooth features to improve visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z)
- Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter [10.001964627074704]
Urban waterlogging poses a major risk to public safety and infrastructure.
Recent advances employ surveillance camera imagery and deep learning for detection, yet these methods struggle with scarce data and adverse environmental conditions.
We establish a challenging Urban Waterlogging Benchmark (UW-Bench) under diverse adverse conditions to advance real-world applications.
arXiv Detail & Related papers (2024-07-11T01:03:02Z)
- Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat [9.981107535103687]
We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and to require only relatively few training samples.
The RTCS network features a simplified architecture that reduces the required training samples and allows for easy implementation on integer-8-based encoders.
Our encoder employs an integer-8-compatible linear projection for stripe-like HSI data transmission, ensuring real-time compressed sensing.
arXiv Detail & Related papers (2024-04-24T10:03:37Z)
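A hedged sketch of what an integer-8-compatible linear projection over one image stripe could look like; the matrix shapes, quantization scheme, and function name are assumptions rather than the RTCS implementation.

```python
import numpy as np

def int8_stripe_projection(stripe: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Sketch: compressed sensing of one HSI stripe with an int8-quantized
    sensing matrix. `phi` is a learned (m x n) matrix with m << n."""
    # Quantize the sensing matrix to int8 with a single scale factor.
    scale = np.abs(phi).max() / 127.0
    phi_q = np.clip(np.round(phi / scale), -127, 127).astype(np.int8)
    # Integer matmul (accumulated in int32), then rescale the measurements.
    y = phi_q.astype(np.int32) @ stripe.astype(np.int32)
    return y.astype(np.float32) * scale

# Example: 64 measurements from a 256-pixel stripe (4x compression).
rng = np.random.default_rng(0)
y = int8_stripe_projection(rng.integers(0, 255, 256), rng.standard_normal((64, 256)))
```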
- Revisiting the Parameter Efficiency of Adapters from the Perspective of Precision Redundancy [17.203320079872952]
Current state-of-the-art results in computer vision depend in part on fine-tuning large pre-trained vision models.
With the exponential growth of model sizes, conventional full fine-tuning leads to increasingly large storage and transmission overheads.
In this paper, we investigate how to make adapters even more efficient, reaching a new minimum size required to store a task-specific fine-tuned network.
arXiv Detail & Related papers (2023-07-31T17:22:17Z)
- Spatially-Adaptive Feature Modulation for Efficient Image Super-Resolution [90.16462805389943]
We develop a spatially-adaptive feature modulation (SAFM) mechanism upon a vision transformer (ViT)-like block.
The proposed method is $3\times$ smaller than state-of-the-art efficient SR methods.
arXiv Detail & Related papers (2023-02-27T14:19:31Z)
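A hypothetical reading of spatially-adaptive feature modulation in code: channels are split into multi-scale groups, processed at different resolutions, and the fused result gates the input. The split and sigmoid gating are assumptions, not the released SAFM block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyAdaptiveModulation(nn.Module):
    """Sketch of the SAFM idea; channels must be divisible by `levels`
    and inputs large enough to pool by 2**(levels-1)."""
    def __init__(self, channels: int, levels: int = 4):
        super().__init__()
        self.levels = levels
        c = channels // levels
        self.dwconvs = nn.ModuleList(
            nn.Conv2d(c, c, 3, padding=1, groups=c) for _ in range(levels)
        )
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        outs = []
        for i, (chunk, conv) in enumerate(zip(x.chunk(self.levels, dim=1), self.dwconvs)):
            if i > 0:  # progressively coarser scales
                chunk = F.adaptive_max_pool2d(chunk, (h // 2**i, w // 2**i))
            y = conv(chunk)
            if i > 0:
                y = F.interpolate(y, size=(h, w), mode="nearest")
            outs.append(y)
        # Fused multi-scale features spatially modulate the input.
        return x * torch.sigmoid(self.fuse(torch.cat(outs, dim=1)))
```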
- A Simple Adaptive Unfolding Network for Hyperspectral Image Reconstruction [33.53825801739728]
We present a simple, efficient, and scalable unfolding network, SAUNet, to simplify the network design.
SAUNet can be scaled to a non-trivial 13 stages with continuous performance improvement.
We set new records on CAVE and KAIST HSI reconstruction benchmarks.
arXiv Detail & Related papers (2023-01-24T18:28:21Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models for downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism to improve adapter capacity without increasing parameters or computational cost, using two key techniques: stochastic routing among multiple adapter modules during training, and merging them by weight averaging for inference.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
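A simplified sketch of the mixture-of-adapters mechanism, assuming stochastic routing among experts during training and weight-averaged merging for serving; the class below is an illustrative reduction, not the official AdaMix code.

```python
import copy
import random
import torch
import torch.nn as nn

class MixtureOfAdapters(nn.Module):
    """Sketch: several bottleneck adapters share one slot; training routes
    each step through a random expert, and experts are weight-averaged
    into a single adapter so serving cost stays flat."""
    def __init__(self, dim: int, bottleneck: int = 16, n_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stochastic routing during training; call merge_experts() for serving.
        expert = random.choice(self.experts) if self.training else self.experts[0]
        return x + expert(x)   # residual bottleneck adapter

    def merge_experts(self) -> nn.Module:
        """Average expert weights into one adapter (run once before serving)."""
        merged = copy.deepcopy(self.experts[0])
        with torch.no_grad():
            for p_m, *p_es in zip(merged.parameters(),
                                  *(e.parameters() for e in self.experts)):
                p_m.copy_(torch.stack(list(p_es)).mean(0))
        return merged
```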
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network that divides the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency parts are processed with expensive operations while the low-frequency parts are assigned cheap operations to relieve the computational burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
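A toy sketch of this frequency-aware routing. For brevity it substitutes an FFT low-pass split for the paper's DCT-domain decomposition, and both branches are placeholder modules.

```python
import torch
import torch.nn as nn

class FrequencySplitSR(nn.Module):
    """Sketch: route low-frequency content through a cheap branch and
    high-frequency content through an expensive one. FFT stands in for
    the paper's DCT; all modules are illustrative stand-ins."""
    def __init__(self, channels: int, cutoff: float = 0.25):
        super().__init__()
        self.cutoff = cutoff
        self.cheap = nn.Conv2d(channels, channels, 1)       # low-freq branch
        self.expensive = nn.Sequential(                     # high-freq branch
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        f = torch.fft.fft2(x, norm="ortho")
        h, w = x.shape[-2:]
        fy = torch.fft.fftfreq(h, device=x.device).abs().view(-1, 1)
        fx = torch.fft.fftfreq(w, device=x.device).abs().view(1, -1)
        low_mask = ((fy <= self.cutoff) & (fx <= self.cutoff)).to(x.dtype)
        low = torch.fft.ifft2(f * low_mask, norm="ortho").real
        high = x - low                                      # residual high frequencies
        return self.cheap(low) + self.expensive(high)
```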
- AdapterDrop: On the Efficiency of Adapters in Transformers [53.845909603631945]
Massively pre-trained transformer models are computationally expensive to fine-tune, slow for inference, and have large storage requirements.
Recent approaches tackle these shortcomings by training smaller models, by dynamically reducing the model size, and by training lightweight adapters.
arXiv Detail & Related papers (2020-10-22T17:49:42Z)
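A minimal sketch of a droppable adapter wrapper in the AdapterDrop spirit; the wrapper class, the `drop` flag, and the bottleneck size are assumptions.

```python
import torch.nn as nn

class AdapterLayer(nn.Module):
    """Sketch: a frozen transformer layer with a droppable bottleneck
    adapter. AdapterDrop-style speedups come from skipping adapters in
    the first n layers at inference; this wrapper is illustrative only."""
    def __init__(self, layer: nn.Module, dim: int, bottleneck: int = 32):
        super().__init__()
        self.layer = layer.requires_grad_(False)   # frozen pretrained layer
        self.adapter = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.ReLU(), nn.Linear(bottleneck, dim)
        )
        self.drop = False                          # set True to skip the adapter

    def forward(self, x):
        x = self.layer(x)
        return x if self.drop else x + self.adapter(x)

# Hypothetical usage: drop adapters from the first 5 layers for speed, e.g.
# for i, layer in enumerate(model_layers): layer.drop = i < 5
```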
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.