LUM-ViT: Learnable Under-sampling Mask Vision Transformer for Bandwidth Limited Optical Signal Acquisition
- URL: http://arxiv.org/abs/2403.01412v1
- Date: Sun, 3 Mar 2024 06:49:01 GMT
- Title: LUM-ViT: Learnable Under-sampling Mask Vision Transformer for Bandwidth Limited Optical Signal Acquisition
- Authors: Lingfeng Liu, Dong Ni, Hangjie Yuan
- Abstract summary: We introduce a novel approach leveraging pre-acquisition modulation to reduce the acquisition volume.
Uniquely, LUM-ViT incorporates a learnable under-sampling mask tailored for pre-acquisition modulation.
Our evaluations reveal that, by sampling a mere 10% of the original image pixels, LUM-ViT keeps the accuracy loss within 1.8% on the ImageNet classification task.
- Score: 14.773452863027037
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bandwidth constraints during signal acquisition frequently impede real-time
detection applications. Hyperspectral data is a notable example: its vast
volume makes real-time hyperspectral detection impractical. To tackle this hurdle, we
introduce a novel approach leveraging pre-acquisition modulation to reduce the
acquisition volume. This modulation process is governed by a deep learning
model, utilizing prior information. Central to our approach is LUM-ViT, a
Vision Transformer variant. Uniquely, LUM-ViT incorporates a learnable
under-sampling mask tailored for pre-acquisition modulation. To further
optimize for optical calculations, we propose a kernel-level weight
binarization technique and a three-stage fine-tuning strategy. Our evaluations
reveal that, by sampling a mere 10% of the original image pixels, LUM-ViT
keeps the accuracy loss within 1.8% on the ImageNet classification task.
The method sustains near-original accuracy when implemented on real-world
optical hardware, demonstrating its practicality. Code will be available at
https://github.com/MaxLLF/LUM-ViT.
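Since the code had not been released at the time of this listing, the following is only a minimal PyTorch sketch of two ideas the abstract names: a learnable binary under-sampling mask trained with a straight-through estimator (STE), and XNOR-Net-style per-kernel weight binarization. All identifiers here (BinaryMaskSTE, LearnableUndersamplingMask, keep_ratio, binarize_weights) are hypothetical and do not come from the LUM-ViT repository; the paper's actual kernel-level scheme and three-stage fine-tuning may differ.

```python
import torch
import torch.nn as nn

class BinaryMaskSTE(torch.autograd.Function):
    """Binarize scores in the forward pass; gradients pass straight through."""
    @staticmethod
    def forward(ctx, scores):
        return (scores > 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # straight-through estimator

class LearnableUndersamplingMask(nn.Module):
    """Learned binary mask keeping roughly `keep_ratio` of the measurements."""
    def __init__(self, num_measurements, keep_ratio=0.1):
        super().__init__()
        self.scores = nn.Parameter(0.01 * torch.randn(num_measurements))
        self.keep_ratio = keep_ratio

    def forward(self, x):  # x: (batch, num_measurements)
        k = max(1, int(self.keep_ratio * self.scores.numel()))
        # threshold at the k-th largest score so ~keep_ratio entries survive
        thresh = torch.topk(self.scores, k).values.min()
        mask = BinaryMaskSTE.apply(self.scores - thresh + 1e-12)
        return x * mask  # masked-out entries need never be acquired

def binarize_weights(w):
    """XNOR-Net-style per-kernel binarization: sign weights scaled by the
    kernel's mean absolute value; the detach trick gives STE gradients."""
    alpha = w.abs().mean(dim=list(range(1, w.dim())), keepdim=True)
    wb = alpha * torch.sign(w)
    return w + (wb - w).detach()
```

In the optical setting the abstract describes, such a binary pattern would presumably be loaded onto the modulation hardware so that masked-out measurements are never acquired, which is what reduces the acquisition volume.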
Related papers
- Misalignment-Robust Frequency Distribution Loss for Image Transformation [51.0462138717502]
This paper addresses a common challenge in deep learning-based image transformation methods such as image enhancement and super-resolution: sensitivity to spatial misalignment between output and ground-truth images.
We introduce a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain.
Our method is empirically effective as a training constraint, owing to its use of global information in the frequency domain.
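The summary only names the loss; as a rough illustration of what a frequency-domain distribution distance can look like (assuming a sorted-magnitude, 1-D Wasserstein-style comparison, which may differ from the paper's exact FDL), consider:

```python
import torch

def frequency_distribution_loss(pred, target):
    """Illustrative frequency-domain distribution distance (not the paper's
    exact FDL). Comparing *distributions* of FFT magnitudes rather than
    aligning them pixel-by-pixel makes the loss tolerant to misalignment."""
    fp = torch.fft.fft2(pred, norm="ortho")    # (B, C, H, W) complex spectra
    ft = torch.fft.fft2(target, norm="ortho")
    # Sorting the flattened magnitudes and taking the mean L1 gap gives a
    # 1-D Wasserstein-style distance between the two magnitude distributions.
    mp = fp.abs().flatten(1).sort(dim=1).values
    mt = ft.abs().flatten(1).sort(dim=1).values
    return (mp - mt).abs().mean()
```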
arXiv Detail & Related papers (2024-02-28T09:27:41Z)
- LF-ViT: Reducing Spatial Redundancy in Vision Transformer for Efficient Image Recognition [9.727093171296678]
Vision Transformer (ViT) excels in accuracy when handling high-resolution images.
However, it suffers from significant spatial redundancy, which increases computational and memory requirements.
We present the Localization and Focus Vision Transformer (LF-ViT), which reduces computational demands without compromising performance.
arXiv Detail & Related papers (2024-01-08T01:32:49Z)
- Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks [53.67497327319569]
We introduce a novel neural rendering technique to solve image-to-3D from a single view.
Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks.
Our experiments show the advantages of the proposed approach, yielding consistent results and rapid generation.
arXiv Detail & Related papers (2023-12-24T08:42:37Z)
- PRISTA-Net: Deep Iterative Shrinkage Thresholding Network for Coded Diffraction Patterns Phase Retrieval [6.982256124089]
Phase retrieval is a challenging nonlinear inverse problem in computational imaging and image processing.
We have developed PRISTA-Net, a deep unfolding network based on the first-order iterative shrinkage-thresholding algorithm (ISTA).
All parameters in the proposed PRISTA-Net framework, including the nonlinear transformation, threshold, and step size, are learned end-to-end instead of being hand-set.
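For context, the classical (non-learned) ISTA iteration that such networks unfold looks like the sketch below; in PRISTA-Net the transformation, threshold, and step size are learned network parameters instead of the fixed quantities used here. This is a generic textbook sketch, not the paper's code.

```python
import torch

def ista(A, b, num_iters=100, step=None, lam=0.1):
    """Plain ISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1."""
    if step is None:
        # 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient
        step = 1.0 / (torch.linalg.matrix_norm(A, ord=2) ** 2)
    x = torch.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)   # gradient of the data-fidelity term
        z = x - step * grad        # gradient descent step
        # soft-thresholding (proximal operator of the L1 penalty)
        x = torch.sign(z) * torch.clamp(z.abs() - step * lam, min=0.0)
    return x
```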
arXiv Detail & Related papers (2023-09-08T07:37:15Z)
- Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z)
- Read Pointer Meters in complex environments based on a Human-like Alignment and Recognition Algorithm [16.823681016882315]
We propose a human-like alignment and recognition algorithm to overcome the problems posed by complex environments.
A Spatial Transformed Module (STM) is proposed to obtain the front view of the meter images autonomously.
A Value Acquisition Module (VAM) is proposed to infer accurate meter values through an end-to-end trained framework.
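The STM appears to play a role similar to a spatial transformer network (Jaderberg et al.); a generic differentiable warp of that kind might look like the sketch below, where theta would come from a small localization network. This is illustrative only, not the paper's STM.

```python
import torch
import torch.nn.functional as F

def warp_to_front_view(image, theta):
    """Differentiable affine warp in the spirit of a spatial transformer.
    image: (B, C, H, W); theta: (B, 2, 3) affine parameters predicted
    elsewhere. Generic machinery, not the paper's exact module."""
    grid = F.affine_grid(theta, image.shape, align_corners=False)
    return F.grid_sample(image, grid, align_corners=False)
```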
arXiv Detail & Related papers (2023-02-28T05:37:04Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution-guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves much better performance than prior art.
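As background for what full quantization involves, a generic uniform fake-quantization helper with straight-through gradients is sketched below; Q-ViT's IRM and distribution-guided distillation go beyond this standard machinery and are not shown.

```python
import torch

def fake_quantize(x, bits=4):
    """Generic symmetric uniform fake quantization with STE gradients.
    Standard quantization-aware-training machinery, not Q-ViT's method."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp(min=1e-8) / qmax
    xq = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    return x + (xq - x).detach()  # forward: quantized; backward: identity
```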
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
- Mask-guided Vision Transformer (MG-ViT) for Few-Shot Learning [10.29251906347605]
We propose a novel mask-guided vision transformer (MG-ViT) to achieve effective and efficient few-shot learning with the vision transformer (ViT) model.
The MG-ViT model significantly improves performance compared with general fine-tuning-based ViT models.
arXiv Detail & Related papers (2022-05-20T07:25:33Z)
- Multitask AET with Orthogonal Tangent Regularity for Dark Object Detection [84.52197307286681]
We propose a novel multitask auto-encoding transformation (MAET) model to enhance object detection in dark environments.
In a self-supervised manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
The approach achieves state-of-the-art performance on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z)
- AdaViT: Adaptive Tokens for Efficient Vision Transformer [91.88404546243113]
We introduce AdaViT, a method that adaptively adjusts the inference cost of the vision transformer (ViT) for images of different complexity.
AdaViT achieves this by automatically reducing the number of tokens processed in the network as inference proceeds.
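A fixed-ratio sketch of the token-reduction idea is shown below; AdaViT's actual mechanism adapts the token count per input as inference proceeds, which this simplified helper (hypothetical names) does not capture.

```python
import torch

def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens; the class token is always kept.
    tokens: (B, N, D); scores: (B, N) importance estimates, e.g. from a
    small linear head. Fixed-ratio illustration only, not AdaViT itself."""
    B, N, D = tokens.shape
    k = max(1, int(keep_ratio * (N - 1)))
    idx = scores[:, 1:].topk(k, dim=1).indices + 1   # skip class token at 0
    cls = tokens[:, :1]
    kept = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(B, k, D))
    return torch.cat([cls, kept], dim=1)             # (B, 1 + k, D)
```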
arXiv Detail & Related papers (2021-12-14T18:56:07Z)
- Deep Learning Adapted Acceleration for Limited-view Photoacoustic Computed Tomography [1.8830359888767887]
Photoacoustic computed tomography (PACT) illuminates the target with unfocused large-area light and detects the photoacoustic (PA) signals with an ultrasound transducer array.
The limited-view problem can degrade image quality in PACT due to geometric constraints.
A model-based method combining a mathematical variational model with deep learning is proposed to speed up and regularize the unrolled reconstruction procedure.
arXiv Detail & Related papers (2021-11-08T02:05:58Z)