Related papers: Vision without Images: End-to-End Computer Vision from Single Compressive Measurements

URL: http://arxiv.org/abs/2501.15122v2
Date: Tue, 05 Aug 2025 18:56:39 GMT
Title: Vision without Images: End-to-End Computer Vision from Single Compressive Measurements
Authors: Fengpu Pan, Heting Gao, Jiangtao Wen, Yuxing Han
Abstract summary: Snapshot Compressed Imaging (SCI) offers high-speed, low-bandwidth, and energy-efficient image acquisition. Practical hardware constraints in high-resolution sensors limit the use of large frame-sized masks. We present a novel SCI-based computer vision framework using pseudo-random binary masks of only 8×8 in size for physically feasible implementations.
Score: 13.328018344037808
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Snapshot Compressed Imaging (SCI) offers high-speed, low-bandwidth, and energy-efficient image acquisition, but remains challenged by low-light and low signal-to-noise ratio (SNR) conditions. Moreover, practical hardware constraints in high-resolution sensors limit the use of large frame-sized masks, necessitating smaller, hardware-friendly designs. In this work, we present a novel SCI-based computer vision framework using pseudo-random binary masks of only 8×8 in size for physically feasible implementations. At its core is CompDAE, a Compressive Denoising Autoencoder built on the STFormer architecture, designed to perform downstream tasks, such as edge detection and depth estimation, directly from noisy compressive raw pixel measurements without image reconstruction. CompDAE incorporates a rate-constrained training strategy inspired by BackSlash to promote compact, compressible models. A shared encoder paired with lightweight task-specific decoders enables a unified multi-task platform. Extensive experiments across multiple datasets demonstrate that CompDAE achieves state-of-the-art performance with significantly lower complexity, especially under ultra-low-light conditions where traditional CMOS and SCI pipelines fail.
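To make the measurement side concrete, the sketch below illustrates the generic SCI forward model the abstract builds on: several frames are each modulated by a binary mask and summed into one snapshot, Y = Σ_t M_t ⊙ X_t + noise. It is a minimal, hypothetical illustration, not the paper's code. The assumptions are: each full-sensor mask is built by periodically tiling an 8×8 pseudo-random binary pattern, every frame gets its own pattern, and the noise is additive Gaussian. The function names (`make_tile`, `tile_mask`, `sci_measure`) and parameters are invented for this example. A model like CompDAE would take `y` directly as input instead of first reconstructing the frames.

```python
import random


def make_tile(size, seed):
    """Hypothetical helper: a size x size pseudo-random binary (0/1) pattern."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]


def tile_mask(tile, height, width):
    """Repeat a small tile periodically so it covers a height x width sensor."""
    n = len(tile)
    return [[tile[i % n][j % n] for j in range(width)] for i in range(height)]


def sci_measure(frames, tile_size=8, noise_std=0.0, seed=0):
    """Collapse B frames into one coded snapshot: Y = sum_t M_t * X_t + noise.

    Assumptions (not from the paper): one distinct tiled 8x8 mask per frame,
    additive Gaussian read noise with standard deviation noise_std.
    """
    height, width = len(frames[0]), len(frames[0][0])
    noise_rng = random.Random(seed)
    # A different pseudo-random tile for each frame, tiled over the sensor.
    masks = [tile_mask(make_tile(tile_size, seed + t + 1), height, width)
             for t in range(len(frames))]
    y = [[0.0] * width for _ in range(height)]
    # Element-wise modulation of each frame, accumulated into one measurement.
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                y[i][j] += mask[i][j] * frame[i][j]
    # Optional sensor noise on the single snapshot.
    if noise_std > 0:
        for i in range(height):
            for j in range(width):
                y[i][j] += noise_rng.gauss(0.0, noise_std)
    return y, masks


# Usage: four constant 16x16 frames compressed into one 16x16 snapshot.
frames = [[[1.0] * 16 for _ in range(16)] for _ in range(4)]
y, masks = sci_measure(frames)
```

Because the 8×8 pattern repeats across the sensor, the whole mask is described by only 64 bits per frame. This periodicity is what makes small masks attractive for hardware, and with noise disabled it shows up directly in the example output, where `y` repeats every 8 pixels.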

Related papers

LoC-LIC: Low Complexity Learned Image Coding Using Hierarchical Feature Transforms [16.428925911432344]
We propose an innovative approach that employs hierarchical feature extraction transforms to significantly reduce complexity. Our novel architecture achieves this by using fewer channels for high-spatial-resolution inputs and feature maps. As a result, the reduced-complexity model can open the way for learned image compression models to operate efficiently across various devices.
arXiv Detail & Related papers (2025-04-30T16:30:06Z)
FD-LSCIC: Frequency Decomposition-based Learned Screen Content Image Compression [67.34466255300339]
This paper addresses three key challenges in screen content (SC) image compression: learning compact latent features, adapting quantization step sizes, and the lack of large SC datasets. We introduce an adaptive quantization module that learns scaled uniform noise for each frequency component, enabling flexible control over quantization granularity. We construct a large SC image compression dataset (SDU-SCICD10K), which includes over 10,000 images spanning basic SC images, computer-rendered images, and mixed natural-scene (NS) and SC images from both PC and mobile platforms.
arXiv Detail & Related papers (2025-02-21T03:15:16Z)
Enhanced Confocal Laser Scanning Microscopy with Adaptive Physics Informed Deep Autoencoders [0.0]
We present a physics-informed deep learning framework to address limitations in Confocal Laser Scanning Microscopy. The model reconstructs high-fidelity images from heavily noisy inputs using convolutional and transposed convolutional layers.
arXiv Detail & Related papers (2025-01-24T18:32:34Z)
Rethinking High-speed Image Reconstruction Framework with Spike Camera [48.627095354244204]
Spike cameras generate continuous spike streams to capture high-speed scenes with lower bandwidth and higher dynamic range than traditional RGB cameras. We introduce a novel spike-to-image reconstruction framework, SpikeCLIP, that goes beyond traditional training paradigms. Our experiments on real-world low-light datasets demonstrate that SpikeCLIP significantly enhances the texture details and luminance balance of recovered images.
arXiv Detail & Related papers (2025-01-08T13:00:17Z)
Ultra-Low Complexity On-Orbit Compression for Remote Sensing Imagery via Block Modulated Imaging [17.334800411037836]
This paper advances the study of compressed sensing in remote sensing image compression. By requiring only a single exposure, Block Modulated Imaging (BMI) significantly increases image acquisition speed. We propose a novel decoding network specifically designed to reconstruct images compressed under the BMI framework.
arXiv Detail & Related papers (2024-12-24T13:18:00Z)
A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging [15.351152482692383]
Video Snapshot Compressive Imaging (SCI) aims to use a low-speed 2D camera to capture a high-speed scene as snapshot compressed measurements. Deep learning-based algorithms have achieved impressive performance, but at a heavy computational cost. We propose a low-bit quantization framework (dubbed Q-SCI) for end-to-end deep learning-based video SCI reconstruction methods.
arXiv Detail & Related papers (2024-07-31T10:38:11Z)
Real-Time Compressed Sensing for Joint Hyperspectral Image Transmission and Restoration for CubeSat [9.981107535103687]
We propose a Real-Time Compressed Sensing (RTCS) network designed to be lightweight and to require relatively few training samples. The RTCS network features a simplified architecture that reduces the required training samples and allows for easy implementation on integer-8-based encoders. Our encoder employs an integer-8-compatible linear projection for stripe-like hyperspectral image (HSI) data transmission, ensuring real-time compressed sensing.
arXiv Detail & Related papers (2024-04-24T10:03:37Z)
MISC: Ultra-low Bitrate Image Semantic Compression Driven by Large Multimodal Model [78.4051835615796]
This paper proposes a method called Multimodal Image Semantic Compression (MISC). It consists of a large multimodal model (LMM) encoder that extracts the semantic information of the image, a map encoder that locates the region corresponding to each semantic element, an image encoder that generates an extremely compressed bitstream, and a decoder that reconstructs the image from the above information. It can achieve optimal consistency and perception results while saving perceptual 50%, which has strong potential applications in the next generation of storage and communication.
arXiv Detail & Related papers (2024-02-26T17:11:11Z)
Hybrid Training of Denoising Networks to Improve the Texture Acutance of Digital Cameras [3.400056739248712]
We propose a mixed training procedure for image restoration neural networks, relying on both natural and synthetic images, that yields a strong improvement in the texture acutance metric without impairing fidelity terms. The feasibility of the approach is demonstrated both on the denoising of RGB images and on the full development of RAW images, opening the path to a systematic improvement of the texture acutance of real imaging devices.
arXiv Detail & Related papers (2024-02-20T10:47:06Z)
Transferable Learned Image Compression-Resistant Adversarial Perturbations [66.46470251521947]
Adversarial attacks can readily disrupt image classification systems, revealing the vulnerability of DNN-based recognition tasks. We introduce a new pipeline that targets image classification models that use learned image compressors as pre-processing modules.
arXiv Detail & Related papers (2024-01-06T03:03:28Z)
LWGNet: Learned Wirtinger Gradients for Fourier Ptychographic Phase Retrieval [14.588976801396576]
We propose a hybrid model-driven residual network that combines knowledge of the forward imaging system with a deep data-driven network. Unlike conventional unrolling techniques, LWGNet uses fewer stages while performing on par with or better than existing traditional and deep learning techniques. This improved performance with low-bit-depth, low-cost sensors has the potential to significantly reduce the cost of Fourier ptychographic microscopy (FPM) imaging setups.
arXiv Detail & Related papers (2022-08-08T17:22:54Z)
Lightweight HDR Camera ISP for Robust Perception in Dynamic Illumination Conditions via Fourier Adversarial Networks [35.532434169432776]
We propose a lightweight two-stage image enhancement algorithm sequentially balancing illumination and noise removal. We also propose a Fourier spectrum-based adversarial framework (AFNet) for consistent image enhancement under varying illumination conditions. Based on quantitative and qualitative evaluations, we also examine the practicality and effects of image enhancement techniques on the performance of common perception tasks.
arXiv Detail & Related papers (2022-04-04T18:48:51Z)
Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks. We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation. We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z)
Neural JPEG: End-to-End Image Compression Leveraging a Standard JPEG Encoder-Decoder [73.48927855855219]
We propose a system that learns to improve the encoding performance by enhancing its internal neural representations on both the encoder and decoder ends. Experiments demonstrate that our approach successfully improves the rate-distortion performance over JPEG across various quality metrics.
arXiv Detail & Related papers (2022-01-27T20:20:03Z)
Burst Imaging for Light-Constrained Structure-From-Motion [4.125187280299246]
We develop an image processing technique for aiding 3D reconstruction from images acquired in low-light conditions. Our technique, based on burst photography, uses direct methods for image registration within bursts of short-exposure images. Our method is a significant step towards allowing robots to operate in low-light conditions, with potential applications to robots operating in environments such as underground mines and to nighttime operation.
arXiv Detail & Related papers (2021-08-23T02:12:40Z)
10-mega pixel snapshot compressive imaging with a hybrid coded aperture [48.95666098332693]
High-resolution images are widely used in daily life, whereas high-speed video capture is challenging due to the low frame rate of cameras working in high-resolution mode. Snapshot compressive imaging (SCI) was proposed as a solution to the low throughput of existing imaging systems.
arXiv Detail & Related papers (2021-06-30T01:09:24Z)
Time-Multiplexed Coded Aperture Imaging: Learned Coded Aperture and Pixel Exposures for Compressive Imaging Systems [56.154190098338965]
We show that our proposed time-multiplexed coded aperture (TMCA) can be optimized end-to-end. TMCA induces better coded snapshots, enabling superior reconstructions in two different applications: compressive light field imaging and hyperspectral imaging. This codification outperforms state-of-the-art compressive imaging systems by more than 4 dB in those applications.
arXiv Detail & Related papers (2021-04-06T22:42:34Z)
Modeling Lost Information in Lossy Image Compression [72.69327382643549]
Lossy image compression is one of the most commonly used operators for digital images. We propose a novel invertible framework called Invertible Lossy Compression (ILC) to largely mitigate the information loss problem.
arXiv Detail & Related papers (2020-06-22T04:04:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.