Latent Space Imaging
- URL: http://arxiv.org/abs/2407.07052v1
- Date: Tue, 9 Jul 2024 17:17:03 GMT
- Title: Latent Space Imaging
- Authors: Matheus Souza, Yidan Zheng, Kaizhang Kang, Yogeshwar Nath Mishra, Qiang Fu, Wolfgang Heidrich
- Abstract summary: Inspired by the data reduction performed by the human visual system, we propose to follow a similar approach for the development of artificial vision systems.
Latent Space Imaging is a new paradigm that, through a combination of optics and software, directly encodes the image information into the semantically rich latent space of a generative model.
We demonstrate this new principle through an initial hardware prototype based on the single pixel camera.
- Score: 15.435034286180295
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Digital imaging systems have classically been based on brute-force measuring and processing of pixels organized on regular grids. The human visual system, on the other hand, performs a massive data reduction from the number of photo-receptors to the optic nerve, essentially encoding the image information into a low bandwidth latent space representation suitable for processing by the human brain. In this work, we propose to follow a similar approach for the development of artificial vision systems. Latent Space Imaging is a new paradigm that, through a combination of optics and software, directly encodes the image information into the semantically rich latent space of a generative model, thus substantially reducing bandwidth and memory requirements during the capture process. We demonstrate this new principle through an initial hardware prototype based on the single pixel camera. By designing an amplitude modulation scheme that encodes into the latent space of a generative model, we achieve compression ratios from 1:100 to 1:1,000 during the imaging process, illustrating the potential of latent space imaging for highly efficient imaging hardware, to enable future applications in high speed imaging, or task-specific cameras with substantially reduced hardware complexity.
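A minimal sketch of the measurement path described above, under loose assumptions: random binary masks stand in for the paper's learned amplitude modulation, and a single random linear map stands in for the learned mapping into a generative model's latent space; the shapes and compression ratio are purely illustrative.

```python
# Conceptual sketch of latent space imaging with a single-pixel camera
# (hypothetical shapes; random stand-ins for the learned optical masks and
# the learned measurement-to-latent mapping).
import numpy as np

H, W = 256, 256          # scene resolution
n_pixels = H * W
n_measurements = 512     # roughly 1:128 compression of the pixel count
latent_dim = 512         # size of the generative model's latent space

rng = np.random.default_rng(0)

# Amplitude modulation patterns of the single-pixel camera (one row per
# measurement); in the paper these are optimized, here they are random binary.
masks = rng.integers(0, 2, size=(n_measurements, n_pixels)).astype(np.float32)

# A scene and its single-pixel measurements: each measurement is the total
# light collected after the scene is modulated by one mask.
scene = rng.random(n_pixels).astype(np.float32)
measurements = masks @ scene                      # shape: (n_measurements,)

# Stand-in for the learned mapping from measurements to the latent code of a
# generative model (a single linear layer here, purely illustrative).
to_latent = rng.standard_normal((latent_dim, n_measurements)).astype(np.float32)
z = to_latent @ measurements                      # a latent code, not pixels

print(measurements.shape, z.shape)                # (512,) (512,)
```

The point of the sketch is the data path: the camera never records a pixel grid; it records a handful of modulated measurements that are mapped straight into latent space, and any pixel image would only be produced later by the generative model's decoder.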
Related papers
- Transferable polychromatic optical encoder for neural networks [13.311727599288524]
In this paper, we demonstrate an optical encoder that can perform convolution simultaneously in three color channels during image capture.
Such optical encoding results in a 24,000-fold reduction in computational operations, with state-of-the-art classification accuracy (73.2%) in a free-space optical system.
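A toy simulation of the idea, with random per-channel kernels standing in for the optimized optical elements; the FFT-based circular convolution approximates what the free-space optics would compute at capture time.

```python
# Toy simulation of an optical encoder that convolves each color channel with
# its own point-spread function before capture (kernels here are random
# stand-ins for the optimized optical elements in the paper).
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((3, 64, 64))            # R, G, B channels of a scene
psfs = rng.random((3, 64, 64))             # one hypothetical PSF per channel

def optical_convolve(channel, psf):
    # Circular convolution via FFT, a common model for a 4f optical system;
    # real hardware performs this in free space, at the speed of light.
    return np.real(np.fft.ifft2(np.fft.fft2(channel) * np.fft.fft2(psf)))

encoded = np.stack([optical_convolve(c, k) for c, k in zip(image, psfs)])
print(encoded.shape)                        # (3, 64, 64): what the sensor records
```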
arXiv Detail & Related papers (2024-11-05T00:49:47Z)
- SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images [50.742420049839474]
SaccadeDet is an innovative architecture for gigapixel-level object detection, inspired by the saccadic movement of the human eye.
Our approach, evaluated on the PANDA dataset, achieves an 8x speed increase over the state-of-the-art methods.
It also demonstrates significant potential in gigapixel-level pathology analysis through its application to Whole Slide Imaging.
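A schematic of the dual-stage idea, with trivial stand-ins (mean intensity as the coarse score, a print statement in place of the detector); SaccadeDet's actual stages are learned networks.

```python
# Schematic "saccade then gaze" pipeline: cheap coarse scoring on a
# downsampled view, then expensive detection only on the top-scoring tiles.
import numpy as np

rng = np.random.default_rng(2)
gigapixel = rng.random((4096, 4096))          # stand-in for a gigapixel image
tile = 512

# Stage 1 (saccade): score coarse tiles cheaply on a downsampled view.
scores = {}
for y in range(0, gigapixel.shape[0], tile):
    for x in range(0, gigapixel.shape[1], tile):
        coarse = gigapixel[y:y+tile:8, x:x+tile:8]   # 8x downsampled tile
        scores[(y, x)] = coarse.mean()               # hypothetical "objectness"

# Stage 2 (gaze): run the expensive detector only on the top-k tiles.
top_k = sorted(scores, key=scores.get, reverse=True)[:4]
for (y, x) in top_k:
    crop = gigapixel[y:y+tile, x:x+tile]
    # detect(crop) would run the full-resolution detector here.
    print("inspect tile at", (y, x), crop.shape)
```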
arXiv Detail & Related papers (2024-07-25T11:22:54Z)
- Streaming quanta sensors for online, high-performance imaging and vision [34.098174669870126]
Quanta image sensors (QIS) have demonstrated remarkable imaging capabilities in many challenging scenarios.
Despite their potential, the adoption of these sensors is severely hampered by (a) high data rates and (b) the need for new computational pipelines to handle the unconventional raw data.
We introduce a simple, low-bandwidth computational pipeline to address these challenges.
Our approach yields data bandwidth reductions of roughly 100X and enables real-time image reconstruction and computer vision.
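A minimal sketch of the low-bandwidth idea, assuming a simple recursive (exponentially weighted) average over streamed 1-bit frames; the paper's pipeline is more elaborate, but this illustrates the streaming, constant-memory flavor of such processing.

```python
# Streaming reduction for binary quanta frames: instead of storing every 1-bit
# frame, keep a recursively updated intensity estimate with O(1) memory.
import numpy as np

rng = np.random.default_rng(3)
H, W = 128, 128
true_intensity = rng.random((H, W)) * 0.2     # photon detection probability
estimate = np.zeros((H, W))
alpha = 0.01                                   # temporal smoothing factor

for t in range(2000):                          # 2000 binary frames streamed in
    binary_frame = (rng.random((H, W)) < true_intensity).astype(np.float32)
    estimate = (1 - alpha) * estimate + alpha * binary_frame   # O(1) memory

print(float(np.abs(estimate - true_intensity).mean()))  # small residual error
```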
arXiv Detail & Related papers (2024-06-02T20:30:49Z)
- Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects serve as a visual medium for embedding information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in a real-time manner.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
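A toy illustration of the event-to-binary-video idea: each event nudges a per-pixel intensity estimate, which is re-thresholded into a binary frame; the update rule and threshold here are hypothetical stand-ins for the paper's inference.

```python
# Propagating a binary image with events: integrate event polarities into a
# per-pixel estimate and re-binarize at a much higher rate than conventional
# exposures (stand-in logic only, with random synthetic events).
import numpy as np

rng = np.random.default_rng(4)
H, W = 64, 64
log_intensity = rng.random((H, W))
binary_frames = []

for t in range(10):                            # 10 "high-frame-rate" steps
    n_events = 500
    ys = rng.integers(0, H, n_events)
    xs = rng.integers(0, W, n_events)
    polarity = rng.choice([-1.0, 1.0], n_events)
    np.add.at(log_intensity, (ys, xs), 0.05 * polarity)   # integrate events
    binary_frames.append(log_intensity > 0.5)              # re-binarize

print(len(binary_frames), binary_frames[0].shape)
```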
arXiv Detail & Related papers (2024-02-20T01:43:51Z)
- Learned Focused Plenoptic Image Compression with Microimage Preprocessing and Global Attention [17.05466366805901]
Focused plenoptic cameras can record spatial and angular information of the light field (LF) simultaneously.
Existing plenoptic image compression methods are ineffective on these captured images because of the complex micro-textures generated by microlens relay imaging and the long-distance correlations among the microimages.
A lossy end-to-end learning architecture is proposed to compress the focused plenoptic images efficiently.
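A minimal sketch of a generic end-to-end lossy codec in PyTorch (uniform-noise quantization proxy, MSE distortion only); the paper's microimage preprocessing, global attention, and entropy model are omitted.

```python
# Generic learned lossy compression: encode, (approximately) quantize, decode.
import torch
import torch.nn as nn

class TinyCodec(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(32, 8, 5, stride=4, padding=2))
        self.decode = nn.Sequential(
            nn.ConvTranspose2d(8, 32, 4, stride=4), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=4))

    def forward(self, x):
        y = self.encode(x)
        # Additive uniform noise approximates rounding during training,
        # a standard trick in learned compression.
        y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)
        return self.decode(y_hat), y_hat

x = torch.rand(1, 3, 256, 256)             # stand-in for a plenoptic sub-image
model = TinyCodec()
x_hat, latents = model(x)
distortion = torch.mean((x - x_hat) ** 2)  # rate term omitted for brevity
print(x_hat.shape, latents.shape, float(distortion))
```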
arXiv Detail & Related papers (2023-04-30T14:24:56Z)
- Ultrafast single-channel machine vision based on neuro-inspired photonic computing [0.0]
Neuro-inspired photonic computing is a promising approach to speeding up machine vision processing with ultralow latency.
Here, we propose an image-sensor-free machine vision framework, which optically processes real-world visual information with only a single input channel.
We experimentally demonstrate that the proposed approach is capable of high-speed image recognition and anomaly detection, and furthermore, it can be used for high-speed imaging.
arXiv Detail & Related papers (2023-02-15T10:08:04Z)
- Deep Learning for Ultrasound Beamforming [120.12255978513912]
Beamforming, the process of mapping received ultrasound echoes to the spatial image domain, lies at the heart of the ultrasound image formation chain.
Modern ultrasound imaging leans heavily on innovations in powerful digital receive channel processing.
Deep learning methods can play a compelling role in the digital beamforming pipeline.
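For context, a bare-bones delay-and-sum beamformer on synthetic channel data, the classical baseline that learned beamformers aim to improve on; the geometry and delay model are deliberately simplified.

```python
# Delay-and-sum (DAS) beamforming: for each image point, delay the received
# channel signals by their travel time and sum them (synthetic data).
import numpy as np

c = 1540.0                    # speed of sound in tissue, m/s
fs = 40e6                     # sampling rate, Hz
n_elements, n_samples = 64, 2048
rng = np.random.default_rng(5)
rf = rng.standard_normal((n_elements, n_samples))          # received echoes
elem_x = (np.arange(n_elements) - n_elements / 2) * 0.3e-3 # element positions (m)

def das_pixel(x, z):
    # Distance from the point (x, z) to each element, converted to samples.
    dist = np.sqrt((elem_x - x) ** 2 + z ** 2)
    delays = ((z + dist) / c * fs).astype(int)              # transmit + receive
    delays = np.clip(delays, 0, n_samples - 1)
    return rf[np.arange(n_elements), delays].sum()

image = np.array([[das_pixel(x, z)
                   for x in np.linspace(-5e-3, 5e-3, 32)]
                  for z in np.linspace(5e-3, 30e-3, 32)])
print(image.shape)            # (32, 32) beamformed image
```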
arXiv Detail & Related papers (2021-09-23T15:15:21Z)
- 10-mega pixel snapshot compressive imaging with a hybrid coded aperture [48.95666098332693]
High-resolution images are widely used in daily life, whereas high-speed video capture is challenging because cameras working in high-resolution modes have low frame rates.
Snapshot compressive imaging (SCI) was proposed as a solution to the low throughput of existing imaging systems.
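The SCI forward model in a few lines, with toy sizes and random binary masks: several high-speed frames are modulated by a coded aperture and summed into one snapshot, which a solver must later invert.

```python
# Forward model of video snapshot compressive imaging (SCI).
import numpy as np

rng = np.random.default_rng(6)
T, H, W = 8, 128, 128                       # 8 video frames per snapshot
frames = rng.random((T, H, W))              # high-speed scene
masks = rng.integers(0, 2, size=(T, H, W))  # binary coded-aperture patterns

snapshot = (masks * frames).sum(axis=0)     # single captured 2D measurement
print(snapshot.shape)                       # (128, 128): 8x fewer readouts
# Reconstruction inverts this model, e.g. with an iterative or learned solver.
```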
arXiv Detail & Related papers (2021-06-30T01:09:24Z)
- Exploiting Raw Images for Real-Scene Super-Resolution [105.18021110372133]
We study the problem of real-scene single image super-resolution to bridge the gap between synthetic data and real captured images.
We propose a method to generate more realistic training data by mimicking the imaging process of digital cameras.
We also develop a two-branch convolutional neural network to exploit the radiance information originally recorded in raw images.
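A rough sketch of synthesizing raw-like training pairs by mimicking a camera pipeline (inverse gamma, downsampling, Bayer mosaic, additive noise); the parameters are illustrative rather than the paper's calibrated model.

```python
# Generating a raw-like degraded input paired with a clean sRGB target.
import numpy as np

rng = np.random.default_rng(7)
srgb = rng.random((256, 256, 3))                    # clean "ground truth"

linear = srgb ** 2.2                                # approximate inverse gamma
low = linear[::2, ::2]                              # 2x downsample (degradation)

bayer = np.zeros(low.shape[:2])                     # RGGB mosaic
bayer[0::2, 0::2] = low[0::2, 0::2, 0]              # R
bayer[0::2, 1::2] = low[0::2, 1::2, 1]              # G
bayer[1::2, 0::2] = low[1::2, 0::2, 1]              # G
bayer[1::2, 1::2] = low[1::2, 1::2, 2]              # B

noisy_raw = bayer + rng.normal(0, 0.01, bayer.shape)  # shot/read noise proxy
print(noisy_raw.shape, srgb.shape)                    # training pair: raw -> sRGB
```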
arXiv Detail & Related papers (2021-02-02T16:10:15Z)
- Memory-efficient Learning for Large-scale Computational Imaging [3.255705667028885]
We propose a memory-efficient learning procedure that exploits the reversibility of the network's layers to enable data-driven design for large-scale imaging systems.
We demonstrate our method on a small-scale compressed sensing example, as well as two large-scale real-world systems: multi-channel magnetic resonance imaging and super-resolution optical microscopy.
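The core trick in miniature: with an additively coupled reversible block, layer inputs can be recomputed exactly from outputs during the backward pass instead of being stored; f and g here are arbitrary placeholders for sub-networks.

```python
# Reversible (additive coupling) block: no activations need to be stored,
# because the forward pass can be inverted exactly.
import numpy as np

def f(x):                      # any sub-network; a fixed nonlinearity here
    return np.tanh(x)

def g(x):
    return np.tanh(0.5 * x)

def forward(x1, x2):
    y1 = x1 + f(x2)
    y2 = x2 + g(y1)
    return y1, y2

def inverse(y1, y2):           # recover the inputs without storing them
    x2 = y2 - g(y1)
    x1 = y1 - f(x2)
    return x1, x2

x1, x2 = np.random.default_rng(8).random((2, 4))
y1, y2 = forward(x1, x2)
r1, r2 = inverse(y1, y2)
print(np.allclose(x1, r1) and np.allclose(x2, r2))   # True: inputs recomputed
```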
arXiv Detail & Related papers (2020-03-11T23:08:04Z)
- Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We propose a novel image coding framework that leverages both compressive and generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.