SnapPix: Efficient-Coding--Inspired In-Sensor Compression for Edge Vision
- URL: http://arxiv.org/abs/2504.04535v1
- Date: Sun, 06 Apr 2025 16:24:45 GMT
- Title: SnapPix: Efficient-Coding--Inspired In-Sensor Compression for Edge Vision
- Authors: Weikai Lin, Tianrui Ma, Adith Boloor, Yu Feng, Ruofan Xing, Xuan Zhang, Yuhao Zhu
- Abstract summary: Energy-efficient image acquisition on the edge is crucial for enabling remote sensing applications. This paper proposes a sensor-algorithm co-designed system called SnapPix, which compresses raw pixels in the analog domain inside the sensor.
- Score: 10.880533232888412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy-efficient image acquisition on the edge is crucial for enabling remote sensing applications where the sensor node has weak compute capabilities and must transmit data to a remote server/cloud for processing. To reduce the edge energy consumption, this paper proposes a sensor-algorithm co-designed system called SnapPix, which compresses raw pixels in the analog domain inside the sensor. We use coded exposure (CE) as the in-sensor compression strategy as it offers the flexibility to sample, i.e., selectively expose pixels, both spatially and temporally. SnapPix makes three contributions. First, we propose a task-agnostic strategy to learn the sampling/exposure pattern based on the classic theory of efficient coding. Second, we co-design the downstream vision model with the exposure pattern to address the pixel-level non-uniformity unique to CE-compressed images. Finally, we propose lightweight augmentations to the image sensor hardware to support our in-sensor CE compression. Evaluated on action recognition and video reconstruction, SnapPix outperforms state-of-the-art video-based methods at the same speed while reducing energy by up to 15.4x. We have open-sourced the code at: https://github.com/horizon-research/SnapPix.
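The coded-exposure forward model the abstract describes fits in a few lines. The sketch below uses a random binary pattern purely for illustration; SnapPix instead learns the pattern with an efficient-coding objective, and all names and sizes here are hypothetical:

```python
import numpy as np

def coded_exposure(frames: np.ndarray, code: np.ndarray) -> np.ndarray:
    """Collapse a T-frame burst into one coded snapshot.

    frames: (T, H, W) raw pixel intensities over the exposure window.
    code:   (T, H, W) binary exposure pattern -- 1 exposes a pixel during
            that sub-interval, 0 keeps it dark. SnapPix learns this pattern;
            a random code stands in for it here.
    """
    return (code * frames).sum(axis=0)  # analog-domain accumulation in-sensor

rng = np.random.default_rng(0)
T, H, W = 8, 64, 64
frames = rng.random((T, H, W)).astype(np.float32)          # stand-in video burst
code = (rng.random((T, H, W)) < 0.25).astype(np.float32)   # sparse exposure pattern
snapshot = coded_exposure(frames, code)
print(snapshot.shape)  # (64, 64): one frame transmitted instead of eight
```

Because the sensor transmits a single coded frame rather than the full burst, the off-sensor bandwidth and energy drop roughly with the temporal compression ratio.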
Related papers
- Embedding Compression Distortion in Video Coding for Machines [67.97469042910855]
Currently, video transmission serves not only the Human Visual System (HVS) for viewing but also machine perception for analysis.
We propose a Compression Distortion Embedding (CDRE) framework, which extracts machine-perception-related distortion representation and embeds it into downstream models.
Our framework can effectively boost the rate-task performance of existing codecs with minimal overhead in terms of execution time and number of parameters.
arXiv Detail & Related papers (2025-03-27T13:01:53Z) - bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction [57.199618102578576]
We propose bit2bit, a new method for reconstructing high-quality image stacks at the original spatiotemporal resolution from sparse binary quanta image data.
Inspired by recent work on Poisson denoising, we developed an algorithm that creates a dense image sequence from sparse binary photon data.
We present a novel dataset containing a wide range of real SPAD high-speed videos under various challenging imaging conditions.
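The summary hints at a self-supervised scheme in which some photon detections are held out and predicted from the rest. The masking below is a Noise2Self-style guess at that setup, not bit2bit's published recipe; the 50% ratio and array sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# Sparse binary photon frames, as produced by a photon-counting (SPAD) sensor.
binary_frames = (rng.random((16, 32, 32)) < 0.02).astype(np.float32)

# Hide a random subset of sites; a denoising network would be trained to
# predict photon probabilities at hidden sites from the visible ones
# (masking scheme and ratio are assumptions, not bit2bit's exact values).
mask = rng.random(binary_frames.shape) < 0.5
inputs = np.where(mask, 0.0, binary_frames)   # photons hidden from the model
targets = binary_frames[mask]                 # bits the model must predict
print(inputs.shape, targets.shape)
```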
arXiv Detail & Related papers (2024-10-30T17:30:35Z) - Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor [58.305341034419136]
We present the first dense SLAM system with a monocular camera and a light-weight ToF sensor.
We propose a multi-modal implicit scene representation that supports rendering both the signals from the RGB camera and light-weight ToF sensor.
Experiments demonstrate that our system well exploits the signals of light-weight ToF sensors and achieves competitive results.
arXiv Detail & Related papers (2023-08-28T07:56:13Z) - Real-Time Radiance Fields for Single-Image Portrait View Synthesis [85.32826349697972]
We present a one-shot method to infer and render a 3D representation from a single unposed image in real-time.
Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering.
Our method is fast (24 fps) on consumer hardware, and produces higher quality results than strong GAN-inversion baselines that require test-time optimization.
arXiv Detail & Related papers (2023-05-03T17:56:01Z) - PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors [42.18718773182277]
Conventional image sensors digitize high-resolution images at fast frame rates, producing a large amount of data that needs to be transmitted off the sensor for further processing.
We develop an efficient recurrent neural network architecture, PixelRNN, that encodes spatio-temporal features on the sensor using purely binary operations.
PixelRNN reduces the amount of data to be transmitted off the sensor by a factor of 64x compared to conventional systems while offering competitive accuracy for hand gesture recognition and lip reading tasks.
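As a rough illustration of "purely binary operations", here is a generic binarized recurrent update with ±1 weights and activations, where each dot product maps to XNOR-plus-popcount in hardware. This is a textbook binarized-RNN sketch, not PixelRNN's actual learned cell, and all shapes are invented:

```python
import numpy as np

rng = np.random.default_rng(2)

def binarize(x: np.ndarray) -> np.ndarray:
    """Map to {-1, +1}; on-sensor hardware would store these as single bits."""
    return np.where(x >= 0, 1.0, -1.0)

# +/-1 weights: a binarized dot product reduces to XNOR + popcount in silicon.
W_x = binarize(rng.standard_normal((64, 256)))   # input-to-state weights
W_h = binarize(rng.standard_normal((64, 64)))    # state-to-state weights

def binary_rnn_step(h: np.ndarray, x: np.ndarray) -> np.ndarray:
    # h: (64,) binary state, x: (256,) binarized pixel block
    return binarize(W_x @ x + W_h @ h)

h = np.ones(64)
for _ in range(10):                     # ten frames processed in-pixel
    x = binarize(rng.standard_normal(256))
    h = binary_rnn_step(h, x)
print(h[:8])  # compact binary feature vector read off the sensor
```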
arXiv Detail & Related papers (2023-04-11T18:16:47Z) - A direct time-of-flight image sensor with in-pixel surface detection and dynamic vision [0.0]
3D flash LIDAR is an alternative to traditional scanning LIDAR systems, promising precise depth imaging in a compact form factor.
We present a 64x32 pixel (256x128 SPAD) dToF imager that overcomes these limitations by using pixels with embedded histogramming.
This reduces the size of output data frames considerably, enabling maximum frame rates in the 10 kFPS range or 100 kFPS for direct depth readings.
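A sketch of why embedded histogramming shrinks the output frames: each pixel aggregates its photon timestamps into a small on-chip histogram and can report only the peak bin as a direct depth reading. All numbers below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Photon arrival timestamps for one SPAD pixel over an exposure, in time bins;
# a reflecting surface produces a cluster of arrivals around one bin (~140 here).
timestamps = rng.normal(loc=140, scale=3, size=500).astype(int)

# In-pixel histogramming: instead of streaming every timestamp off-chip,
# the pixel accumulates a compact histogram and reports only its peak.
hist = np.bincount(np.clip(timestamps, 0, 255), minlength=256)
depth_bin = int(hist.argmax())  # direct depth reading
print(depth_bin)                # one value replaces 500 raw photon events
```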
arXiv Detail & Related papers (2022-09-23T14:38:00Z) - A photosensor employing data-driven binning for ultrafast image recognition [0.0]
Pixel binning is a technique widely used in optical image acquisition and spectroscopy.
Here, we push the concept of binning to its limit by combining a large fraction of the sensor elements into a single superpixel.
For a given pattern recognition task, its optimal shape is determined from training data using a machine learning algorithm.
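In code, the superpixel idea reduces a whole frame to a single readout through a binary membership mask. The mask below is random, whereas the paper fits its shape to the recognition task from training data; names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# A learned binary mask selects which sensor elements feed one superpixel.
# Random here; in the paper, its shape is optimized for the target task.
mask = rng.random((32, 32)) < 0.5

def superpixel_read(image: np.ndarray) -> float:
    """Bin all selected elements into a single value -- one readout per frame."""
    return float(image[mask].sum())

image = rng.random((32, 32))
print(superpixel_read(image))  # the entire frame collapses to one number
```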
arXiv Detail & Related papers (2021-11-20T15:38:39Z) - Small Lesion Segmentation in Brain MRIs with Subpixel Embedding [105.1223735549524]
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues.
We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network.
arXiv Detail & Related papers (2021-09-18T00:21:17Z) - Time-Multiplexed Coded Aperture Imaging: Learned Coded Aperture and Pixel Exposures for Compressive Imaging Systems [56.154190098338965]
We show that our proposed time multiplexed coded aperture (TMCA) can be optimized end-to-end.
TMCA induces better coded snapshots enabling superior reconstructions in two different applications: compressive light field imaging and hyperspectral imaging.
This codification outperforms the state-of-the-art compressive imaging systems by more than 4dB in those applications.
arXiv Detail & Related papers (2021-04-06T22:42:34Z) - Plug-and-Play Algorithms for Video Snapshot Compressive Imaging [41.818167109996885]
We consider the reconstruction problem of video snapshot compressive imaging (SCI), which captures high-speed frames using a low-speed 2D sensor (detector).
The underlying principle of SCI is to modulate the frames with different masks, after which the encoded frames are integrated into a single snapshot on the sensor (see the sketch after this entry).
Applying SCI to large-scale problems (HD or UHD videos) in daily life remains challenging; one of the bottlenecks lies in the reconstruction algorithm.
arXiv Detail & Related papers (2021-01-13T00:51:49Z)
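The SCI principle stated in that entry fits in a few lines: per-frame masks modulate a high-speed burst, and the modulated frames integrate into one snapshot on the slow sensor. Mask statistics and sizes below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(5)
T, H, W = 8, 128, 128
video = rng.random((T, H, W)).astype(np.float32)          # high-speed frames
masks = (rng.random((T, H, W)) < 0.5).astype(np.float32)  # per-frame modulation

# SCI forward model: each frame is modulated by its mask, and the modulated
# frames integrate into a single snapshot on the low-speed 2D sensor.
snapshot = (masks * video).sum(axis=0)

# Reconstruction must invert this many-to-one mapping; plug-and-play methods
# alternate a data-fidelity step with an off-the-shelf denoiser as the prior.
print(snapshot.shape)  # (128, 128): one measurement for eight frames
```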