Related papers: gQIR: Generative Quanta Image Reconstruction

URL: http://arxiv.org/abs/2602.20417v1
Date: Mon, 23 Feb 2026 23:33:00 GMT
Title: gQIR: Generative Quanta Image Reconstruction
Authors: Aryan Garg, Sizhuo Ma, Mohit Gupta
Abstract summary: We present an approach that adapts large text-to-image latent diffusion models to the photon-limited domain of quanta burst imaging. By integrating latent-space restoration with burst-level spatio-temporal reasoning, our approach produces reconstructions that are both photometrically faithful and perceptually pleasing.
Score: 18.400282448827507
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Capturing high-quality images from only a few detected photons is a fundamental challenge in computational imaging. Single-photon avalanche diode (SPAD) sensors promise high-quality imaging in regimes where conventional cameras fail, but raw quanta frames contain only sparse, noisy, binary photon detections. Recovering a coherent image from a burst of such frames requires handling alignment, denoising, and demosaicing (for color) under noise statistics far outside those assumed by standard restoration pipelines or modern generative models. We present an approach that adapts large text-to-image latent diffusion models to the photon-limited domain of quanta burst imaging. Our method leverages the structural and semantic priors of internet-scale diffusion models while introducing mechanisms to handle Bernoulli photon statistics. By integrating latent-space restoration with burst-level spatio-temporal reasoning, our approach produces reconstructions that are both photometrically faithful and perceptually pleasing, even under high-speed motion. We evaluate the method on synthetic benchmarks and new real-world datasets, including the first color SPAD burst dataset and a challenging Deforming (XD) video benchmark. Across all settings, the approach substantially improves perceptual quality over classical and modern learning-based baselines, demonstrating the promise of adapting large generative priors to extreme photon-limited sensing. Code at https://github.com/Aryan-Garg/gQIR.
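Illustration: the "sparse, noisy, binary photon detections" and "Bernoulli photon statistics" mentioned in the abstract can be made concrete with a small sketch. It is not taken from the gQIR paper or its code. It assumes the commonly used single-photon model in which a pixel with mean photon count `flux` per frame fires with probability 1 - exp(-flux), and it assumes a static scene, so it ignores the alignment and demosaicing that the abstract describes. The function names `simulate_quanta_burst` and `estimate_flux` are hypothetical.

```python
import numpy as np

def simulate_quanta_burst(flux, num_frames, rng):
    """Sample binary frames under the assumed model: P(detect) = 1 - exp(-flux)."""
    p_detect = 1.0 - np.exp(-flux)
    samples = rng.random((num_frames,) + flux.shape)
    return (samples < p_detect).astype(np.uint8)

def estimate_flux(burst, eps=1e-6):
    """Invert the Bernoulli model from the per-pixel detection rate.

    Only valid for a static scene: the burst is averaged over time as-is.
    """
    rate = burst.mean(axis=0)
    rate = np.clip(rate, 0.0, 1.0 - eps)  # avoid log(0) at saturated pixels
    return -np.log1p(-rate)               # flux = -log(1 - rate)

rng = np.random.default_rng(0)
flux = np.full((4, 4), 0.1)               # about 0.1 photons per pixel per frame
burst = simulate_quanta_burst(flux, num_frames=2000, rng=rng)
flux_hat = estimate_flux(burst)
```

Any single frame in `burst` is mostly zeros at this flux level, which is why a recognizable image only emerges after many frames are combined.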

Related papers

Latent Forcing: Reordering the Diffusion Trajectory for Pixel-Space Image Generation [36.41177812868683]
Latent diffusion models excel at generating high-quality images but lose the benefits of end-to-end modeling. We propose Latent Forcing, a simple modification to existing architectures that achieves the efficiency of latent diffusion while operating on raw natural images. Latent Forcing achieves a new state-of-the-art for diffusion transformer-based pixel generation at our compute scale.
arXiv Detail & Related papers (2026-02-11T22:09:58Z)
LensNet: An End-to-End Learning Framework for Empirical Point Spread Function Modeling and Lensless Imaging Reconstruction [32.85180149439811]
Lensless imaging stands out as a promising alternative to conventional lens-based systems. Traditional lensless techniques often require explicit calibrations and extensive pre-processing. We propose LensNet, an end-to-end deep learning framework that integrates spatial-domain and frequency-domain representations.
arXiv Detail & Related papers (2025-05-03T09:11:52Z)
bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction [57.199618102578576]
We propose bit2bit, a new method for reconstructing high-quality image stacks at original resolution from sparse binary quanta image data. Inspired by recent work on Poisson denoising, we developed an algorithm that creates a dense image sequence from sparse binary photon data. We present a novel dataset containing a wide range of real SPAD high-speed videos under various challenging imaging conditions.
arXiv Detail & Related papers (2024-10-30T17:30:35Z)
SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream [26.165424006344267]
Spike cameras offer distinct advantages over standard cameras. Existing approaches reliant on spike cameras often assume optimal illumination. We introduce SpikeNeRF, the first work that derives a NeRF-based volumetric scene representation from spike camera data.
arXiv Detail & Related papers (2024-03-17T13:51:25Z)
LDM-ISP: Enhancing Neural ISP for Low Light with Latent Diffusion Models [54.93010869546011]
We propose to leverage the pre-trained latent diffusion model to perform the neural ISP for enhancing extremely low-light images. Specifically, to tailor the pre-trained latent diffusion model to operate on the RAW domain, we train a set of lightweight taming modules. We observe different roles of UNet denoising and decoder reconstruction in the latent diffusion model, which inspires us to decompose the low-light image enhancement task into latent-space low-frequency content generation and decoding-phase high-frequency detail maintenance.
arXiv Detail & Related papers (2023-12-02T04:31:51Z)
On quantifying and improving realism of images generated with diffusion [50.37578424163951]
We propose a metric, called Image Realism Score (IRS), computed from five statistical measures of a given image. IRS is easily usable as a measure to classify a given image as real or fake. We experimentally establish the model- and data-agnostic nature of the proposed IRS by successfully detecting fake images generated by Stable Diffusion Model (SDM), Dalle2, Midjourney and BigGAN. Our efforts have also led to the Gen-100 dataset, which provides 1,000 samples for 100 classes generated by four high-quality models.
arXiv Detail & Related papers (2023-09-26T08:32:55Z)
InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering [55.70938412352287]
We present an information-theoretic regularization technique for few-shot novel view synthesis based on neural implicit representation. The proposed approach minimizes potential reconstruction inconsistency that happens due to insufficient viewpoints. We achieve consistently improved performance compared to existing neural view synthesis methods by large margins on multiple standard benchmarks.
arXiv Detail & Related papers (2021-12-31T11:56:01Z)
Unsupervised Single Image Super-resolution Under Complex Noise [60.566471567837574]
This paper proposes a model-based unsupervised SISR method to deal with the general SISR task with unknown degradations. The proposed method clearly surpasses the current state-of-the-art (SotA) method (by about 1 dB PSNR), with both a lighter model (0.34M vs. 2.40M) and faster speed.
arXiv Detail & Related papers (2021-07-02T11:55:40Z)
Single Image Brightening via Multi-Scale Exposure Fusion with Hybrid Learning [48.890709236564945]
A small ISO and a small exposure time are usually used to capture an image in backlit or low-light conditions. In this paper, a single image brightening algorithm is introduced to brighten such an image. The proposed algorithm includes a unique hybrid learning framework to generate two virtual images with large exposure times.
arXiv Detail & Related papers (2020-07-04T08:23:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.