WIPES: Wavelet-based Visual Primitives
- URL: http://arxiv.org/abs/2508.12615v2
- Date: Tue, 19 Aug 2025 07:34:11 GMT
- Title: WIPES: Wavelet-based Visual Primitives
- Authors: Wenhao Zhang, Hao Zhu, Delong Wu, Di Kang, Linchao Bao, Xun Cao, Zhan Ma
- Abstract summary: WIPES is a Wavelet-based vIsual PrimitivES for representing multi-dimensional visual signals. We show that WIPES offers higher rendering quality and faster inference than INR-based methods, and outperforms Gaussian-based representations in rendering quality.
- Score: 40.99041094491281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pursuing a continuous visual representation that offers flexible frequency modulation and fast rendering speed has recently garnered increasing attention in the fields of 3D vision and graphics. However, existing representations often rely on frequency guidance or complex neural network decoding, leading to spectrum loss or slow rendering. To address these limitations, we propose WIPES, a universal Wavelet-based vIsual PrimitivES for representing multi-dimensional visual signals. Building on the spatial-frequency localization advantages of wavelets, WIPES effectively captures both the low-frequency "forest" and the high-frequency "trees." Additionally, we develop a wavelet-based differentiable rasterizer to achieve fast visual rendering. Experimental results on various visual tasks, including 2D image representation, 5D static and 6D dynamic novel view synthesis, demonstrate that WIPES, as a visual primitive, offers higher rendering quality and faster inference than INR-based methods, and outperforms Gaussian-based representations in rendering quality.
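The abstract describes explicit primitives that are localized in both space and frequency and are splatted onto the pixel grid by a differentiable rasterizer. For rough intuition only, the toy NumPy sketch below renders Gabor-style wavelet atoms (a Gaussian envelope modulating an oriented cosine carrier); the function name and every parameter are illustrative assumptions, not the authors' WIPES implementation.

```python
import numpy as np

def rasterize_wavelet_primitive(h, w, center, sigma, freq, theta, phase=0.0):
    """Toy 2D Gabor-style wavelet atom: a Gaussian envelope (spatial
    localization) modulating an oriented cosine carrier (frequency
    localization). Illustrative only -- not the WIPES rasterizer."""
    grid = np.mgrid[0:h, 0:w].astype(np.float64)
    ys, xs = grid[0], grid[1]
    dx, dy = xs - center[0], ys - center[1]
    # Rotate coordinates so the carrier oscillates along direction theta.
    u = dx * np.cos(theta) + dy * np.sin(theta)
    envelope = np.exp(-(dx**2 + dy**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * freq * u + phase)
    return envelope * carrier

# Sum a low-frequency "forest" atom and a high-frequency "tree" atom.
image = rasterize_wavelet_primitive(64, 64, (32, 32), sigma=20.0, freq=0.02, theta=0.0) \
    + 0.5 * rasterize_wavelet_primitive(64, 64, (40, 24), sigma=4.0, freq=0.25, theta=np.pi / 4)
print(image.shape, float(image.min()), float(image.max()))
```

Because each atom is a closed-form function of its parameters (center, scale, orientation, frequency), the rendered image is differentiable with respect to those parameters, which is the property any wavelet-based differentiable rasterizer needs for gradient-based fitting.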
Related papers
- One-Shot Refiner: Boosting Feed-forward Novel View Synthesis via One-Step Diffusion [57.824020826432815]
We present a novel framework for high-fidelity novel view synthesis (NVS) from sparse images. We design a Dual-Domain Detail Perception Module, which enables handling high-resolution images without being limited by the ViT backbone. We develop a feature-guided diffusion network, which can preserve high-frequency details during the restoration process.
arXiv Detail & Related papers (2026-01-20T17:11:55Z) - FLAIR: Frequency- and Locality-Aware Implicit Neural Representations [13.614373731196272]
Implicit Neural Representations (INRs) leverage neural networks to map coordinates to corresponding signals, enabling continuous and compact representations (a minimal sketch of such a coordinate network appears after this list). Existing INRs lack frequency selectivity, spatial localization, and sparse representations, leading to an over-reliance on redundant signal components. We propose FLAIR (Frequency- and Locality-Aware Implicit Neural Representations), which incorporates two key innovations.
arXiv Detail & Related papers (2025-08-19T06:06:04Z) - V2V3D: View-to-View Denoised 3D Reconstruction for Light-Field Microscopy [12.356249860549472]
Light field microscopy (LFM) has gained significant attention due to its ability to capture snapshot-based, large-scale 3D fluorescence images. Existing LFM reconstruction algorithms are highly sensitive to sensor noise or require hard-to-get ground-truth annotated data for training. This paper introduces V2V3D, an unsupervised view2view-based framework that establishes a new paradigm for joint optimization of image denoising and 3D reconstruction.
arXiv Detail & Related papers (2025-04-10T15:29:26Z) - WaveFormer: A 3D Transformer with Wavelet-Driven Feature Representation for Efficient Medical Image Segmentation [0.5312470855079862]
We present WaveFormer, a novel 3D transformer for medical images. It is inspired by the top-down mechanism of the human visual recognition system. It preserves both global context and high-frequency details while replacing heavy upsampling layers with efficient wavelet-based summarization and reconstruction.
arXiv Detail & Related papers (2025-03-31T06:28:41Z) - EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization. We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z) - FE-UNet: Frequency Domain Enhanced U-Net for Low-Frequency Information-Rich Image Segmentation [48.034848981295525]
We address the differences in frequency band sensitivity between CNNs and the human visual system. We propose a wavelet adaptive spectrum fusion (WASF) method inspired by biological vision mechanisms to balance cross-frequency image features. We develop the FE-UNet model, which employs a SAM2 backbone network and incorporates fine-tuned Hiera-Large modules to ensure segmentation accuracy.
arXiv Detail & Related papers (2025-02-06T07:24:34Z) - Anisotropic Neural Representation Learning for High-Quality Neural
Rendering [0.0]
We propose an anisotropic neural representation learning method that utilizes learnable view-dependent features to improve scene representation and reconstruction.
Our method is flexible and can be plugged into NeRF-based frameworks.
arXiv Detail & Related papers (2023-11-30T07:29:30Z) - WaveNeRF: Wavelet-based Generalizable Neural Radiance Fields [149.2296890464997]
We design WaveNeRF, which integrates wavelet frequency decomposition into MVS and NeRF.
WaveNeRF achieves superior generalizable radiance field modeling when only given three images as input.
arXiv Detail & Related papers (2023-08-09T09:24:56Z) - Learning Neural Duplex Radiance Fields for Real-Time View Synthesis [33.54507228895688]
We propose a novel approach to distill and bake NeRFs into highly efficient mesh-based neural representations.
We demonstrate the effectiveness and superiority of our approach via extensive experiments on a range of standard datasets.
arXiv Detail & Related papers (2023-04-20T17:59:52Z) - Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos [69.22032459870242]
We present a novel technique, Residual Radiance Field or ReRF, as a highly compact neural representation to achieve real-time free-view rendering on long-duration dynamic scenes.
We show such a strategy can handle large motions without sacrificing quality.
Based on ReRF, we design a special FVV codec that achieves a compression rate of three orders of magnitude and provides a companion ReRF player to support online streaming of long-duration FVVs of dynamic scenes.
arXiv Detail & Related papers (2023-04-10T08:36:00Z) - MVSNeRF: Fast Generalizable Radiance Field Reconstruction from
Multi-View Stereo [52.329580781898116]
We present MVSNeRF, a novel neural rendering approach that can efficiently reconstruct neural radiance fields for view synthesis.
Unlike prior works on neural radiance fields that consider per-scene optimization on densely captured images, we propose a generic deep neural network that can reconstruct radiance fields from only three nearby input views via fast network inference.
arXiv Detail & Related papers (2021-03-29T13:15:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.