PINs: Progressive Implicit Networks for Multi-Scale Neural
Representations
- URL: http://arxiv.org/abs/2202.04713v1
- Date: Wed, 9 Feb 2022 20:33:37 GMT
- Title: PINs: Progressive Implicit Networks for Multi-Scale Neural
Representations
- Authors: Zoe Landgraf, Alexander Sorkine Hornung, Ricardo Silveira Cabral
- Abstract summary: We propose a progressive positional encoding, exposing a hierarchical structure to incremental sets of frequency encodings.
Our model accurately reconstructs scenes with wide frequency bands and learns a scene representation at progressive level of detail.
Experiments on several 2D and 3D datasets show improvements in reconstruction accuracy, representational capacity and training speed compared to baselines.
- Score: 68.73195473089324
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-layer perceptrons (MLP) have proven to be effective scene encoders when
combined with higher-dimensional projections of the input, commonly referred to
as positional encoding. However, scenes with a wide frequency spectrum
remain a challenge: choosing high frequencies for positional encoding
introduces noise in low structure areas, while low frequencies result in poor
fitting of detailed regions. To address this, we propose a progressive
positional encoding, exposing a hierarchical MLP structure to incremental sets
of frequency encodings. Our model accurately reconstructs scenes with wide
frequency bands and learns a scene representation at progressive level of
detail without explicit per-level supervision. The architecture is
modular: each level encodes a continuous implicit representation that can be
leveraged separately for its respective resolution, meaning a smaller network
for coarser reconstructions. Experiments on several 2D and 3D datasets show
improvements in reconstruction accuracy, representational capacity and training
speed compared to baselines.
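
The idea of exposing each level to an incremental set of frequency encodings can be illustrated with a minimal sketch. This is not the authors' implementation: the frequency schedule (2^k * pi), the function names `positional_encoding` and `progressive_encodings`, and the choice that each level sees all frequencies of the coarser levels plus its own are assumptions made for illustration only.

```python
import numpy as np


def positional_encoding(x, num_freqs):
    """Map coordinates x of shape (N, D) to sin/cos features.

    Assumed schedule: frequencies 2^k * pi for k = 0 .. num_freqs - 1.
    Returns an array of shape (N, num_freqs * 2 * D).
    """
    x = np.asarray(x, dtype=np.float64)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # (num_freqs,)
    angles = x[:, None, :] * freqs[None, :, None]        # (N, num_freqs, D)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(x.shape[0], -1)


def progressive_encodings(x, freqs_per_level):
    """Return one encoding per level (hypothetical helper).

    Level i is encoded with the frequencies of all levels up to and
    including i, so coarser encodings are prefixes of finer ones.
    """
    encodings = []
    total = 0
    for n in freqs_per_level:
        total += n
        encodings.append(positional_encoding(x, total))
    return encodings


# Example: five 1D coordinates, three levels adding 2, 2 and 4 frequencies.
coords = np.linspace(0.0, 1.0, 5)[:, None]
levels = progressive_encodings(coords, [2, 2, 4])
for i, enc in enumerate(levels):
    print(f"level {i}: feature shape {enc.shape}")
```

In a hierarchical model, each level's MLP would take the encoding for its own level as input, so a coarse reconstruction only needs the first, smaller encoding.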
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) delivers superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
arXiv Detail & Related papers (2024-10-03T15:40:58Z)
- Locality-Aware Generalizable Implicit Neural Representation [54.93702310461174]
Generalizable implicit neural representation (INR) enables a single continuous function to represent multiple data instances.
We propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder.
Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks.
arXiv Detail & Related papers (2023-10-09T11:26:58Z)
- Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.
Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.
Compared with existing models, our proposed method achieves competitive performance on three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z)
- Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning [29.617527535279574]
Video captioning aims to generate natural language sentences that describe the given video accurately.
Existing methods obtain favorable generation by exploring richer visual representations in the encoding phase or by improving the decoding ability.
We introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of the infrequent tokens.
arXiv Detail & Related papers (2022-11-28T05:45:17Z)
- GAN-Based Multi-View Video Coding with Spatio-Temporal EPI Reconstruction [19.919826392704472]
We propose a novel multi-view video coding method that leverages the image generation capabilities of a Generative Adversarial Network (GAN).
At the encoder, we construct a spatio-temporal Epipolar Plane Image (EPI) and further utilize a convolutional network to extract the latent code of a GAN as Side Information (SI).
At the decoder side, we combine SI and adjacent viewpoints to reconstruct intermediate views using the GAN generator.
arXiv Detail & Related papers (2022-05-07T08:52:54Z)
- Stage-Aware Feature Alignment Network for Real-Time Semantic Segmentation of Street Scenes [59.81228011432776]
We present a novel Stage-aware Feature Alignment Network (SFANet) for real-time semantic segmentation of street scenes.
By taking into account the unique role of each stage in the decoder, a novel stage-aware Feature Enhancement Block (FEB) is designed to enhance spatial details and contextual information of feature maps from the encoder.
Experimental results show that the proposed SFANet exhibits a good balance between accuracy and speed for real-time semantic segmentation of street scenes.
arXiv Detail & Related papers (2022-03-08T11:46:41Z)
- SpectralFormer: Rethinking Hyperspectral Image Classification with Transformers [91.09957836250209]
Hyperspectral (HS) images are characterized by approximately contiguous spectral information.
CNNs have been proven to be a powerful feature extractor in HS image classification.
We propose a novel backbone network called SpectralFormer for HS image classification.
arXiv Detail & Related papers (2021-07-07T02:59:21Z)
- A Hierarchical Coding Scheme for Glasses-free 3D Displays Based on Scalable Hybrid Layered Representation of Real-World Light Fields [0.6091702876917279]
The scheme learns stacked multiplicative layers from subsets of light field views determined from different scanning orders.
The spatial correlation in layer patterns is exploited with varying low ranks in factorization derived from singular value decomposition on a Krylov subspace.
Encoding with HEVC efficiently removes intra-view and inter-view correlation in low-rank approximated layers.
arXiv Detail & Related papers (2021-04-19T15:09:21Z)
- Modulated Periodic Activations for Generalizable Local Functional Representations [113.64179351957888]
We present a new representation that generalizes to multiple instances and achieves state-of-the-art fidelity.
Our approach produces general functional representations of images, videos and shapes, and achieves higher reconstruction quality than prior works that are optimized for a single signal.
arXiv Detail & Related papers (2021-04-08T17:59:04Z)