AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis
- URL: http://arxiv.org/abs/2503.12806v1
- Date: Mon, 17 Mar 2025 04:22:53 GMT
- Title: AV-Surf: Surface-Enhanced Geometry-Aware Novel-View Acoustic Synthesis
- Authors: Hadam Baek, Hannie Shin, Jiyoung Seo, Chanwoo Kim, Saerom Kim, Hyeongbok Kim, Sangpil Kim
- Abstract summary: Accurately modeling sound propagation in complex real-world environments is essential for Novel View Acoustic Synthesis (NVAS). We propose a surface-enhanced geometry-aware approach for NVAS to improve spatial acoustic modeling. We introduce a dual cross-attention-based transformer that integrates geometric constraints into frequency queries to understand the surroundings of the emitter.
- Score: 4.751910547396398
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurately modeling sound propagation in complex real-world environments is essential for Novel View Acoustic Synthesis (NVAS). While previous studies have leveraged visual perception to estimate spatial acoustics, the combined use of surface normals and structural details from 3D representations in acoustic modeling has been underexplored. Given their direct impact on sound wave reflections and propagation, surface normals should be jointly modeled with structural details to achieve accurate spatial acoustics. In this paper, we propose a surface-enhanced geometry-aware approach for NVAS to improve spatial acoustic modeling. To achieve this, we exploit geometric priors, such as images, depth maps, surface normals, and point clouds obtained from a 3D Gaussian Splatting (3DGS) based framework. We introduce a dual cross-attention-based transformer that integrates geometric constraints into frequency queries to understand the surroundings of the emitter. Additionally, we design a ConvNeXt-based spectral feature processing network called the Spectral Refinement Network (SRN) to synthesize realistic binaural audio. Experimental results on the RWAVS and SoundSpaces datasets highlight the effectiveness of our approach, as it surpasses existing methods in novel view acoustic synthesis.
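A minimal, illustrative sketch of the dual cross-attention idea described in the abstract is given below. It is not the authors' code: a set of learnable frequency queries attends separately to two geometric feature streams (e.g., surface-normal features and structural/point-cloud features), and the two branches are fused into per-frequency embeddings. All class and parameter names (DualCrossAttentionBlock, n_freq_queries, etc.) and the feature dimensions are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DualCrossAttentionBlock(nn.Module):
    """Hypothetical sketch of a dual cross-attention transformer block.

    Learnable frequency queries attend to two geometric feature streams
    (assumed here to be surface-normal features and structural features),
    and the two attention outputs are fused into per-frequency embeddings.
    """
    def __init__(self, dim=256, n_heads=8, n_freq_queries=257):
        super().__init__()
        # One learnable query per frequency bin (257 ~ bins of a 512-point STFT).
        self.freq_queries = nn.Parameter(torch.randn(n_freq_queries, dim))
        self.attn_normals = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.attn_structure = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)
        self.ffn = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, normal_feats, struct_feats):
        # normal_feats, struct_feats: (B, N_tokens, dim) geometric feature tokens.
        B = normal_feats.shape[0]
        q = self.freq_queries.unsqueeze(0).expand(B, -1, -1)        # (B, F, dim)
        a_normals, _ = self.attn_normals(q, normal_feats, normal_feats)
        a_struct, _ = self.attn_structure(q, struct_feats, struct_feats)
        fused = self.fuse(torch.cat([a_normals, a_struct], dim=-1))  # (B, F, dim)
        return fused + self.ffn(fused)

# Usage with random placeholder features:
# block = DualCrossAttentionBlock()
# out = block(torch.randn(2, 100, 256), torch.randn(2, 100, 256))  # (2, 257, 256)
```

In a full pipeline, per-frequency embeddings of this kind would condition a downstream renderer such as the paper's Spectral Refinement Network; the dimensions above are placeholders only.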
Related papers
- EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization.
We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z) - AniSDF: Fused-Granularity Neural Surfaces with Anisotropic Encoding for High-Fidelity 3D Reconstruction [55.69271635843385]
We present AniSDF, a novel approach that learns fused-granularity neural surfaces with physics-based encoding for high-fidelity 3D reconstruction.
Our method substantially improves the quality of SDF-based methods in both geometry reconstruction and novel-view synthesis.
arXiv Detail & Related papers (2024-10-02T03:10:38Z) - AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis [62.33446681243413]
Novel view acoustic synthesis aims to render audio at any target viewpoint, given mono audio emitted by a sound source in a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition for synthesizing audio. We propose a novel Audio-Visual Gaussian Splatting (AV-GS) model to characterize the entire scene environment. Experiments validate the superiority of our AV-GS over existing alternatives on the real-world RWAVS and simulation-based SoundSpaces datasets.
arXiv Detail & Related papers (2024-06-13T08:34:12Z) - X-Ray: A Sequential 3D Representation For Generation [54.160173837582796]
We introduce X-Ray, a novel 3D sequential representation inspired by x-ray scans.
X-Ray transforms a 3D object into a series of surface frames at different layers, making it suitable for generating 3D models from images.
arXiv Detail & Related papers (2024-04-22T16:40:11Z) - Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior [29.120669908374424]
We introduce a novel audio-driven talking head synthesis framework, called Talk3D.
It faithfully reconstructs plausible facial geometries by effectively adopting a pre-trained 3D-aware generative prior.
Compared to existing methods, our method excels in generating realistic facial geometries even under extreme head poses.
arXiv Detail & Related papers (2024-03-29T12:49:40Z) - Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z) - Q-SLAM: Quadric Representations for Monocular SLAM [85.82697759049388]
We reimagine volumetric representations through the lens of quadrics.
We use a quadric assumption to rectify noisy depth estimations from RGB inputs.
We introduce a novel quadric-decomposed transformer to aggregate information across quadrics.
arXiv Detail & Related papers (2024-03-12T23:27:30Z) - VoxNeRF: Bridging Voxel Representation and Neural Radiance Fields for Enhanced Indoor View Synthesis [73.50359502037232]
VoxNeRF is a novel approach that enhances the quality and efficiency of neural indoor reconstruction and novel view synthesis. We propose an efficient voxel-guided sampling technique that selectively allocates computational resources to the most relevant segments of rays. Our approach is validated with extensive experiments on ScanNet and ScanNet++.
arXiv Detail & Related papers (2023-11-09T11:32:49Z) - Neural Implicit Surface Reconstruction using Imaging Sonar [38.73010653104763]
We present a technique for dense 3D reconstruction of objects using an imaging sonar, also known as a forward-looking sonar (FLS).
Compared to previous methods that model the scene geometry as point clouds or volumetric grids, we represent geometry as a neural implicit function.
We perform experiments on real and synthetic datasets and show that our algorithm reconstructs high-fidelity surface geometry from multi-view FLS images at much higher quality than was possible with previous techniques and without suffering from their associated memory overhead.
arXiv Detail & Related papers (2022-09-17T02:23:09Z) - Implicit Neural Representation Learning for Hyperspectral Image Super-Resolution [0.0]
Implicit Neural Representations (INRs) are making strides as a novel and effective representation.
We propose a novel HSI reconstruction model based on INR, which represents an HSI as a continuous function mapping a spatial coordinate to its corresponding spectral radiance values.
arXiv Detail & Related papers (2021-12-20T14:07:54Z) - Polka Lines: Learning Structured Illumination and Reconstruction for Active Stereo [52.68109922159688]
We introduce a novel differentiable image formation model for active stereo, relying on both wave and geometric optics, and a novel trinocular reconstruction network.
The jointly optimized pattern, which we dub "Polka Lines," together with the reconstruction network, achieve state-of-the-art active-stereo depth estimates across imaging conditions.
arXiv Detail & Related papers (2020-11-26T04:02:43Z) - Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks [10.089520556398574]
We present a new single sound source DOA estimation and tracking system based on the SRP-PHAT algorithm and a three-dimensional Convolutional Neural Network.
It feeds SRP-PHAT power maps into a fully convolutional causal architecture whose 3D convolutional layers track the sound source over time.
arXiv Detail & Related papers (2020-06-16T09:07:33Z)
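For the SRP-PHAT entry above, the following is a minimal NumPy sketch, under simplifying far-field assumptions, of how an SRP-PHAT power map can be built by accumulating GCC-PHAT values at the time differences of arrival predicted by each candidate direction. It is not the paper's implementation; function names and the grid convention are illustrative.

```python
import numpy as np

def gcc_phat(x, y, n_fft):
    """GCC-PHAT cross-correlation between two microphone signals (lag 0 centered)."""
    X = np.fft.rfft(x, n=n_fft)
    Y = np.fft.rfft(y, n=n_fft)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12                 # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n_fft)
    return np.concatenate((cc[-n_fft // 2:], cc[:n_fft // 2]))  # index n_fft//2 = lag 0

def srp_phat_map(signals, mic_pos, grid_dirs, fs, c=343.0):
    """Accumulate GCC-PHAT values at the TDOA predicted by each candidate direction.

    signals:   (M, T) microphone signals
    mic_pos:   (M, 3) microphone positions in meters
    grid_dirs: (G, 3) unit vectors, candidate propagation directions (source -> array)
    fs:        sampling rate in Hz; c: speed of sound in m/s
    """
    n_mics, n_samples = signals.shape
    n_fft = int(2 ** np.ceil(np.log2(2 * n_samples)))
    power = np.zeros(len(grid_dirs))
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            cc = gcc_phat(signals[i], signals[j], n_fft)
            # Expected TDOA (in samples) of mic i relative to mic j for each direction.
            tdoa = (mic_pos[i] - mic_pos[j]) @ grid_dirs.T / c * fs
            lags = np.round(tdoa).astype(int) + n_fft // 2
            power += cc[np.clip(lags, 0, n_fft - 1)]
    return power  # highest value ~ most likely source direction
```

Maps like this, computed per time frame, are the kind of input that the 3D convolutional tracker described in the entry above consumes.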
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.