RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses
- URL: http://arxiv.org/abs/2602.01861v2
- Date: Tue, 03 Feb 2026 06:57:23 GMT
- Title: RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses
- Authors: Shaoheng Xu, Chunyi Sun, Jihui Zhang, Prasanga N. Samarasinghe, Thushara D. Abhayapala
- Abstract summary: RIR-Former is a grid-free, one-step feed-forward model for RIR reconstruction. By introducing a sinusoidal encoding module into a transformer backbone, our method effectively incorporates microphone position information. Experiments on diverse simulated acoustic environments demonstrate that RIR-Former consistently outperforms state-of-the-art baselines.
- Score: 21.84404827658177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Room impulse responses (RIRs) are essential for many acoustic signal processing tasks, yet measuring them densely across space is often impractical. In this work, we propose RIR-Former, a grid-free, one-step feed-forward model for RIR reconstruction. By introducing a sinusoidal encoding module into a transformer backbone, our method effectively incorporates microphone position information, enabling interpolation at arbitrary array locations. Furthermore, a segmented multi-branch decoder is designed to separately handle early reflections and late reverberation, improving reconstruction across the entire RIR. Experiments on diverse simulated acoustic environments demonstrate that RIR-Former consistently outperforms state-of-the-art baselines in terms of normalized mean square error (NMSE) and cosine distance (CD), under varying missing rates and array configurations. These results highlight the potential of our approach for practical deployment and motivate future work on scaling from randomly spaced linear arrays to complex array geometries, dynamic acoustic scenes, and real-world environments.
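The paper does not publish implementation details, but the sinusoidal encoding of continuous microphone coordinates that the abstract describes could resemble the following NeRF-style sketch. The function name, frequency schedule, and `num_freqs` parameter are illustrative assumptions, not the authors' actual design.

```python
import numpy as np

def sinusoidal_position_encoding(coords, num_freqs=8):
    """Encode continuous microphone coordinates with sinusoids at
    geometrically spaced frequencies (NeRF-style positional encoding).

    coords: array of shape (n_mics, n_dims), e.g. positions in metres.
    Returns an array of shape (n_mics, n_dims * 2 * num_freqs).
    """
    coords = np.asarray(coords, dtype=np.float64)
    freqs = 2.0 ** np.arange(num_freqs) * np.pi        # (num_freqs,)
    scaled = coords[..., None] * freqs                 # (n_mics, n_dims, num_freqs)
    enc = np.concatenate([np.sin(scaled), np.cos(scaled)], axis=-1)
    return enc.reshape(coords.shape[0], -1)

# Example: three microphones at arbitrary (non-grid) positions on a line.
positions = np.array([[0.12], [0.47], [0.93]])
features = sinusoidal_position_encoding(positions, num_freqs=8)
print(features.shape)  # (3, 16)
```

An encoding of this form turns each scalar coordinate into a multi-frequency feature vector that a transformer backbone can attend over, which is what enables queries at arbitrary, grid-free array locations.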
Related papers
- From Sparse Sensors to Continuous Fields: STRIDE for Spatiotemporal Reconstruction [3.2580743227673694]
We present STRIDE, a framework that maps high-dimensional spatial fields to a latent state with a temporal decoder. We show that STRIDE supports super-resolution and remains robust to noise.
arXiv Detail & Related papers (2026-02-04T04:39:23Z) - Rotation Equivariant Arbitrary-scale Image Super-Resolution [62.41329042683779]
Arbitrary-scale image super-resolution (ASISR) aims to achieve high-resolution recovery at arbitrary scales from a low-resolution input image. In this study, we construct a rotation-equivariant ASISR method.
arXiv Detail & Related papers (2025-08-07T08:51:03Z) - DiffusionRIR: Room Impulse Response Interpolation using Diffusion Models [16.92449230293275]
High-quality RIR estimates drive applications such as virtual microphones, sound source localization, augmented reality, and data augmentation. This research addresses the challenge of estimating RIRs at unmeasured locations within a room using Denoising Diffusion Probabilistic Models (DDPM).
arXiv Detail & Related papers (2025-04-29T10:52:07Z) - SpINR: Neural Volumetric Reconstruction for FMCW Radars [0.15193212081459279]
We introduce SpINR, a novel framework for volumetric reconstruction using Frequency-Modulated Continuous-Wave (FMCW) radar data. We demonstrate that SpINR significantly outperforms classical backprojection methods and existing learning-based approaches.
arXiv Detail & Related papers (2025-03-30T04:44:57Z) - NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions [93.02515761070201]
We present a novel type of neural fields that uses general radial bases for signal representation.
Our method builds upon general radial bases with flexible kernel position and shape, which have higher spatial adaptivity and can more closely fit target signals.
When applied to neural radiance field reconstruction, our method achieves state-of-the-art rendering quality, with small model size and comparable training speed.
arXiv Detail & Related papers (2023-09-27T06:32:05Z) - Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation [69.1351513309953]
We show that accurately simulating the low-frequency components of Room Impulse Responses (RIRs) is important to achieving good dereverberation.
We demonstrate that speech dereverberation models trained on hybrid synthetic RIRs outperform models trained on RIRs generated by prior geometric ray tracing methods.
arXiv Detail & Related papers (2022-12-10T20:15:23Z) - K-Space Transformer for Fast MRI Reconstruction with Implicit Representation [39.04792898427536]
We propose a Transformer-based framework for processing sparsely sampled MRI signals in k-space.
We adopt an implicit representation of spectrogram, treating spatial coordinates as inputs, and dynamically query the partially observed measurements.
To strike a balance between computational cost and reconstruction quality, we build a hierarchical structure with low-resolution and high-resolution decoders.
arXiv Detail & Related papers (2022-06-14T16:06:15Z) - Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z) - ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer [60.27951773998535]
We propose a recurrent transformer model, namely ReconFormer, for MRI reconstruction.
It can iteratively reconstruct high-fidelity magnetic resonance images from highly under-sampled k-space data.
We show that it achieves significant improvements over the state-of-the-art methods with better parameter efficiency.
arXiv Detail & Related papers (2022-01-23T21:58:19Z) - Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.