Fugu-MT Paper Translation (Abstract): EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction

Paper overview: EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction

arxiv url: http://arxiv.org/abs/2605.28101v1
Date: Wed, 27 May 2026 07:54:47 GMT
Status: Translation complete
Last updated in system: 2026-05-28 17:38:55.861079
Title: EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction
Title (reference translation): EigeNet: Geometry-Informed Multi-Modal Learning for Few-shot Novel View RIR Prediction
Authors: Chong Jing, Zitong Lan, Junan Zhang, Zhizheng Wu
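Abstract summary: We propose EIGENET, a geometry-informed multi-modal framework for few-shot novel view RIR prediction. We empirically demonstrate that this architecture can make full use of the multi-view multi-modal context. EIGENET achieves state-of-the-art performance in both few-shot novel view RIR prediction and sim-to-real generalization.
Reference score (attention score computed independently by this site): 5.156786627043761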
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Predicting spatially varying Room Impulse Responses (RIRs) from sparse observations is a critical but highly challenging inverse problem for immersive spatial audio rendering. In this work, we present EIGENET, a geometry-informed multi-modal framework for few-shot novel view RIR prediction. At its core is a Cross-view Alternate-attention Transformer that iteratively refines local intra-view acoustic structures and global cross-view spatial relationships. We empirically demonstrate that this architecture is capable of making full use of the multi-view multi-modal context while performing spatial-temporal reasoning for RIR prediction. Inspired by acoustic ray tracing, we design a geometry-informed modulation block to formulate the connection between geometric features and the RIR power spectrum. Meanwhile, an auxiliary loss is introduced to transform the single-target waveform prediction into a multi-task learning framework. Through ablation studies, we demonstrate that this design yields consistent performance gains regardless of the underlying backbone, thereby confirming its foundational utility and architecture-agnostic generalizability for the RIR prediction task. Evaluated on both simulated and real-world benchmarks, EIGENET achieves state-of-the-art performance in both few-shot novel view RIR prediction and sim-to-real generalization. Code and checkpoints are available at https://github.com/FEAfeatherTHER/EigeNet.
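Abstract (reference translation): Predicting spatially varying room impulse responses (RIRs) from sparse observations is a highly challenging inverse problem in immersive spatial audio rendering. This paper introduces EIGENET, a geometry-informed multi-modal framework. At its core is a Cross-view Alternate-attention Transformer that iteratively refines local intra-view acoustic structures and global cross-view spatial relationships. We empirically demonstrate that this architecture can fully exploit the multi-view multi-modal context while performing spatial-temporal reasoning for RIR prediction. Inspired by acoustic ray tracing, we design a geometry-informed modulation block that formulates the connection between geometric features and the RIR power spectrum. Meanwhile, an auxiliary loss is introduced to turn single-target waveform prediction into a multi-task learning framework. Through ablation studies, we demonstrate that this design yields consistent performance gains regardless of the underlying backbone, confirming its foundational utility and architecture-agnostic generalizability for the RIR prediction task. Evaluated on both simulated and real-world benchmarks, EIGENET achieves state-of-the-art performance in few-shot novel view RIR prediction and in sim-to-real generalization. Code and checkpoints are available at https://github.com/FEAfeatherTHER/EigeNet.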
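Illustrative sketch (not the authors' implementation): the abstract describes a transformer that alternates between refining intra-view structure and cross-view relationships. The NumPy code below shows one generic way such alternation can be arranged, assuming tokens shaped (views, tokens per view, feature dim), single-head attention with identity projections, and residual connections. The names `self_attention`, `alternate_attention`, and `num_rounds` are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head scaled dot-product self-attention.

    x has shape (..., N, D); attention runs over the N axis.
    Identity Q/K/V projections are an assumption made for brevity.
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., N, N)
    return softmax(scores, axis=-1) @ x               # (..., N, D)

def alternate_attention(tokens, num_rounds=2):
    """Alternate intra-view and cross-view attention.

    tokens: array of shape (V, T, D) = views, tokens per view, feature dim.
    """
    v, t, d = tokens.shape
    x = tokens
    for _ in range(num_rounds):
        # Intra-view step: each view attends only among its own T tokens.
        x = x + self_attention(x)
        # Cross-view step: all V*T tokens attend to each other globally.
        flat = x.reshape(v * t, d)
        flat = flat + self_attention(flat)
        x = flat.reshape(v, t, d)
    return x

rng = np.random.default_rng(0)
tokens = rng.standard_normal((3, 4, 8))  # 3 views, 4 tokens each, dim 8
refined = alternate_attention(tokens)
print(refined.shape)
```

In this sketch the intra-view step treats the view axis as a batch dimension, so views cannot exchange information there; mixing across views happens only in the flattened global step.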


Related papers