Fugu-MT Paper Translation (Abstract): Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes

Paper overview: Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes

arxiv url: http://arxiv.org/abs/2603.27295v1
Date: Sat, 28 Mar 2026 14:57:44 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:44.884426
Title: Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes
Title (reference translation): Beyond Descriptions: A Generative Scene2Audio Framework for Blind and Low-Vision Users to Experience Vista Landscapes
Authors: Chitralekha Gupta, Jing Peng, Ashwin Ram, Shreyas Sridhar, Christophe Jouffrais, Suranga Nanayakkara
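Abstract summary: The proposed Scene2Audio framework generates comprehensible and enjoyable nonverbal audio using generative models informed by psychoacoustics. The work bridges the gap between visual and auditory scene perception by moving beyond purely descriptive aids.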
Reference score (independently computed attention measure): 23.925773831218027
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Current scene perception tools for Blind and Low Vision (BLV) individuals rely on spoken descriptions but lack engaging representations of visually pleasing distant environmental landscapes (Vista spaces). Our proposed Scene2Audio framework generates comprehensible and enjoyable nonverbal audio using generative models informed by psychoacoustics and by principles of scene audio composition. Through a user study with 11 BLV participants, we found that combining the Scene2Audio sounds with speech creates a better experience than speech alone, as the sound effects complement the speech, making the scene easier to imagine. An "in-the-wild" mobile app study with 7 BLV users, run for more than a week, further showed the potential of Scene2Audio in enhancing outdoor scene experiences. Our work bridges the gap between visual and auditory scene perception by moving beyond purely descriptive aids, addressing the aesthetic needs of BLV users.
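Abstract (reference translation): Current scene perception tools for Blind and Low Vision (BLV) individuals rely on spoken descriptions, but they lack engaging representations of visually pleasing distant environmental landscapes (Vista spaces). The proposed Scene2Audio framework generates comprehensible and enjoyable nonverbal audio using generative models informed by psychoacoustics and by principles of scene audio composition. A user study with 11 BLV participants found that combining the Scene2Audio sounds with speech gives a better experience than speech alone, because the sound effects complement the speech. A mobile app study with 7 BLV users over more than a week shows the potential of Scene2Audio to enhance outdoor scene experiences. To address the aesthetic needs of BLV users, the work bridges the gap between visual and auditory scene perception by moving beyond purely descriptive aids.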


Related papers