Fugu-MT Paper Translation (Abstract): Towards unified brain-to-text decoding across speech production and perception

Paper summary: Towards unified brain-to-text decoding across speech production and perception

arxiv url: http://arxiv.org/abs/2603.12628v1
Date: Fri, 13 Mar 2026 03:59:42 GMT
Status: Translation complete
Last updated in system: 2026-03-16 17:38:11.891459
Title: Towards unified brain-to-text decoding across speech production and perception
Title (reference translation): Towards unified brain-to-text decoding across speech production and perception
Authors: Zhizhang Yuan, Yang Yang, Gaorui Zhang, Baowen Cheng, Zehan Wu, Yuhao Xu, Xiaoying Liu, Liang Chen, Ying Mao, Meng Li
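Abstract summary: We propose a unified brain-to-sentence decoding framework for both speech production and perception in Mandarin Chinese. The framework shows strong generalization ability, enabling sentence-level decoding even when trained only on single-character data. This work establishes the feasibility of a unified decoding framework and provides insights into the neural characteristics of Mandarin speech production and perception.
Reference score (independently computed attention score): 12.660399385706349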
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Speech production and perception are the main ways humans communicate daily. Prior brain-to-text decoding studies have largely focused on a single modality and alphabetic languages. Here, we present a unified brain-to-sentence decoding framework for both speech production and perception in Mandarin Chinese. The framework exhibits strong generalization ability, enabling sentence-level decoding when trained only on single-character data and supporting characters and syllables unseen during training. In addition, it allows direct and controlled comparison of neural dynamics across modalities. Mandarin speech is decoded by first classifying syllable components in Hanyu Pinyin, namely initials and finals, from neural signals, followed by a post-trained large language model (LLM) that maps sequences of toneless Pinyin syllables to Chinese sentences. To enhance LLM decoding, we designed a three-stage post-training and two-stage inference framework based on a 7-billion-parameter LLM, achieving overall performance that exceeds larger commercial LLMs with hundreds of billions of parameters or more. In addition, several characteristics were observed in Mandarin speech production and perception: speech production involved neural responses across broader cortical regions than auditory perception; channels responsive to both modalities exhibited similar activity patterns, with speech perception showing a temporal delay relative to production; and decoding performance was broadly comparable across hemispheres. Our work not only establishes the feasibility of a unified decoding framework but also provides insights into the neural characteristics of Mandarin speech production and perception. These advances contribute to brain-to-text decoding in logosyllabic languages and pave the way toward neural language decoding systems supporting multiple modalities.
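Abstract (reference translation): Speech production and perception are the main ways people communicate in daily life. Previous brain-to-text decoding studies have focused mainly on a single modality and on alphabetic languages. We propose a unified brain-to-sentence decoding framework covering both speech production and perception in Mandarin Chinese. The framework generalizes strongly: it enables sentence-level decoding when trained only on single-character data, and it supports characters and syllables not seen during training. It also allows direct, controlled comparison of neural dynamics across modalities. Mandarin speech is decoded by first classifying the Hanyu Pinyin syllable components (initials and finals) from neural signals, and then using a post-trained large language model (LLM) to map sequences of toneless Pinyin syllables to Chinese sentences. To improve LLM decoding, we designed a three-stage post-training and two-stage inference framework based on a 7-billion-parameter LLM, achieving overall performance that exceeds larger commercial LLMs with hundreds of billions of parameters or more. Several characteristics of Mandarin speech production and perception were also observed: speech production involved neural responses across broader cortical regions than auditory perception; channels responsive to both modalities showed similar activity patterns, with perception delayed in time relative to production; and decoding performance was broadly comparable across hemispheres. Our work not only establishes the feasibility of a unified decoding framework but also provides insights into the neural characteristics of Mandarin speech production and perception. These advances contribute to brain-to-text decoding in logosyllabic languages and pave the way toward neural language decoding systems that support multiple modalities.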
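The first decoding stage described in the abstract (classifying initials and finals, then forming toneless Pinyin syllables) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny `INITIALS` and `FINALS` inventories, the score values, and the function names are hypothetical, and the paper's actual classifiers, label sets, and LLM stage are not reproduced here.

```python
# Illustrative sketch only: the inventories and scores below are made up,
# and the paper's actual classifiers and LLM stage are not reproduced here.

# Small hypothetical subsets of Hanyu Pinyin initials and finals.
INITIALS = ["", "b", "h", "n", "zh"]  # "" stands for a zero initial
FINALS = ["a", "ao", "i", "ong"]


def argmax(scores):
    """Return the index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def decode_syllable(initial_scores, final_scores):
    """Combine the top-scoring initial and final into one toneless syllable."""
    initial = INITIALS[argmax(initial_scores)]
    final = FINALS[argmax(final_scores)]
    return initial + final


def decode_sequence(per_syllable_scores):
    """Map a list of (initial_scores, final_scores) pairs to toneless Pinyin."""
    return [decode_syllable(i, f) for i, f in per_syllable_scores]


# Made-up classifier outputs for two syllables.
scores = [
    ([0.1, 0.1, 0.1, 0.6, 0.1], [0.1, 0.1, 0.7, 0.1]),  # n + i
    ([0.1, 0.1, 0.6, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1]),  # h + ao
]
pinyin = decode_sequence(scores)
print(" ".join(pinyin))  # prints "ni hao"
```

In the pipeline the abstract describes, a toneless sequence like this would then be passed to the post-trained LLM, which has to choose among homophonous characters to produce a Chinese sentence; that second stage is outside the scope of this sketch.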


Related papers