Fugu-MT Paper Translation (Summary): Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations

Paper Summary: Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations

arxiv url: http://arxiv.org/abs/2604.23295v1
Date: Sat, 25 Apr 2026 13:18:40 GMT
Status: Translation complete
Last updated in system: 2026-04-28 17:12:07.257079
Title: Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework in Hindi using Real-World Conversations
Title (reference translation): Human-1 by Josh Talks: A Full-Duplex Conversational Modeling Framework for Hindi Using Real-World Conversations
Authors: Bhaskar Singh, Shobhit Banga, Pranav Sharma,
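Abstract summary: Full-duplex spoken dialogue systems can model natural conversational behaviours such as interruptions and backchannels. This work is a first step toward real-time spoken dialogue systems for Hindi and other Indian languages.
Reference score (independently computed attention score): 1.218012293738896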
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Full-duplex spoken dialogue systems can model natural conversational behaviours such as interruptions, overlaps, and backchannels, yet such systems remain largely unexplored for Indian languages. We present the first open, reproducible full-duplex spoken dialogue system for Hindi by adapting Moshi, a state-of-the-art duplex speech architecture, using a custom Hindi tokeniser and training on 26,000 hours of real spontaneous conversations collected from 14,695 speakers with separate speaker channels, enabling direct learning of turn-taking and overlap patterns from natural interactions. To support Hindi text generation, we replace the original English tokeniser and reinitialise text-vocabulary-dependent parameters while retaining the pre-trained audio components. We propose a two-stage training recipe -- large-scale pre-training followed by fine-tuning on 1,000 hours of conversational data. Evaluation through the prompted dialogue continuation paradigm with both automatic metrics and human judgments demonstrates that the resulting model generates natural and meaningful full-duplex conversational behaviour in Hindi. This work serves as a first step toward real-time duplex spoken dialogue systems for Hindi and other Indian languages.
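Abstract (reference translation): Full-duplex spoken dialogue systems can model natural conversational behaviours such as interruptions, overlaps, and backchannels, yet such systems remain largely unexplored for Indian languages. We train on 26,000 hours of real spontaneous conversations collected from 14,695 Hindi speakers, enabling direct learning of turn-taking and overlap patterns from natural dialogue, and present the first open and reproducible full-duplex spoken dialogue system for Hindi. To support Hindi text generation, we replace the original English tokeniser and reinitialise the text-vocabulary-dependent parameters while retaining the pre-trained audio components. We propose a two-stage training recipe: large-scale pre-training followed by fine-tuning on 1,000 hours of conversational data. Evaluation through the dialogue continuation paradigm, combining automatic metrics and human judgments, shows that the model generates natural and meaningful conversational behaviour in Hindi. This work is a first step toward real-time duplex spoken dialogue systems for Hindi and other Indian languages.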
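The tokeniser swap described in the abstract (replace the English tokeniser, reinitialise text-vocabulary-dependent parameters, keep the pre-trained audio components) can be illustrated with a minimal sketch. It assumes, hypothetically, that model weights are a flat dict of 2-D matrices stored as lists of rows, and that the text-vocabulary-dependent tensors (a text embedding and a text output projection, each with one row per text token) can be identified by the name prefixes `text_emb` and `text_linear`. These names, the Gaussian initialisation with std 0.02, and the `swap_text_vocab` helper are illustrative assumptions, not the authors' code or Moshi's actual checkpoint layout.

```python
"""Minimal sketch (not the authors' code): swap the text vocabulary of a
pre-trained duplex speech model while keeping audio weights intact.

Assumptions (hypothetical): parameters are a flat dict of 2-D matrices
(lists of rows), and text-vocabulary-dependent matrices have one row per
text token and are identified purely by name prefix.
"""
import random
from typing import Dict, List

Matrix = List[List[float]]

# Hypothetical parameter-name prefixes; real checkpoints may differ.
TEXT_VOCAB_PREFIXES = ("text_emb", "text_linear")


def init_matrix(rows: int, cols: int, std: float, rng: random.Random) -> Matrix:
    """Return a fresh (rows x cols) matrix drawn from N(0, std^2)."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def swap_text_vocab(params: Dict[str, Matrix], new_vocab_size: int,
                    std: float = 0.02, seed: int = 0) -> Dict[str, Matrix]:
    """Rebuild text-vocabulary-dependent matrices for a new tokeniser.

    Matrices whose name starts with a text prefix are re-created with
    `new_vocab_size` rows (one per new text token) and the original width;
    every other matrix (e.g. audio embeddings, transformer layers) is
    carried over unchanged, by reference.
    """
    rng = random.Random(seed)
    out: Dict[str, Matrix] = {}
    for name, weight in params.items():
        if name.startswith(TEXT_VOCAB_PREFIXES):
            hidden = len(weight[0])  # keep the model width, change vocab rows
            out[name] = init_matrix(new_vocab_size, hidden, std, rng)
        else:
            out[name] = weight  # retain pre-trained non-text weights as-is
    return out


if __name__ == "__main__":
    old = {
        "text_emb.weight": [[0.1] * 4 for _ in range(8)],     # old vocab 8, width 4
        "text_linear.weight": [[0.2] * 4 for _ in range(8)],
        "audio_emb.0.weight": [[0.3] * 4 for _ in range(16)],  # audio codebook
    }
    new = swap_text_vocab(old, new_vocab_size=12)
    print(len(new["text_emb.weight"]), len(new["text_emb.weight"][0]))
    print(new["audio_emb.0.weight"] is old["audio_emb.0.weight"])
```

Only the matrices indexed by text tokens change shape and values; everything tied to audio tokens keeps its pre-trained values, which is what allows the audio side of the model to be reused while the text side is relearned for Hindi. A real implementation would operate on framework tensors and would need to locate every text-dependent module in the actual architecture.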


Related Papers