Fugu-MT Paper Translation (Summary): SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Paper Summary: SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

arxiv url: http://arxiv.org/abs/2308.11596v3
Date: Wed, 25 Oct 2023 03:52:07 GMT
Status: Translation complete
Last updated in system: 2023-10-26 20:02:42.566245
Title: SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
Title (reference translation): SeamlessM4T: Multilingual and Multimodal Machine Translation
Authors: Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang
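Abstract summary: We introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. We developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation.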
Reference score (proprietary attention score): 90.71078166159295
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. Filtered and combined with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noises and speaker variations in speech-to-text tasks compared to the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication
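Abstract (reference translation): What does it take to create the Babel Fish, a tool that helps individuals translate speech between any two languages? Recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, but unified speech-to-speech translation models have not yet made similar progress. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that translate in stages, which keeps high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build it, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. We then created a multimodal corpus of automatically aligned speech translations. After filtering this corpus and combining it with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared with strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. In robustness tests, the system performs better against background noise and speaker variation in speech-to-text tasks than the current SOTA model. We also evaluated SeamlessM4T for gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication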
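The abstract contrasts conventional cascaded systems, which chain separately trained models stage by stage, with a single model that covers five tasks. The sketch below is only an illustration of that contrast: it tabulates, for each task, which stages a classic cascade would chain (speech recognition, then text translation, then speech synthesis). The task abbreviations, the TASKS table, and the cascade_stages function are hypothetical names chosen for this example; they are not part of the seamless_communication code base or its API.

```python
from typing import Dict, List, Tuple

# Hypothetical table of the five tasks named in the abstract,
# as (input modality, output modality). Not an API of any library.
TASKS: Dict[str, Tuple[str, str]] = {
    "S2ST": ("speech", "speech"),  # speech-to-speech translation
    "S2TT": ("speech", "text"),    # speech-to-text translation
    "T2ST": ("text", "speech"),    # text-to-speech translation
    "T2TT": ("text", "text"),      # text-to-text translation
    "ASR": ("speech", "text"),     # speech recognition, no translation
}


def cascade_stages(task: str) -> List[str]:
    """Return the separate models a classic cascade would chain for `task`."""
    src, tgt = TASKS[task]
    if task == "ASR":
        return ["asr"]  # transcription only, source language kept
    stages: List[str] = []
    if src == "speech":
        stages.append("asr")  # source speech -> source-language text
    stages.append("mt")       # source text -> target-language text
    if tgt == "speech":
        stages.append("tts")  # target text -> target speech
    return stages


if __name__ == "__main__":
    for task in TASKS:
        # A cascade's length depends on the task; a unified model
        # serves every task with one model.
        print(f"{task}: cascade={cascade_stages(task)} unified=['single model']")
```

Each extra stage in a cascade consumes the previous stage's output, so recognition errors propagate into translation and synthesis; a single model that maps inputs to outputs directly avoids those hand-offs.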

