Fugu-MT Paper Translation (Abstract): Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech

Paper overview: Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech

arxiv url: http://arxiv.org/abs/2603.16606v1
Date: Tue, 17 Mar 2026 14:47:35 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:07.351508
Title: Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech
Title (reference translation): Omnilingual SONAR
Authors: Omnilingual SONAR Team, João Maria Janeiro, Pere-Lluís Huguet Cabot, Ioannis Tsiamas, Yen Meng, Vivek Iyer, Guillem Ramírez, Loic Barrault, Belen Alastruey, Yu-An Chung, Marta R. Costa-Jussa, David Dale, Kevin Heffernan, Jaehyeong Jo, Artyom Kozhevnikov, Alexandre Mourachko, Christophe Ropers, Holger Schwenk, Paul-Ambroise Duquenne
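Abstract summary: Cross-lingual sentence encoders typically cover only a few hundred languages. We introduce OmniSONAR, a family of omnilingual, cross-lingual, and cross-modal sentence embedding models.
Reference score (independently computed attention score): 61.759910921200834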
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Cross-lingual sentence encoders typically cover only a few hundred languages and often trade downstream quality for stronger alignment, limiting their adoption. We introduce OmniSONAR, a new family of omnilingual, cross-lingual and cross-modal sentence embedding models that natively embed text, speech, code, and mathematical expressions in a single semantic space, while delivering state-of-the-art downstream performance at the scale of thousands of languages, from high-resource to extremely low-resource varieties. To reach this scale without representation collapse, we use progressive training. We first learn a strong foundational space for 200 languages with an LLM-initialized encoder-decoder, combining token-level decoding with a novel split-softmax contrastive loss and synthetic hard negatives. Building on this foundation, we expand to several thousand language varieties via a two-stage teacher-student encoder distillation framework. Finally, we demonstrate the cross-modal extensibility of this space by seamlessly mapping 177 spoken languages into it. OmniSONAR halves cross-lingual similarity search error on the 200-language FLORES dataset and reduces error by a factor of 15 on the 1,560-language BIBLE benchmark. It also enables strong translation, outperforming NLLB-3B on multilingual benchmarks and exceeding prior models (including much larger LLMs) by 15 chrF++ points on BIBLE translation from 1,560 languages into English. OmniSONAR also performs strongly on MTEB and XLCoST. For speech, OmniSONAR achieves a 43% lower similarity-search error and reaches 97% of SeamlessM4T speech-to-text quality, despite being zero-shot for translation (trained only on ASR data). Finally, by training Spectrum, an encoder-decoder LM that processes OmniSONAR embedding sequences, exclusively on English text, we unlock high-performance transfer to thousands of languages and speech for complex downstream tasks.
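Abstract (reference translation): Cross-lingual sentence encoders typically cover only a few hundred languages and often trade downstream quality for stronger alignment, which limits their adoption. OmniSONAR is a family of omnilingual, cross-lingual, and cross-modal sentence embedding models that natively embed text, speech, code, and mathematical expressions in a single semantic space. To reach this scale without representation collapse, progressive training is used. First, a strong foundational space is learned for 200 languages with an LLM-initialized encoder-decoder, combining token-level decoding with a novel split-softmax contrastive loss and synthetic hard negatives. Building on this foundation, the model is extended to several thousand language varieties through a two-stage teacher-student encoder distillation framework. Finally, the cross-modal extensibility of this space is demonstrated by seamlessly mapping 177 spoken languages into it. OmniSONAR halves the cross-lingual similarity search error on the 200-language FLORES dataset and reduces the error by a factor of 15 on the 1,560-language BIBLE benchmark. It also enables strong translation, outperforming NLLB-3B on multilingual benchmarks and exceeding prior models (including much larger LLMs) by 15 chrF++ points on BIBLE translation from 1,560 languages into English. OmniSONAR also performs strongly on MTEB and XLCoST. For speech, OmniSONAR achieves a 43% lower similarity-search error and reaches 97% of SeamlessM4T speech-to-text quality, despite being zero-shot for translation. Finally, by training Spectrum, an encoder-decoder LM that processes OmniSONAR embedding sequences, exclusively on English text, high-performance transfer to thousands of languages and to speech is unlocked for complex downstream tasks.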
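
The abstract reports results as a cross-lingual similarity search error. As a rough illustration of what such a metric measures, the sketch below counts how often the nearest target-language embedding of a source sentence is not its aligned translation. It assumes plain cosine similarity and tiny hand-written vectors; the function names are hypothetical, and the paper's actual scoring function and evaluation code may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_search_error(src, tgt):
    """Fraction of source embeddings whose nearest target is not the aligned one.

    src[i] and tgt[i] are assumed to embed the same sentence in two languages
    (an assumption of this sketch, mirroring an aligned evaluation set).
    """
    errors = 0
    for i, s in enumerate(src):
        # Index of the target embedding most similar to this source embedding.
        nearest = max(range(len(tgt)), key=lambda j: cosine(s, tgt[j]))
        if nearest != i:
            errors += 1
    return errors / len(src)


if __name__ == "__main__":
    # Toy two-dimensional "embeddings"; real models use far higher dimensions.
    source = [[1.0, 0.0], [0.0, 1.0]]
    target = [[0.9, 0.1], [0.1, 0.9]]
    print(similarity_search_error(source, target))
```

An error of 0 means every sentence retrieved its own translation, so "halving" the error means halving the share of sentences that retrieve the wrong one.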

Related papers