Fugu-MT Paper Translation (Abstract): SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia

Paper overview: SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia

arxiv url: http://arxiv.org/abs/2511.01670v1
Date: Mon, 03 Nov 2025 15:32:58 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:27.311362
Title: SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia
Authors: Chaoqun Liu, Mahani Aljunied, Guizhen Chen, Hou Pong Chan, Weiwen Xu, Yu Rong, Wenxuan Zhang
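Abstract summary: We introduce SeaLLMs-Audio, the first large audio-language model (LALM) to support multiple Southeast Asian languages. SeaLLMs-Audio shows strong performance across diverse audio-centric tasks, spanning fine-grained audio understanding and voice-based interaction. It supports a wide range of tasks, including Audio Captioning, Automatic Speech Recognition, Speech-to-Text Translation, Speech Emotion Recognition, Speech Question Answering, and Speech Summarization.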
Reference score (attention score computed independently by this site): 40.53123362174684
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We introduce SeaLLMs-Audio, the first large audio-language model (LALM) tailored for multiple Southeast Asian (SEA) languages, namely Indonesian (id), Thai (th), and Vietnamese (vi), alongside English (en) and Chinese (zh). Trained on a large-scale audio corpus, SeaLLMs-Audio exhibits strong performance across diverse audio-centric tasks, spanning fine-grained audio understanding and voice-based interaction. Its key features include: 1) Multilingual: the model primarily supports 5 languages, namely Indonesian, Thai, Vietnamese, English, and Chinese; 2) Multimodal: the model accepts flexible input modalities, including audio only, text only, as well as audio with text; 3) Multi-task: the model supports a wide range of tasks, including audio analysis tasks such as Audio Captioning, Automatic Speech Recognition, Speech-to-Text Translation, Speech Emotion Recognition, Speech Question Answering, and Speech Summarization. It also enables voice-based dialogue, including answering factual, mathematical, and general knowledge queries. As a significant step towards advancing audio LLMs in Southeast Asia, we expect SeaLLMs-Audio to benefit both the regional research community and industry. To automate LALM evaluation for Southeast Asia, we introduce SeaBench-Audio, a benchmark spanning multiple tasks. Experiments show that SeaLLMs-Audio achieves competitive performance compared with other LALMs on SEA languages.

Related papers