Fugu-MT paper translation (abstract): UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models

Paper overview: UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models

arxiv url: http://arxiv.org/abs/2510.18915v3
Date: Thu, 30 Oct 2025 10:00:05 GMT
Status: translation complete
System update date: 2025-10-31 13:50:54.700005
Title: UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models
Title (reference translation): UNO-Bench: A Unified Benchmark for Exploring the Compositional Law Between Uni-modal and Omni-modal in Omni Models
Authors: Chen Chen, ZeYang Hu, Fengjiao Chen, Liya Ma, Jiaxing Liu, Xiaoyu Li, Ziwen Wang, Xuezhi Cao, Xunliang Cai
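Abstract summary: We introduce UNO-Bench, a novel, high-quality, unified omni model benchmark. The benchmark is designed to effectively evaluate both UNi-modal and Omni-modal capabilities under a unified ability taxonomy. It includes 1250 human-curated omni-modal samples with 98% cross-modality solvability, and 2480 enhanced uni-modal samples.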
Reference score (independently computed attention score): 22.508414355245275
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal large language models have been progressing from uni-modal understanding toward unifying visual, audio and language modalities, collectively termed omni models. However, the correlation between uni-modal and omni-modal capabilities remains unclear, and comprehensive evaluation is required to drive the evolution of omni models' intelligence. In this work, we introduce a novel, high-quality, and UNified Omni model benchmark, UNO-Bench. This benchmark is designed to effectively evaluate both UNi-modal and Omni-modal capabilities under a unified ability taxonomy, spanning 44 task types and 5 modality combinations. It includes 1250 human-curated samples for omni-modal with 98% cross-modality solvability, and 2480 enhanced uni-modal samples. The human-generated dataset is well-suited to real-world scenarios, particularly within the Chinese context, whereas the automatically compressed dataset offers a 90% increase in speed and maintains 98% consistency across 18 public benchmarks. In addition to traditional multi-choice questions, we propose an innovative multi-step open-ended question format to assess complex reasoning. A general scoring model is incorporated, supporting 6 question types for automated evaluation with 95% accuracy. Experimental results show the Compositional Law between omni-modal and uni-modal performance, and the omni-modal capability manifests as a bottleneck effect on weak models, while exhibiting synergistic promotion on strong models.
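Abstract (reference translation): Multimodal large language models have developed from uni-modal understanding toward the unification of visual, audio, and language modalities, and are collectively called omni models. However, the correlation between uni-modal and omni-modal capabilities is unclear, and comprehensive evaluation is needed to advance the intelligence of omni models. This paper introduces UNO-Bench, a novel, high-quality, unified omni model benchmark. The benchmark is designed to effectively evaluate both UNi-modal and Omni-modal capabilities under a unified ability taxonomy spanning 44 task types and 5 modality combinations. It includes 1250 human-curated omni-modal samples with 98% cross-modality solvability, and 2480 enhanced uni-modal samples. The human-generated dataset is suited to real-world scenarios, particularly in the Chinese context, while the automatically compressed dataset increases speed by 90% and maintains 98% consistency across 18 public benchmarks. In addition to traditional multiple-choice questions, the paper proposes an innovative multi-step open-ended question format for assessing complex reasoning. A general scoring model is incorporated, supporting 6 question types for automated evaluation with 95% accuracy. The experimental results show the Compositional Law between omni-modal and uni-modal performance, with omni-modal capability appearing as a bottleneck effect on weak models while exhibiting synergistic promotion on strong models.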


Related papers