Fugu-MT Paper Translation (Summary): On Path to Multimodal Generalist: General-Level and General-Bench

Paper summary: On Path to Multimodal Generalist: General-Level and General-Bench

arxiv url: http://arxiv.org/abs/2505.04620v1
Date: Wed, 07 May 2025 17:59:32 GMT
Status: Translation complete
Last updated in system: 2025-05-08 19:07:36.183938
Title: On Path to Multimodal Generalist: General-Level and General-Bench
Authors: Hao Fei, Yuan Zhou, Juncheng Li, Xiangtai Li, Qingshan Xu, Bobo Li, Shengqiong Wu, Yaoting Wang, Junbao Zhou, Jiahao Meng, Qingyu Shi, Zhiyuan Zhou, Liangtao Shi, Minghe Gao, Daoan Zhang, Zhiqi Ge, Weiming Wu, Siliang Tang, Kaihang Pan, Yaobo Ye, Haobo Yuan, Tao Zhang, Tianjie Ju, Zixiang Meng, Shilin Xu, Liyu Jia, Wentao Hu, Meng Luo, Jiebo Luo, Tat-Seng Chua, Shuicheng Yan, Hanwang Zhang,
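Abstract summary: This paper introduces General-Level, an evaluation framework that defines five levels of MLLM performance and generality. At the core of the framework is the concept of Synergy, which measures whether a model maintains consistent capabilities across comprehension and generation. Evaluation results covering more than 100 existing MLLMs reveal a capability ranking of generalists.
Reference score (attention score computed by this site): 153.9720740167528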
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The Multimodal Large Language Model (MLLM) is currently experiencing rapid growth, driven by the advanced capabilities of LLMs. Unlike earlier specialists, existing MLLMs are evolving towards a Multimodal Generalist paradigm. Initially limited to understanding multiple modalities, these models have advanced to not only comprehend but also generate across modalities. Their capabilities have expanded from coarse-grained to fine-grained multimodal understanding and from supporting limited modalities to arbitrary ones. While many benchmarks exist to assess MLLMs, a critical question arises: Can we simply assume that higher performance across tasks indicates a stronger MLLM capability, bringing us closer to human-level AI? We argue that the answer is not as straightforward as it seems. This project introduces General-Level, an evaluation framework that defines 5-scale levels of MLLM performance and generality, offering a methodology to compare MLLMs and gauge the progress of existing systems towards more robust multimodal generalists and, ultimately, towards AGI. At the core of the framework is the concept of Synergy, which measures whether models maintain consistent capabilities across comprehension and generation, and across multiple modalities. To support this evaluation, we present General-Bench, which encompasses a broader spectrum of skills, modalities, formats, and capabilities, including over 700 tasks and 325,800 instances. The evaluation results that involve over 100 existing state-of-the-art MLLMs uncover the capability rankings of generalists, highlighting the challenges in reaching genuine AI. We expect this project to pave the way for future research on next-generation multimodal foundation models, providing a robust infrastructure to accelerate the realization of AGI. Project page: https://generalist.top/

Related papers