Fugu-MT Paper Translation (Abstract): EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations

Paper overview: EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations

arxiv url: http://arxiv.org/abs/2604.18320v1
Date: Mon, 20 Apr 2026 14:20:44 GMT
Status: Translation complete
Last updated in system: 2026-04-21 21:52:52.932924
Title: EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations
Title (reference translation): EVE: Verifiable Self-Evolution of MLLMs via Executable Visual Transformations
Authors: Yongrui Heng, Chaoya Jiang, Han Yang, Shikun Zhang, Wei Ye
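Abstract summary: EVE (Executable Visual transformation-based self-Evolution) is a novel framework that entirely bypasses pseudo-labels by harnessing executable visual transformations. EVE consistently surpasses existing self-evolution methods, establishing a robust and scalable paradigm for verifiable MLLM self-evolution.
Reference score (independently computed attention metric): 34.761579091691125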
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Self-evolution of multimodal large language models (MLLMs) remains a critical challenge: pseudo-label-based methods suffer from progressive quality degradation as model predictions drift, while template-based methods are confined to a static set of transformations that cannot adapt in difficulty or diversity. We contend that robust, continuous self-improvement requires not only deterministic external feedback independent of the model's internal certainty, but also a mechanism to perpetually diversify the training distribution. To this end, we introduce EVE (Executable Visual transformation-based self-Evolution), a novel framework that entirely bypasses pseudo-labels by harnessing executable visual transformations continuously enriched in both variety and complexity. EVE adopts a Challenger-Solver dual-policy architecture. The Challenger maintains and progressively expands a queue of visual transformation code examples, from which it synthesizes novel Python scripts to perform dynamic visual transformations. Executing these scripts yields VQA problems with absolute, execution-verified ground-truth answers, eliminating any reliance on model-generated supervision. A multi-dimensional reward system integrating semantic diversity and dynamic difficulty calibration steers the Challenger to enrich its code example queue while posing progressively more challenging tasks, preventing mode collapse and fostering reciprocal co-evolution between the two policies. Extensive experiments demonstrate that EVE consistently surpasses existing self-evolution methods, establishing a robust and scalable paradigm for verifiable MLLM self-evolution. The code is available at https://github.com/0001Henry/EVE .
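Abstract (reference translation): Pseudo-label-based methods suffer from progressive quality degradation as model predictions drift, while template-based methods are limited to a static set of transformations that cannot adapt in difficulty or diversity. We argue that robust, continuous self-improvement requires not only deterministic external feedback that does not depend on the model's internal certainty, but also a mechanism that perpetually diversifies the training distribution. To this end, we introduce EVE (Executable Visual transformation-based self-Evolution), a novel framework that entirely bypasses pseudo-labels by leveraging executable visual transformations continuously enriched in variety and complexity. EVE adopts a Challenger-Solver dual-policy architecture. The Challenger maintains and gradually expands a queue of visual transformation code examples and synthesizes new Python scripts from it to perform dynamic visual transformations. Executing these scripts yields VQA problems with absolute, execution-verified ground-truth answers, removing any dependence on model-generated supervision. A multi-dimensional reward system that integrates semantic diversity and dynamic difficulty calibration steers the Challenger to enrich its code example queue while posing progressively harder tasks, preventing mode collapse and promoting reciprocal co-evolution between the two policies. Extensive experiments show that EVE consistently surpasses existing self-evolution methods, establishing a robust and scalable paradigm for verifiable MLLM self-evolution. The code is available at https://github.com/0001Henry/EVE .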
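The idea of execution-verified VQA can be illustrated with a minimal sketch. It is not taken from the EVE repository: the helper names (`rotate90`, `make_rotation_task`, `verify`, `difficulty_reward`), the use of a nested list as a stand-in for an image, and the reward shape that peaks at a 50% Solver success rate are all assumptions made for illustration. The point it shows is that the ground-truth answer comes from running the transformation itself, not from any model prediction.

```python
import random


def rotate90(grid, k):
    """Rotate a 2D grid clockwise by k quarter-turns (toy stand-in for an image)."""
    for _ in range(k % 4):
        # Reverse the row order, then transpose: one clockwise quarter-turn.
        grid = [list(row) for row in zip(*grid[::-1])]
    return grid


def make_rotation_task(grid, rng):
    """Hypothetical Challenger output: apply a transformation and derive the answer.

    The answer is known exactly because we executed the transformation ourselves.
    Note: a rotationally symmetric grid would make the answer ambiguous, so a real
    pipeline would need to filter such inputs.
    """
    k = rng.randrange(1, 4)  # 1, 2, or 3 quarter-turns
    return {
        "original": grid,
        "transformed": rotate90(grid, k),
        "question": "By how many degrees clockwise was the first image rotated?",
        "answer": str(90 * k),
    }


def verify(task, prediction):
    """Deterministic check of a Solver prediction against the executed ground truth."""
    return prediction.strip() == task["answer"]


def difficulty_reward(success_rate):
    """Assumed reward shape (not the paper's formula): highest when the Solver
    succeeds about half the time, zero when the task is trivial or impossible."""
    return 1.0 - 2.0 * abs(success_rate - 0.5)


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    task = make_rotation_task(image, random.Random(0))
    print(task["question"], "->", task["answer"])
    print("correct answer accepted:", verify(task, task["answer"]))
    print("reward at 50% success:", difficulty_reward(0.5))
```

In this toy setting the verifier never consults the model, which is the property the abstract describes as feedback independent of the model's internal certainty. A real system would operate on actual images and on scripts synthesized by the Challenger.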


Related papers