Fugu-MT Paper Translation (Summary): FIKA-Bench: From Fine-grained Recognition to Fine-Grained Knowledge Acquisition

Paper summary: FIKA-Bench: From Fine-grained Recognition to Fine-Grained Knowledge Acquisition

arxiv url: http://arxiv.org/abs/2605.13193v2
Date: Tue, 19 May 2026 13:19:30 GMT
Status: Translation complete
System update date: 2026-05-20 15:03:08.272612
Title: FIKA-Bench: From Fine-grained Recognition to Fine-Grained Knowledge Acquisition
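Title (reference translation): FIKA-Bench: From Fine-grained Recognition to Fine-Grained Knowledge Acquisition
Authors: Geng Li, Yuxin Peng
Abstract summary: Fine-grained recognition in everyday life is often not a closed-book classification problem. Existing benchmarks primarily evaluate visual recognition, leaving the ability to actively acquire external knowledge underexplored. The paper studies fine-grained knowledge acquisition, where a system must seek, verify, and use external evidence to answer open-ended fine-grained recognition questions.
Reference score (independently computed attention score): 54.31138496553705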
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Fine-grained recognition in everyday life is often not a closed-book classification problem: when encountering unfamiliar objects, humans actively search, compare visual details, and verify evidence before deciding. Existing benchmarks primarily evaluate visual recognition, leaving this active external knowledge acquisition ability underexplored. We study fine-grained knowledge acquisition, where a system must seek, verify, and use external evidence to answer open-ended fine-grained recognition questions. We introduce FIKA-Bench, a leakage-aware and evidence-grounded collection of 311 public-source and real-life instances. To ensure high quality, every example is filtered against frontier closed-book models to remove memorized cases and audited to eliminate image-answer leakage, retaining only samples supported by verified evidence. Our evaluation of the latest Large Multimodal Models (LMMs) and agents reveals that the task remains a formidable challenge: the best system reaches only 25.1% accuracy, with no model exceeding 30%. Crucially, we find that merely equipping models with tools is insufficient to bridge this gap; agent failures are predominantly driven by wrong entity retrieval and poor visual judgement. These results show that reliable knowledge acquisition needs better agent designs that focus on fine-grained recognition.
Abstract (reference translation): When encountering unfamiliar objects, humans actively search, compare visual details, and verify evidence before deciding. Existing benchmarks mainly evaluate visual recognition, so this ability to actively acquire external knowledge remains underexplored. The paper studies fine-grained knowledge acquisition, in which a system must seek, verify, and use external evidence to answer open-ended fine-grained recognition questions. FIKA-Bench is a leakage-aware, evidence-grounded collection of 311 public-source and real-life instances. To ensure high quality, every sample is filtered against frontier closed-book models to remove memorized cases and audited to eliminate image-answer leakage, so that only samples supported by verified evidence are retained. An evaluation of the latest Large Multimodal Models (LMMs) and agents shows that the task remains a serious challenge: the best system reaches only 25.1% accuracy, and no model exceeds 30%. Importantly, simply equipping models with tools does not close this gap; agent failures are driven mainly by wrong entity retrieval and poor visual judgement. These results indicate that reliable knowledge acquisition requires agent designs that focus on fine-grained recognition.


Related papers