Fugu-MT Paper Translation (Abstract): Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering

Paper summary: Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering

arxiv url: http://arxiv.org/abs/2511.01213v1
Date: Mon, 03 Nov 2025 04:13:24 GMT
Status: Translation complete
Last updated in system: 2025-11-05 16:37:27.116695
Title: Thought-For-Food: Reasoning Chain Induced Food Visual Question Answering
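Title (reference translation): Thought for Food: Chain-Induced Answering of Visual Questions about Food
Authors: Riddhi Jain, Manasi Patwardhan, Parijat Deshpande, Venkataramana Runkana
Abstract summary: Food VQA requires following a multi-step reasoning process to arrive at an accurate answer. The authors create reasoning chains on top of the QA pairs with minimal human intervention. An average accuracy improvement of 10 percentage points over the baseline was observed.
Reference score (independently computed attention score): 5.290249856411331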
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The immense diversity in the culture and culinary traditions of Indian cuisine calls attention to a major shortcoming of existing Visual Question Answering (VQA) systems, which are inclined towards foods from Western regions. A recent attempt at building a VQA dataset for Indian food is a step towards addressing this challenge. However, that approach to VQA follows a two-step process in which the answer is generated first, followed by the explanation of the expected answer. In this work, we claim that food VQA requires following a multi-step reasoning process to arrive at an accurate answer, especially in the context of Indian food, which involves understanding complex culinary context and identifying relationships between various food items. With this hypothesis, we create reasoning chains upon the QA with minimal human intervention. We fine-tune smaller LLMs and VLMs with auto-validated reasoning chains and further train them using reinforcement learning with larger data. With augmentation of reasoning chains, we observed an average accuracy improvement of 10 percentage points over the baseline. We provide a detailed analysis of the effect of adding reasoning chains for the Indian Food VQA task. Index Terms - FoodVQA, Reasoning Chains, Reinforcement Learning, Knowledge Graph.
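Abstract (reference translation): The diversity of culture and cuisine in Indian food draws attention to a major shortcoming of existing Visual Question Answering (VQA) systems, which lean towards Western foods. A recent attempt to build a VQA dataset for Indian food is a step towards addressing this challenge. However, that approach to VQA follows a two-step process in which the answer is generated first. In this work, we argue that food VQA, especially in the context of Indian food, needs to follow a multi-step reasoning process that involves understanding complex culinary context and identifying relationships between various food items. Under this hypothesis, we generate reasoning chains for the QA with minimal human intervention. We fine-tune smaller LLMs and VLMs with auto-validated reasoning chains and further train them using reinforcement learning with larger data. With the augmentation of reasoning chains, an average accuracy improvement of 10 points over the baseline was observed. We provide a detailed analysis of the effect of adding reasoning chains for the Indian Food VQA task. Index Terms - FoodVQA, Reasoning Chains, Reinforcement Learning, Knowledge Graph.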

List of related papers