Fugu-MT paper translation (abstract): Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation

Paper overview: Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation

arxiv url: http://arxiv.org/abs/2507.04680v1
Date: Mon, 07 Jul 2025 05:56:19 GMT
Status: translation complete
Last updated in system: 2025-07-08 15:46:35.287637
Title: Identify, Isolate, and Purge: Mitigating Hallucinations in LVLMs via Self-Evolving Distillation
Title (reference translation): Identifying, Isolating, and Purging Hallucinations in LVLMs via Self-Evolving Distillation
Authors: Wenhao Li, Xiu Su, Jingyi Wu, Feng Yang, Yang Liu, Yi Chen, Shan You, Chang Xu
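Abstract summary: Hallucination issues significantly limit the credibility and application potential of LVLMs. Existing mitigation methods rely on external tools or on comparing multiple rounds of inference. We propose SElf-Evolving Distillation (SEED), which identifies, isolates, and purges hallucinations within the inner knowledge of LVLMs.
Reference score (attention score computed by this site): 52.52962914918779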
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable advancements in numerous areas such as multimedia. However, hallucination issues significantly limit their credibility and application potential. Existing mitigation methods typically rely on external tools or the comparison of multi-round inference, which significantly increase inference time. In this paper, we propose SElf-Evolving Distillation (SEED), which identifies hallucinations within the inner knowledge of LVLMs, isolates and purges them, and then distills the purified knowledge back into the model, enabling self-evolution. Furthermore, we identified that traditional distillation methods are prone to inducing void spaces in the output space of LVLMs. To address this issue, we propose a Mode-Seeking Evolving approach, which performs distillation to capture the dominant modes of the purified knowledge distribution, thereby avoiding the chaotic results that could emerge from void spaces. Moreover, we introduce a Hallucination Elimination Adapter, which corrects the dark knowledge of the original model by learning purified knowledge. Extensive experiments on multiple benchmarks validate the superiority of our SEED, demonstrating substantial improvements in mitigating hallucinations for representative LVLM models such as LLaVA-1.5 and InternVL2. Remarkably, the F1 score of LLaVA-1.5 on the hallucination evaluation metric POPE-Random improved from 81.3 to 88.3.
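Abstract (reference translation): Large Vision-Language Models (LVLMs) have made remarkable progress in many areas, including multimedia. However, hallucination issues significantly limit their credibility and application potential. Existing mitigation methods typically rely on external tools or on comparing multiple rounds of inference, which significantly increases inference time. In this paper, we propose SElf-Evolving Distillation (SEED), which identifies, isolates, and purges hallucinations within the inner knowledge of LVLMs and then distills the purified knowledge back into the model, enabling self-evolution. We further find that traditional distillation methods tend to induce void spaces in the output space of LVLMs. To address this, we propose a Mode-Seeking Evolving approach that distills the dominant modes of the purified knowledge distribution, avoiding the chaotic results that can arise from void spaces. We also introduce a Hallucination Elimination Adapter, which corrects the dark knowledge of the original model by learning purified knowledge. Experiments show substantial improvements in hallucination mitigation for representative LVLMs such as LLaVA-1.5 and InternVL2. The F1 score of LLaVA-1.5 on the hallucination evaluation metric POPE-Random improved from 81.3 to 88.3.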
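
The abstract does not state the objective behind the Mode-Seeking Evolving approach. One common reading of "mode-seeking" distillation is minimizing the reverse KL divergence, KL(student || teacher), instead of the forward KL(teacher || student) used in conventional distillation. The toy example below is a minimal sketch under that assumption, not the authors' method; the distributions `teacher`, `covering`, and `single_mode` are hypothetical values chosen for illustration.

```python
import math

def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes every q[i] is strictly positive wherever p[i] is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical "purified" teacher distribution: two dominant modes
# separated by a low-probability region (a stand-in for a "void space").
teacher = [0.48, 0.01, 0.01, 0.02, 0.48]

# Two hypothetical student candidates.
covering = [0.20, 0.20, 0.20, 0.20, 0.20]     # spreads mass into the gap
single_mode = [0.94, 0.02, 0.02, 0.01, 0.01]  # commits to one dominant mode

# Forward KL (conventional distillation) penalizes missing any teacher mass,
# so it scores the covering student as the better fit.
forward_covering = kl(teacher, covering)
forward_single = kl(teacher, single_mode)

# Reverse KL (assumed mode-seeking objective) penalizes student mass placed
# where the teacher has little, so it scores the single-mode student better.
reverse_covering = kl(covering, teacher)
reverse_single = kl(single_mode, teacher)

print("forward KL prefers covering:", forward_covering < forward_single)
print("reverse KL prefers single mode:", reverse_single < reverse_covering)
```

Under these assumed numbers, the forward objective rewards a student that places probability in the low-density gap, while the reverse objective rewards a student that stays on a dominant mode, which matches the abstract's description of avoiding void spaces.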

Related papers

Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering [83.63437999696954]
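Hallucination in multimodal large language models (MLLMs) persists as a significant and under-addressed challenge in the video domain. This paper proposes a temporal-aware activation engineering framework for VideoLLMs that adaptively identifies and manipulates hallucination-sensitive modules.
Paper reference translation (metadata) (2025-05-19T08:12:06Z)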
Efficient Contrastive Decoding with Probabilistic Hallucination Detection - Mitigating Hallucinations in Large Vision Language Models - [1.2499537119440245]
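Efficient Contrastive Decoding (ECD) is a simple method that uses probabilistic hallucination detection to shift the output distribution toward contextually accurate answers at inference time. Experiments show that ECD effectively mitigates hallucinations and outperforms state-of-the-art methods in both LVLM benchmark performance and computation time.
Paper reference translation (metadata) (2025-04-16T14:50:25Z)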
A Survey of Hallucination in Large Visual Language Models [48.794850395309076]
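The presence of hallucinations limits the potential and practical usefulness of LVLMs in various fields. The survey introduces the structure of LVLMs and the main causes of hallucination, and describes hallucination evaluation benchmarks for LVLMs.
Paper reference translation (metadata) (2024-10-20T10:58:58Z)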
Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning [16.883679810267342]
This paper proposes Iterative Model-level Contrastive Learning (Iter-AHMCL) to address hallucination.
Paper reference translation (metadata) (2024-10-16T00:15:40Z)
Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation [21.31915988262898]
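This paper introduces the Active Retrieval-Augmented large vision-language model (ARA), a new framework for addressing hallucinations. Experiments suggest that using suitable retrieval mechanisms and choosing the timing of retrieval carefully can effectively mitigate the hallucination problem.
Paper reference translation (metadata) (2024-08-01T13:38:58Z)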
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs [52.497823009176074]
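Large Vision-Language Models (LVLMs) often generate responses that misrepresent factual information, a phenomenon known as hallucination. The paper introduces Visual Description Grounded Decoding (VDGD), a training-free method aimed at improving visual perception and the reasoning capabilities of LVLMs.
Paper reference translation (metadata) (2024-05-24T16:21:59Z)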
Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding [25.489832294197797]
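This paper proposes Instruction Contrastive Decoding (ICD), a method aimed at reducing hallucinations during LVLM inference. The method is inspired by the observation that disturbance instructions significantly exacerbate hallucinations in the multimodal fusion module.
Paper reference translation (metadata) (2024-03-27T16:04:47Z)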
Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance [51.30560006045442]
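Image-gRounded guIdaNcE (MARINE) is a training-free and API-free framework. By introducing image-grounded guidance into LVLMs, MARINE effectively and efficiently reduces object hallucinations during inference. The framework's flexibility also allows multiple vision models to be integrated, enabling more reliable and robust object-level guidance.
Paper reference translation (metadata) (2024-02-13T18:59:05Z)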
Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
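Large language models (LLMs) have been observed to generate responses that contain inaccurate or fabricated information. The paper proposes a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
Paper reference translation (metadata) (2023-12-25T12:32:49Z)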
A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
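Large language models (LLMs) produce hallucinations, which can cause significant damage when the models are deployed for mission-critical tasks. This paper proposes a self-check approach based on reverse validation that automatically detects factual errors in a zero-resource fashion. The proposed method and existing zero-resource detection methods are evaluated empirically on two datasets.
Paper reference translation (metadata) (2023-10-10T10:14:59Z)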
The Troubling Emergence of Hallucination in Large Language Models -- An Extensive Definition, Quantification, and Prescriptive Remediations [10.20632187568563]
The paper discusses profiling hallucination based on its degree, orientation, and category. Hallucinations are classified into six types: (i) acronym ambiguity, (ii) numeric nuisance, (iii) generated golem, (iv) virtual voice, (v) geographic erratum, and (vi) time wrap. The Hallucination Vulnerability Index (HVI) is proposed as a tool for the wider NLP community.
Paper reference translation (metadata) (2023-10-08T03:31:29Z)
AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
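This paper proposes AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets. It also proposes a zero-resource, black-box hallucination detection method based on self-contradiction.
Paper reference translation (metadata) (2023-09-30T05:20:02Z)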
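
Several entries in this list (ECD, Instruction Contrastive Decoding, Induce-then-Contrast Decoding) build on contrastive decoding, but the summaries do not give formulas. The sketch below shows only a generic form often used in the contrastive-decoding literature: amplify the original logits and subtract the logits of a hallucination-prone pass. The function name `contrastive_logits`, the weight `alpha`, the vocabulary, and all logit values are assumptions for illustration; the individual methods differ in how they build the contrast pass and combine the distributions.

```python
def contrastive_logits(base, contrast, alpha=1.0):
    """Generic contrastive adjustment: (1 + alpha) * base - alpha * contrast.

    base:     logits from the normal forward pass
    contrast: logits from a hallucination-prone pass (hypothetical here)
    alpha:    assumed contrast strength; alpha = 0 returns the base logits
    """
    return [(1 + alpha) * b - alpha * c for b, c in zip(base, contrast)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical three-token vocabulary and logits for a single decoding step.
vocab = ["cat", "dog", "frisbee"]
base = [2.0, 2.1, 0.5]      # the normal pass narrowly prefers "dog"
contrast = [0.5, 2.5, 0.4]  # the hallucination-prone pass strongly prefers "dog"

adjusted = contrastive_logits(base, contrast, alpha=1.0)

print("greedy token before adjustment:", vocab[argmax(base)])
print("greedy token after adjustment:", vocab[argmax(adjusted)])
```

In this toy step, the token favored mainly by the hallucination-prone pass is demoted after the adjustment, so greedy decoding switches from "dog" to "cat".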

The related papers list is generated automatically from the titles and abstracts of papers on this site.
