Fugu-MT Paper Translation (Abstract): EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications

Paper overview: EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications

arxiv url: http://arxiv.org/abs/2505.17654v2
Date: Mon, 09 Jun 2025 12:54:55 GMT
Status: Translation complete
System update date: 2025-06-10 16:33:10.122817
Title: EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications
Authors: Ancheng Xu, Zhihao Yang, Jingpeng Li, Guanghu Yuan, Longze Chen, Liang Yan, Jiehui Zhou, Zhen Qin, Hengyun Chang, Hamid Alinejad-Rokny, Bo Zheng, Min Yang
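Abstract summary: EVADE is the first expert-curated, Chinese, multimodal benchmark designed to evaluate foundation models on evasive content detection in e-commerce. The dataset contains 2,833 annotated text samples and 13,961 images spanning six demanding product categories.
Reference score (independently computed attention level): 24.832537917472894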
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: E-commerce platforms increasingly rely on Large Language Models (LLMs) and Vision-Language Models (VLMs) to detect illicit or misleading product content. However, these models remain vulnerable to evasive content: inputs (text or images) that superficially comply with platform policies while covertly conveying prohibited claims. Unlike traditional adversarial attacks that induce overt failures, evasive content exploits ambiguity and context, making it far harder to detect. Existing robustness benchmarks provide little guidance for this demanding, real-world challenge. We introduce EVADE, the first expert-curated, Chinese, multimodal benchmark specifically designed to evaluate foundation models on evasive content detection in e-commerce. The dataset contains 2,833 annotated text samples and 13,961 images spanning six demanding product categories, including body shaping, height growth, and health supplements. Two complementary tasks assess distinct capabilities: Single-Violation, which probes fine-grained reasoning under short prompts, and All-in-One, which tests long-context reasoning by merging overlapping policy rules into unified instructions. Notably, the All-in-One setting significantly narrows the performance gap between partial and full-match accuracy, suggesting that clearer rule definitions improve alignment between human and model judgment. We benchmark 26 mainstream LLMs and VLMs and observe substantial performance gaps: even state-of-the-art models frequently misclassify evasive samples. By releasing EVADE and strong baselines, we provide the first rigorous standard for evaluating evasive-content detection, expose fundamental limitations in current multimodal reasoning, and lay the groundwork for safer and more transparent content moderation systems in e-commerce. The dataset is publicly available at https://huggingface.co/datasets/koenshen/EVADE-Bench.
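The abstract contrasts partial-match and full-match accuracy without defining them. The sketch below shows one plausible reading, assuming each sample carries a set of violation labels: a full match requires the predicted set to equal the gold set, while a partial match only requires at least one shared label (or both sets empty). These definitions, the function names, and the label strings are illustrative assumptions, not taken from the paper.

```python
from typing import List, Set


def _check(preds: List[Set[str]], golds: List[Set[str]]) -> None:
    # Guard against misaligned or empty inputs before dividing.
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal in length")


def full_match_accuracy(preds: List[Set[str]], golds: List[Set[str]]) -> float:
    """Fraction of samples whose predicted label set equals the gold set.

    Assumed definition; the paper's exact metric may differ.
    """
    _check(preds, golds)
    hits = sum(p == g for p, g in zip(preds, golds))
    return hits / len(golds)


def partial_match_accuracy(preds: List[Set[str]], golds: List[Set[str]]) -> float:
    """Fraction of samples where prediction and gold share at least one label.

    Two empty sets (no violation predicted, none annotated) also count as a match.
    Assumed definition; the paper's exact metric may differ.
    """
    _check(preds, golds)
    hits = sum(p == g or bool(p & g) for p, g in zip(preds, golds))
    return hits / len(golds)


# Hypothetical labels, for illustration only.
golds = [{"rule_a"}, {"rule_a", "rule_b"}, {"rule_c"}, set()]
preds = [{"rule_a"}, {"rule_a"}, {"rule_d"}, set()]

print(full_match_accuracy(preds, golds))     # 0.5
print(partial_match_accuracy(preds, golds))  # 0.75
```

Under these assumed definitions, the second sample is what separates the two metrics: the model finds one of the two annotated violations, so it counts toward partial-match accuracy but not full-match accuracy.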

Related papers