Fugu-MT Paper Translation (Summary): AdsQA: Towards Advertisement Video Understanding

Paper summary: AdsQA: Towards Advertisement Video Understanding

arxiv url: http://arxiv.org/abs/2509.08621v1
Date: Wed, 10 Sep 2025 14:17:53 GMT
Status: Translation complete
Last updated in system: 2025-09-11 15:16:52.450848
Title: AdsQA: Towards Advertisement Video Understanding
Authors: Xinwei Long, Kai Tian, Peng Xu, Guoli Jia, Jingxuan Li, Sa Yang, Yihua Shao, Kaiyan Zhang, Che Jiang, Hao Xu, Yang Liu, Jiaheng Ma, Bowen Zhou
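Abstract summary: We propose to use advertisement (ad) videos as a challenging test-bed to probe the abilities of large language models (LLMs). Our motivation is to take full advantage of the clue-rich and information-dense traits of ad videos, e.g., marketing logic, persuasive strategies, and audience engagement. AdsQA, an ad video QA benchmark, is derived from 1,544 ad videos with 10,962 clips, totaling 22.7 hours, and provides 5 challenging tasks.
Reference score (attention score computed independently by this site): 27.89010198926177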
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have taken a great step towards AGI. Meanwhile, an increasing number of domain-specific problems such as math and programming boost these general-purpose models to continuously evolve via learning deeper expertise. Now is thus the time to further extend the diversity of specialized applications for knowledgeable LLMs, though collecting high-quality data with unexpected and informative tasks is challenging. In this paper, we propose to use advertisement (ad) videos as a challenging test-bed to probe the ability of LLMs in perceiving beyond the objective physical content of the common visual domain. Our motivation is to take full advantage of the clue-rich and information-dense traits of ad videos, e.g., marketing logic, persuasive strategies, and audience engagement. Our contribution is three-fold: (1) To our knowledge, this is the first attempt to use ad videos with well-designed tasks to evaluate LLMs. We contribute AdsQA, a challenging ad Video QA benchmark derived from 1,544 ad videos with 10,962 clips, totaling 22.7 hours, providing 5 challenging tasks. (2) We propose ReAd-R, a Deepseek-R1 styled RL model that reflects on questions and generates answers via reward-driven optimization. (3) We benchmark 14 top-tier LLMs on AdsQA, and our ReAd-R achieves state-of-the-art performance, outperforming strong competitors equipped with long-chain reasoning capabilities by a clear margin.


Related papers