Fugu-MT Paper Translation (Abstract): AutoArabic: A Three-Stage Framework for Localizing Video-Text Retrieval Benchmarks

Paper overview: AutoArabic: A Three-Stage Framework for Localizing Video-Text Retrieval Benchmarks

arxiv url: http://arxiv.org/abs/2509.16438v1
Date: Fri, 19 Sep 2025 21:35:04 GMT
Status: translation complete
Last updated in system: 2025-09-23 18:58:15.790166
Title: AutoArabic: A Three-Stage Framework for Localizing Video-Text Retrieval Benchmarks
Title (reference translation): AutoArabic: A Three-Stage Framework for Localizing Video-Text Retrieval Benchmarks
Authors: Mohamed Eltahir, Osamah Sarraj, Abdulrahman Alfrihidi, Taha Alshatiri, Mohammed Khurd, Mohammed Bremoo, Tanveer Hussain
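Abstract summary: We introduce AutoArabic, a three-stage framework that translates non-Arabic benchmarks into Modern Standard Arabic. The framework includes an error detection module that automatically flags potential translation errors with 97% accuracy. Applying the framework to DiDeMo, a video retrieval benchmark, produces DiDeMo-AR, an Arabic variant with 40,144 Arabic descriptions.
Reference score (independently computed attention score): 3.065560256430169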
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Video-to-text and text-to-video retrieval are dominated by English benchmarks (e.g. DiDeMo, MSR-VTT) and recent multilingual corpora (e.g. RUDDER), yet Arabic remains underserved, lacking localized evaluation metrics. We introduce a three-stage framework, AutoArabic, utilizing state-of-the-art large language models (LLMs) to translate non-Arabic benchmarks into Modern Standard Arabic, reducing the manual revision required by nearly fourfold. The framework incorporates an error detection module that automatically flags potential translation errors with 97% accuracy. Applying the framework to DiDeMo, a video retrieval benchmark, produces DiDeMo-AR, an Arabic variant with 40,144 fluent Arabic descriptions. An analysis of the translation errors is provided and organized into an insightful taxonomy to guide future Arabic localization efforts. We train a CLIP-style baseline with identical hyperparameters on the Arabic and English variants of the benchmark, finding a moderate performance gap (about 3 percentage points at Recall@1), indicating that Arabic localization preserves benchmark difficulty. We evaluate three post-editing budgets (zero / flagged-only / full) and find that performance improves monotonically with more post-editing, while the raw LLM output (zero-budget) remains usable. To ensure reproducibility to other languages, we made the code available at https://github.com/Tahaalshatiri/AutoArabic.
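Abstract (reference translation): Video-to-text and text-to-video retrieval are dominated by English benchmarks (e.g. DiDeMo, MSR-VTT) and recent multilingual corpora (e.g. RUDDER), but Arabic remains underserved and lacks localized evaluation metrics. We introduce AutoArabic, a three-stage framework that uses state-of-the-art large language models (LLMs) to translate non-Arabic benchmarks into Modern Standard Arabic, cutting the manual revision required by nearly a factor of four. The framework includes an error detection module that automatically flags potential translation errors with 97% accuracy. Applying the framework to DiDeMo, a video retrieval benchmark, produces DiDeMo-AR, an Arabic variant with 40,144 fluent Arabic descriptions. An analysis of the translation errors is provided and organized into a taxonomy intended to guide future Arabic localization efforts. We train a CLIP-style baseline with identical hyperparameters on the Arabic and English variants of the benchmark and find a moderate performance gap (about 3 percentage points at Recall@1), indicating that Arabic localization preserves the difficulty of the benchmark. We evaluate three post-editing budgets (zero / flagged-only / full) and find that performance improves monotonically with more post-editing, while the raw LLM output (zero budget) remains usable. To ensure reproducibility to other languages, we made the code available at https://github.com/Tahaalshatiri/AutoArabic.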
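
Illustration (not from the paper): the abstract reports its English/Arabic gap at Recall@1. The sketch below shows how Recall@K is commonly computed for text-to-video retrieval. It assumes a square similarity matrix in which query i is paired with video i; the function name `recall_at_k` and the toy scores are hypothetical, and the authors' actual evaluation code may differ.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int = 1) -> float:
    """Fraction of text queries whose paired video ranks in the top k.

    Assumption: similarity[i][j] scores text query i against video j,
    and the correct video for query i is video i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank = number of videos scoring strictly higher than the correct one.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy scores for 3 queries x 3 videos (made-up numbers, not paper results).
sim = [
    [0.9, 0.1, 0.2],  # query 0: correct video ranked first
    [0.8, 0.3, 0.1],  # query 1: correct video ranked second
    [0.2, 0.1, 0.7],  # query 2: correct video ranked first
]
print(recall_at_k(sim, k=1))
print(recall_at_k(sim, k=2))
```

Under this definition, a gap of about 3 percentage points at Recall@1 means that, out of every 100 queries, roughly three more retrieve the correct video at rank 1 in one language variant than in the other.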

Related papers