Fugu-MT Paper Translation (Abstract): EgoAction: Egocentric Action Composition with Reliability-Aware Temporal Fusion for the EPIC-KITCHENS Action Detection Challenge at CVPR 2026

Paper overview: EgoAction: Egocentric Action Composition with Reliability-Aware Temporal Fusion for the EPIC-KITCHENS Action Detection Challenge at CVPR 2026

arxiv url: http://arxiv.org/abs/2605.24496v1
Date: Sat, 23 May 2026 10:05:56 GMT
Status: Translation complete
System update date: 2026-05-26 19:50:18.134294
Title: EgoAction: Egocentric Action Composition with Reliability-Aware Temporal Fusion for the EPIC-KITCHENS Action Detection Challenge at CVPR 2026
Title (reference translation): EgoAction: Egocentric Action Composition with Reliability-Aware Temporal Fusion for the EPIC-KITCHENS Action Detection Challenge at CVPR 2026
Authors: Zhiheng Fu, Zixu Li, Zhiwei Chen, Fangxu Liu, Yupeng Hu, Weili Guan, Liqiang Nie
Abstract summary: EgoAction is a unified decoupled detection and fusion pipeline. The pipeline uses EPIC-finetuned VideoMAE-L features and trains separate noun and verb temporal detectors with causal temporal modeling. EgoAction provides a compact and reproducible system for egocentric temporal action detection.
Reference score (attention score computed independently by the site): 69.56534058291463
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The EPIC-KITCHENS-100 Action Detection challenge evaluates whether a model can localize the start and end of each action in long untrimmed egocentric videos and assign the corresponding verb-noun action label. In this report, we formulate our submission as EgoAction (Egocentric Action Composition with Reliability-Aware Temporal Fusion), a unified decoupled detection and fusion pipeline. The pipeline uses EPIC-finetuned VideoMAE-L features, trains separate noun and verb temporal detectors with causal temporal modeling, composes action hypotheses from top noun-verb pairs, and introduces a confidence-adaptive boundary fusion rule at post-processing time. The key observation is that verb and noun streams often fail differently: verb scores are sensitive to motion transitions, whereas noun scores are sensitive to hand-object visibility and object clutter. A fixed arithmetic mean of their predicted boundaries can therefore amplify localization errors when one stream degenerates. We replace this hard-coded mean with Dynamic Weighted Fusion (DWF), which normalizes the maximum noun and verb classification confidences into proposal-wise boundary weights and linearly combines the two intervals. This lightweight tensor-only operator shifts boundary authority toward the more reliable stream while preserving the decoupled action scoring mechanism. Together with sliding-window inference, top-K noun-verb action composition, and class-wise Soft-NMS, EgoAction provides a compact and reproducible system for egocentric temporal action detection.
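Abstract (reference translation): The EPIC-KITCHENS-100 Action Detection challenge evaluates whether a model can localize the start and end of each action in long egocentric videos and assign the corresponding verb-noun action label. In this report, we present our submission as EgoAction (Egocentric Action Composition with Reliability-Aware Temporal Fusion). The pipeline uses EPIC-finetuned VideoMAE-L features, trains separate noun and verb temporal detectors with causal temporal modeling, composes action hypotheses from the top noun-verb pairs, and introduces a confidence-adaptive boundary fusion rule at post-processing time. Verb scores are sensitive to motion transitions, whereas noun scores are sensitive to hand-object visibility and object clutter. A fixed arithmetic mean of the predicted boundaries can therefore amplify localization errors when one stream degenerates. We replace this hard-coded mean with Dynamic Weighted Fusion (DWF), which normalizes the maximum noun and verb classification confidences into proposal-wise boundary weights and linearly combines the two intervals. This lightweight tensor-only operator shifts boundary authority toward the more reliable stream while preserving the decoupled action scoring mechanism. Together with sliding-window inference, top-K noun-verb action composition, and class-wise Soft-NMS, EgoAction provides a compact and reproducible system for egocentric temporal action detection.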
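
Illustrative sketch (not from the paper): the abstract describes Dynamic Weighted Fusion only in words, so the following is a minimal guess at what such a rule could look like. It assumes that the weights are obtained by dividing each stream's maximum confidence by the sum of the two, and that an equal split is used when both confidences are zero. The function name `dynamic_weighted_fusion`, its parameters, and the plain-Python list representation are hypothetical; the authors describe a tensor operator, and their exact normalization may differ.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of one proposal, e.g. in seconds


def dynamic_weighted_fusion(
    noun_intervals: List[Interval],
    verb_intervals: List[Interval],
    noun_conf: List[float],
    verb_conf: List[float],
    eps: float = 1e-8,
) -> List[Interval]:
    """Fuse noun and verb boundaries per proposal, weighted by confidence.

    noun_conf and verb_conf are assumed to hold, for each proposal, the
    maximum classification confidence of the noun and verb stream.
    """
    fused = []
    for (n_start, n_end), (v_start, v_end), c_noun, c_verb in zip(
        noun_intervals, verb_intervals, noun_conf, verb_conf
    ):
        total = c_noun + c_verb
        # Assumed normalization: weights sum to 1; fall back to the plain
        # arithmetic mean when both confidences are (near) zero.
        w_noun = c_noun / total if total > eps else 0.5
        w_verb = 1.0 - w_noun
        # Linear combination of the two intervals.
        fused.append(
            (w_noun * n_start + w_verb * v_start, w_noun * n_end + w_verb * v_end)
        )
    return fused


# Usage: a confident noun stream pulls the fused boundary toward its interval,
# while equal confidences reduce to the fixed arithmetic mean.
example = dynamic_weighted_fusion(
    noun_intervals=[(10.0, 14.0), (10.0, 14.0)],
    verb_intervals=[(12.0, 18.0), (12.0, 18.0)],
    noun_conf=[0.9, 0.5],
    verb_conf=[0.1, 0.5],
)
```

Under these assumptions, the first proposal lands close to the noun interval and the second at the midpoint of the two intervals, which is the behavior the fixed-mean baseline would give for every proposal.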


Related papers