Fugu-MT Paper Translation (Abstract): Attention-Guided Dual-Stream Learning for Group Engagement Recognition: Fusing Transformer-Encoded Motion Dynamics with Scene Context via Adaptive Gating

Paper overview: Attention-Guided Dual-Stream Learning for Group Engagement Recognition: Fusing Transformer-Encoded Motion Dynamics with Scene Context via Adaptive Gating

arxiv url: http://arxiv.org/abs/2604.10078v1
Date: Sat, 11 Apr 2026 07:51:28 GMT
Status: Translation complete
Last updated in system: 2026-04-14 20:13:15.828241
Title: Attention-Guided Dual-Stream Learning for Group Engagement Recognition: Fusing Transformer-Encoded Motion Dynamics with Scene Context via Adaptive Gating
Authors: Saniah Kayenat Chowdhury, Muhammad E. H. Chowdhury
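Abstract summary: Student engagement is crucial for improving learning outcomes in group activities. Most automated engagement recognition methods are designed for online classrooms or estimate engagement at the individual level. We propose DualEngage, a novel two-stream framework for group-level engagement recognition from in-classroom videos.
Reference score (independently computed attention score): 4.108374141003715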
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Student engagement is crucial for improving learning outcomes in group activities. Highly engaged students both perform better individually and contribute to overall group success. However, most existing automated engagement recognition methods are designed for online classrooms or estimate engagement at the individual level. Addressing this gap, we propose DualEngage, a novel two-stream framework for group-level engagement recognition from in-classroom videos. It models engagement as a joint function of individual-level and group-level behaviors. The primary stream models person-level motion dynamics by detecting and tracking students, extracting dense optical flow with the Recurrent All-Pairs Field Transforms network, encoding temporal motion patterns using a transformer encoder, and finally aggregating per-student representations through attention pooling into a unified representation. The secondary stream captures scene-level spatiotemporal information from the full video clip, leveraging a pretrained three-dimensional Residual Network. The two stream representations are combined via softmax-gated fusion, which dynamically weights each stream's contribution based on the joint context of both features. DualEngage thus learns a joint representation of individual actions and overarching group dynamics. We evaluate the proposed approach using fivefold cross-validation on the Classroom Group Engagement Dataset developed by Ocean University of China, achieving an average classification accuracy of 0.9621 ± 0.0161 with a macro-averaged F1 of 0.9530 ± 0.0204. To understand the contribution of each branch, we further conduct an ablation study comparing single-stream variants against the two-stream model. This work is among the first in classroom engagement recognition to adopt a dual-stream design that explicitly leverages motion cues as an estimator.
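The abstract names two aggregation steps, attention pooling over per-student representations and softmax-gated fusion of the two streams, without giving their exact form. The following is a minimal sketch of one common way to write them, not the authors' implementation. The feature dimension, the linear scoring vector `w_attn`, the gate parameters `gate_weights` and `gate_bias`, and the random inputs are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(student_feats, w_attn):
    """Pool per-student vectors into one vector.

    Assumption: each student gets a scalar score from a linear
    scoring vector w_attn; the abstract does not specify this form.
    """
    scores = [dot(f, w_attn) for f in student_feats]
    alpha = softmax(scores)  # attention weights, sum to 1
    dim = len(student_feats[0])
    pooled = [sum(a * f[j] for a, f in zip(alpha, student_feats))
              for j in range(dim)]
    return pooled, alpha


def softmax_gated_fusion(motion, scene, gate_weights, gate_bias):
    """Weight the two streams by a softmax gate.

    Assumption: the gate is a linear map of the concatenated
    features (the "joint context") to two logits, one per stream.
    """
    joint = motion + scene  # list concatenation, length 2 * dim
    logits = [dot(row, joint) + b for row, b in zip(gate_weights, gate_bias)]
    gate = softmax(logits)  # two weights, sum to 1
    fused = [gate[0] * m + gate[1] * s for m, s in zip(motion, scene)]
    return fused, gate


# Toy inputs standing in for learned features and parameters (hypothetical).
random.seed(0)
d, num_students = 8, 4
student_feats = [[random.gauss(0, 1) for _ in range(d)]
                 for _ in range(num_students)]
scene_feat = [random.gauss(0, 1) for _ in range(d)]
w_attn = [random.gauss(0, 1) for _ in range(d)]
gate_weights = [[random.gauss(0, 1) for _ in range(2 * d)] for _ in range(2)]
gate_bias = [0.0, 0.0]

motion_feat, alpha = attention_pool(student_feats, w_attn)
fused, gate = softmax_gated_fusion(motion_feat, scene_feat,
                                   gate_weights, gate_bias)
print("attention weights:", alpha)
print("stream gate (motion, scene):", gate)
```

Because the gate is a softmax, the two stream weights always sum to 1, so the fused vector is a convex combination of the motion and scene representations. In a trained model the scoring vector and gate parameters would be learned jointly with the rest of the network.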


Related papers