Video-Based Performance Evaluation for ECR Drills in Synthetic Training Environments
- URL: http://arxiv.org/abs/2512.23819v1
- Date: Mon, 29 Dec 2025 19:30:41 GMT
- Title: Video-Based Performance Evaluation for ECR Drills in Synthetic Training Environments
- Authors: Surya Rayala, Marcos Quinones-Grueiro, Naveeduddin Mohammed, Ashwin T S, Benjamin Goldberg, Randall Spain, Paige Lawton, Gautam Biswas
- Abstract summary: This paper introduces a video-based assessment pipeline that derives performance analytics from training videos without requiring additional hardware. We develop task-specific metrics that measure psychomotor fluency, situational awareness, and team coordination. Future work includes expanding analysis to 3D video data and leveraging video analysis to enable scalable evaluation within STEs.
- Score: 1.6162271703130058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective urban warfare training requires situational awareness and muscle memory, developed through repeated practice in realistic yet controlled environments. A key drill, Enter and Clear the Room (ECR), demands threat assessment, coordination, and securing confined spaces. The military uses Synthetic Training Environments (STEs) that offer scalable, controlled settings for repeated exercises. However, automatic performance assessment remains challenging, particularly when aiming for objective evaluation of cognitive, psychomotor, and teamwork skills. Traditional methods often rely on costly, intrusive sensors or subjective human observation, limiting scalability and accuracy. This paper introduces a video-based assessment pipeline that derives performance analytics from training videos without requiring additional hardware. By utilizing computer vision models, the system extracts 2D skeletons, gaze vectors, and movement trajectories. From these data, we develop task-specific metrics that measure psychomotor fluency, situational awareness, and team coordination. These metrics feed into an extended Cognitive Task Analysis (CTA) hierarchy, which employs a weighted combination to generate overall performance scores for teamwork and cognition. We demonstrate the approach with a case study of real-world ECR drills, providing actionable, domain-specific metrics that capture individual and team performance. We also discuss how these insights can support After Action Reviews with interactive dashboards within Gamemaster and the Generalized Intelligent Framework for Tutoring (GIFT), providing intuitive and understandable feedback. We conclude by addressing limitations, including tracking difficulties, ground-truth validation, and the broader applicability of our approach. Future work includes expanding analysis to 3D video data and leveraging video analysis to enable scalable evaluation within STEs.
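The abstract describes the scoring stage only at a high level: per-trainee metrics computed from 2D skeletons, gaze vectors, and movement trajectories are rolled up through a weighted Cognitive Task Analysis hierarchy into overall teamwork and cognition scores. Below is a minimal Python sketch of that weighted aggregation; the node names, weights, and leaf scores are illustrative assumptions, since the abstract does not publish the actual CTA hierarchy or its weighting.

```python
# Sketch of the weighted CTA score aggregation described in the abstract.
# Node names, weights, and leaf scores are illustrative assumptions, not
# the paper's published hierarchy.
from dataclasses import dataclass
from typing import Optional

@dataclass
class CTANode:
    """A node in an (extended) Cognitive Task Analysis hierarchy."""
    name: str
    weight: float                    # relative weight within its parent
    children: Optional[list] = None  # sub-skills; None marks a leaf
    score: float = 0.0               # leaves carry video-derived metric scores

def aggregate(node: CTANode) -> float:
    """Bottom-up weighted combination of leaf metric scores into one score."""
    if not node.children:
        return node.score
    total_weight = sum(c.weight for c in node.children)
    node.score = sum(c.weight * aggregate(c) for c in node.children) / total_weight
    return node.score

# Illustrative hierarchy: leaf scores stand in for normalized [0, 1] metrics
# that would be computed from skeletons, gaze vectors, and trajectories.
team = CTANode("team performance", 1.0, [
    CTANode("psychomotor fluency", 0.4, None, 0.82),    # e.g. trajectory smoothness
    CTANode("situational awareness", 0.35, None, 0.67), # e.g. gaze sector coverage
    CTANode("team coordination", 0.25, None, 0.74),     # e.g. entry-timing synchrony
])
print(f"overall score: {aggregate(team):.2f}")
```

In practice the leaf values would come from the vision outputs themselves (for instance, pose sequences from an off-the-shelf 2D pose estimator), normalized into comparable ranges before the weighted roll-up; the tree structure also makes the contribution of each sub-skill to the overall score easy to surface in an After Action Review dashboard.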
Related papers
- From Perception to Action: An Interactive Benchmark for Vision Reasoning [51.11355591375073]
The Causal Hierarchy of Actions and Interactions (CHAIN) benchmark is designed to evaluate whether models can understand, plan, and execute structured action sequences grounded in physical constraints.
CHAIN shifts evaluation from passive perception to active problem solving, spanning tasks such as interlocking mechanical puzzles and 3D stacking and packing.
The results show that top-performing models still struggle to internalize physical structure and causal constraints, often failing to produce reliable long-horizon plans and to robustly translate perceived structure into effective actions.
arXiv Detail & Related papers (2026-02-24T15:33:02Z)
- Watch and Learn: Learning to Use Computers from Online Videos [50.10702690339142]
Watch & Learn (W&L) is a framework that converts human demonstration videos readily available on the Internet into executable UI trajectories at scale.
We develop an inverse dynamics labeling pipeline with task-aware video retrieval and generate over 53k high-quality trajectories from raw web videos.
These results highlight web-scale human demonstration videos as a practical and scalable foundation for advancing CUAs towards real-world deployment.
arXiv Detail & Related papers (2025-10-06T10:29:00Z)
- Trainee Action Recognition through Interaction Analysis in CCATT Mixed-Reality Training [1.5641818606249476]
Critical Care Air Transport Team members must stabilize severely injured soldiers by managing ventilators, IV pumps, and suction devices during flight.
Recent advances in simulation and multimodal data analytics enable more objective and comprehensive performance evaluation.
This study examines how CCATT members are trained using mixed-reality simulations that replicate the high-pressure conditions of aeromedical evacuation.
arXiv Detail & Related papers (2025-09-22T15:19:45Z)
- From Seeing to Experiencing: Scaling Navigation Foundation Models with Reinforcement Learning [59.88543114325153]
We introduce the Seeing-to-Experiencing (S2E) framework to scale the capability of navigation foundation models with reinforcement learning.
S2E combines the strengths of pre-training on videos and post-training through RL.
We establish a comprehensive end-to-end evaluation benchmark, NavBench-GS, built on photorealistic 3DGS reconstructions of real-world scenes.
arXiv Detail & Related papers (2025-07-29T17:26:10Z)
- From Sight to Insight: Unleashing Eye-Tracking in Weakly Supervised Video Salient Object Detection [60.11169426478452]
This paper aims to introduce fixation information to assist the detection of salient objects under weak supervision.
We propose a Position and Semantic Embedding (PSE) module to provide location and semantic guidance during the feature learning process.
An Intra-Inter Mixed Contrastive (MCII) model improves the temporal modeling capabilities under weak supervision.
arXiv Detail & Related papers (2025-06-30T05:01:40Z)
- Reasoning in visual navigation of end-to-end trained agents: a dynamical systems approach [23.52028824411467]
We present a large-scale experimental study of navigation episodes in a real environment with a physical robot.
We analyze the type of reasoning emerging from end-to-end training.
We show in a post-hoc analysis that the value function learned by the agent relates to long-term planning.
arXiv Detail & Related papers (2025-03-11T11:16:47Z)
- When Pre-trained Visual Representations Fall Short: Limitations in Visuo-Motor Robot Learning [25.95301873726987]
The integration of pre-trained visual representations (PVRs) into visuo-motor robot learning has emerged as a promising alternative to training visual encoders from scratch.
However, PVRs face critical challenges in the context of policy learning, including temporal entanglement and an inability to generalise even in the presence of minor scene perturbations.
This work identifies these shortcomings and proposes solutions to address them.
First, we augment PVR features with temporal perception and a sense of task completion, effectively disentangling them in time.
Second, we introduce a module that learns to selectively attend to task-relevant local features, enhancing robustness when evaluated on out-of-distribution scenarios.
arXiv Detail & Related papers (2025-02-05T15:25:46Z)
- Open-World Drone Active Tracking with Goal-Centered Rewards [62.21394499788672]
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations.
We propose DAT, the first open-world drone active air-to-ground tracking benchmark.
We also propose GC-VAT, which aims to improve the performance of drones tracking targets in complex scenarios.
arXiv Detail & Related papers (2024-12-01T09:37:46Z)
- Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning [8.626019848533707]
This paper focuses on evaluating and benchmarking the robustness of visual representations in the context of object assembly tasks.
We employ a general framework in visuomotor policy learning that utilizes visual pretraining models as vision encoders.
Our study investigates the robustness of this framework when applied to a dual-arm manipulation setup, specifically to grasp variations.
arXiv Detail & Related papers (2023-10-15T20:41:07Z)
- SS-VAERR: Self-Supervised Apparent Emotional Reaction Recognition from Video [61.21388780334379]
This work focuses on apparent emotional reaction recognition from video-only input, conducted in a self-supervised fashion.
The network is first pre-trained on different self-supervised pretext tasks and later fine-tuned on the downstream target task.
arXiv Detail & Related papers (2022-10-20T15:21:51Z)
- Embodied Visual Active Learning for Semantic Segmentation [33.02424587900808]
We study the task of embodied visual active learning, where an agent is set to explore a 3D environment with the goal of acquiring visual scene understanding.
We develop a battery of agents, both learnt and pre-specified, with different levels of knowledge of the environment.
We extensively evaluate the proposed models using the Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts.
arXiv Detail & Related papers (2020-12-17T11:02:34Z)