Long-term Human Participation Assessment In Collaborative Learning Environments Using Dynamic Scene Analysis
- URL: http://arxiv.org/abs/2405.02317v1
- Date: Sun, 14 Apr 2024 21:39:00 GMT
- Title: Long-term Human Participation Assessment In Collaborative Learning Environments Using Dynamic Scene Analysis
- Authors: Wenjing Shi, Phuong Tran, Sylvia Celedón-Pattichis, Marios S. Pattichis
- Abstract summary: The paper develops datasets and methods to assess student participation in real-life collaborative learning environments.
We formulate the problem of assessing student participation into two subproblems: (i) student group detection against strong background interference from other groups, and (ii) dynamic participant tracking within the group.
- Score: 2.115993069505241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper develops datasets and methods to assess student participation in real-life collaborative learning environments. In collaborative learning environments, students are organized into small groups where they are free to interact within their group. Thus, students can move around freely causing issues with strong pose variation, move out and re-enter the camera scene, or face away from the camera. We formulate the problem of assessing student participation into two subproblems: (i) student group detection against strong background interference from other groups, and (ii) dynamic participant tracking within the group. A massive independent testing dataset of 12,518,250 student label instances, with a total duration of 21 hours and 22 minutes of real-life video, is used for evaluating the performance of our proposed method for student group detection. The proposed method of using multiple image representations is shown to perform equally or better than YOLO on all video instances. Over the entire dataset, the proposed method achieved an F1 score of 0.85 compared to 0.80 for YOLO. Following student group detection, the paper presents the development of a dynamic participant tracking system for assessing student group participation through long video sessions. The proposed dynamic participant tracking system is shown to perform exceptionally well, missing a student in just one out of 35 testing videos. In comparison, a state-of-the-art method fails to track students in 14 out of the 35 testing videos. The proposed method achieves 82.3% accuracy on an independent set of long, real-life collaborative videos.
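The F1 scores quoted above (0.85 vs. 0.80) combine detection precision and recall. As a minimal sketch of how such a score is computed from detection counts (the counts below are invented for illustration, not taken from the paper):

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall,
    computed from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: precision = recall = 0.85, so F1 = 0.85.
print(round(f1_score(tp=85, fp=15, fn=15), 2))  # → 0.85
```

In practice such counts are accumulated per frame over the full video set before the score is taken, so long videos with many label instances dominate the overall F1.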
Related papers
- Human-in-the-loop Adaptation in Group Activity Feature Learning for Team Sports Video Retrieval [17.686293914812154]
This paper proposes human-in-the-loop adaptation for Group Activity Feature Learning (GAFL) without group activity annotations.
Our method pre-trains the space based on the similarity of group activities in a self-supervised manner.
Our comprehensive experimental results on two team sports datasets validate that our method significantly improves the retrieval performance.
arXiv Detail & Related papers (2026-02-03T06:15:43Z) - Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification [0.6103775976356991]
We propose a novel three-stage framework for video-based student engagement measurement.
First, we explore the few-shot adaptation of the vision-language model for student action recognition.
Second, we utilize the sliding temporal window technique to divide each student's 2-minute-long video into non-overlapping segments.
Third, we leverage the large language model to classify this entire sequence of actions, together with the classroom context, as belonging to an engaged or disengaged student.
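The non-overlapping temporal window step described in that entry can be sketched as follows; the 30 fps frame rate and 10-second window length below are assumptions chosen for illustration, not values from the paper.

```python
def split_segments(total_frames: int, segment_len: int) -> list[tuple[int, int]]:
    """Split the frame range [0, total_frames) into non-overlapping
    [start, end) segments; a shorter final remainder segment is kept."""
    return [(start, min(start + segment_len, total_frames))
            for start in range(0, total_frames, segment_len)]

# A 2-minute clip at an assumed 30 fps = 3600 frames,
# split into assumed 10-second (300-frame) windows:
segments = split_segments(3600, 300)
print(len(segments))  # → 12
```

Actions recognized per window can then be concatenated into the action sequence that a downstream classifier consumes.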
arXiv Detail & Related papers (2026-01-10T02:39:24Z) - VideoTG-R1: Boosting Video Temporal Grounding via Curriculum Reinforcement Learning on Reflected Boundary Annotations [59.40631942092535]
Video temporal grounding (VTG) aims to locate precise segments in videos based on language queries.<n>Recent Multimodal Large Language Models (MLLMs) have shown promise in tackling VTG through reinforcement learning (RL)<n>We propose VideoTG-R1, a novel curriculum RL framework with reflected boundary annotations, enabling data-efficient training.
arXiv Detail & Related papers (2025-10-27T14:55:38Z) - Vocal Sandbox: Continual Learning and Adaptation for Situated Human-Robot Collaboration [64.6107798750142]
Vocal Sandbox is a framework for enabling seamless human-robot collaboration in situated environments.
We design lightweight and interpretable learning algorithms that allow users to build an understanding and co-adapt to a robot's capabilities in real-time.
We evaluate Vocal Sandbox in two settings: collaborative gift bag assembly and LEGO stop-motion animation.
arXiv Detail & Related papers (2024-11-04T20:44:40Z) - Towards Student Actions in Classroom Scenes: New Dataset and Baseline [43.268586725768465]
We present a new multi-label student action video (SAV) dataset for complex classroom scenes.
The dataset consists of 4,324 carefully trimmed video clips from 758 different classrooms, each labeled with 15 different actions displayed by students in classrooms.
arXiv Detail & Related papers (2024-09-02T03:44:24Z) - 3rd Place Solution for MOSE Track in CVPR 2024 PVUW workshop: Complex Video Object Segmentation [63.199793919573295]
Video Object Segmentation (VOS) is a vital task in computer vision, focusing on distinguishing foreground objects from the background across video frames.
Our work draws inspiration from the Cutie model, and we investigate the effects of object memory, the total number of memory frames, and input resolution on segmentation performance.
arXiv Detail & Related papers (2024-06-06T00:56:25Z) - Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024 [12.274092278786966]
We adopt semi-supervised video semantic segmentation method based on unreliable pseudo labels.
Our method achieves the mIoU scores of 63.71% and 67.83% on development test and final test respectively.
We obtain the 1st place in the Video Scene Parsing in the Wild Challenge at CVPR 2024.
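The mIoU scores reported in that entry average per-class Intersection-over-Union. A minimal sketch, assuming flat integer class-label maps and skipping classes absent from both prediction and ground truth:

```python
def mean_iou(pred: list[int], gt: list[int], num_classes: int) -> float:
    """Mean Intersection-over-Union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union > 0:  # skip classes that never occur
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy 4-pixel example: class 0 has IoU 1/2, class 1 has IoU 2/3.
print(round(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2), 4))  # → 0.5833
```

Benchmark implementations typically accumulate the intersection and union counts over the whole test split before dividing, rather than averaging per-image scores.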
arXiv Detail & Related papers (2024-06-02T01:37:26Z) - Group Activity Recognition using Unreliable Tracked Pose [8.592249538742527]
Group activity recognition in video is a complex task due to the need for a model to recognise the actions of all individuals in the video.
We introduce an innovative deep learning-based group activity recognition approach called Rendered Pose based Group Activity Recognition System (RePGARS).
arXiv Detail & Related papers (2024-01-06T17:36:13Z) - Measuring Student Behavioral Engagement using Histogram of Actions [0.0]
The proposed approach recognizes student actions then predicts the student behavioral engagement level.
For student action recognition, we use human skeletons to model student postures and upper body movements.
The trained 3D-CNN model is used to recognize actions within every 2-minute video segment.
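The histogram-of-actions idea in that entry can be sketched as a normalized count of recognized action labels over a session; the action vocabulary and the normalization choice below are assumptions for illustration, not the paper's exact feature definition.

```python
from collections import Counter

def action_histogram(actions: list[str], vocab: list[str]) -> list[float]:
    """Normalized histogram of recognized action labels, ordered by vocab.
    Returns all zeros for an empty action sequence."""
    counts = Counter(actions)
    total = max(len(actions), 1)
    return [counts[a] / total for a in vocab]

# Hypothetical labels from per-segment action recognition:
hist = action_histogram(["write", "talk", "write", "idle"],
                        vocab=["write", "talk", "idle"])
print(hist)  # → [0.5, 0.25, 0.25]
```

Such a fixed-length vector can then be fed to a simple classifier that maps the session-level action distribution to an engagement level.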
arXiv Detail & Related papers (2023-07-18T16:37:37Z) - Mitigating Biases in Student Performance Prediction via Attention-Based Personalized Federated Learning [7.040747348755578]
Traditional learning-based approaches to student modeling generalize poorly to underrepresented student groups due to biases in data availability.
We propose a methodology for predicting student performance from their online learning activities that optimizes inference accuracy over different demographic groups such as race and gender.
arXiv Detail & Related papers (2022-08-02T00:22:20Z) - Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation.
We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z) - ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency [62.38914747727636]
We study self-supervised video representation learning, which is a challenging task due to 1) a lack of labels for explicit supervision and 2) unstructured and noisy visual information.
Existing methods mainly use contrastive loss with video clips as the instances and learn visual representation by discriminating instances from each other.
In this paper, we observe that the consistency between positive samples is the key to learn robust video representations.
arXiv Detail & Related papers (2021-06-04T08:44:50Z) - CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z) - ZSTAD: Zero-Shot Temporal Activity Detection [107.63759089583382]
We propose a novel task setting called zero-shot temporal activity detection (ZSTAD), where activities that have never been seen in training can still be detected.
We design an end-to-end deep network based on R-C3D as the architecture for this solution.
Experiments on both the THUMOS14 and the Charades datasets show promising performance in terms of detecting unseen activities.
arXiv Detail & Related papers (2020-03-12T02:40:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.