SkillSight: Efficient First-Person Skill Assessment with Gaze
- URL: http://arxiv.org/abs/2511.19629v1
- Date: Mon, 24 Nov 2025 19:05:28 GMT
- Title: SkillSight: Efficient First-Person Skill Assessment with Gaze
- Authors: Chi Hsuan Wu, Kumar Ashutosh, Kristen Grauman
- Abstract summary: We introduce SkillSight for power-efficient skill assessment from first-person data. Our two-stage framework learns to jointly model gaze and egocentric video when predicting skill level, then distills a gaze-only student model. Experiments on three datasets spanning cooking, music, and sports establish, for the first time, the valuable role of gaze in skill understanding.
- Score: 51.16409727318035
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Egocentric perception on smart glasses could transform how we learn new skills in the physical world, but automatic skill assessment remains a fundamental technical challenge. We introduce SkillSight for power-efficient skill assessment from first-person data. Central to our approach is the hypothesis that skill level is evident not only in how a person performs an activity (video), but also in how they direct their attention when doing so (gaze). Our two-stage framework first learns to jointly model gaze and egocentric video when predicting skill level, then distills a gaze-only student model. At inference, the student model requires only gaze input, drastically reducing power consumption by eliminating continuous video processing. Experiments on three datasets spanning cooking, music, and sports establish, for the first time, the valuable role of gaze in skill understanding across diverse real-world settings. Our SkillSight teacher model achieves state-of-the-art performance, while our gaze-only student variant maintains high accuracy using 73x less power than competing methods. These results pave the way for in-the-wild AI-supported skill learning.
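The abstract describes distilling a gaze-plus-video teacher into a gaze-only student but gives no implementation details. As a minimal sketch of how such a distillation objective is commonly set up, the temperature-softened KL divergence below pushes the student's predicted skill-level distribution toward the teacher's; the function names, the temperature value, and the three skill levels are illustrative assumptions, not the authors' code.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over softened skill-level distributions.

    teacher_logits: predictions from the model that sees gaze + video.
    student_logits: predictions from the model that sees gaze only.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))

# Hypothetical three-way skill labels (novice / intermediate / expert):
teacher = [2.0, 0.5, -1.0]   # teacher is fairly confident in "novice"
student = [1.5, 0.7, -0.5]   # student roughly agrees from gaze alone
loss = distillation_loss(teacher, student)
```

Minimizing this loss during training is what would let the student drop the video stream entirely at inference time, which is where the reported power savings come from.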
Related papers
- Learning Skill-Attributes for Transferable Assessment in Video [56.813876909367856]
Skill assessment from video entails rating the quality of a person's physical performance and explaining what could be done better. Our CrossTrainer approach discovers skill-attributes, such as balance, control, and hand positioning. By abstracting out the shared behaviors indicative of human skill, the proposed video representation generalizes substantially better than an array of existing techniques.
arXiv Detail & Related papers (2025-11-17T23:53:06Z)
- Comparing Learning Paradigms for Egocentric Video Summarization [0.0]
This study investigates computer vision paradigms by assessing their ability to understand and interpret egocentric video data. We examine Shotluck Holmes (state-of-the-art supervised learning), TAC-SUM (state-of-the-art unsupervised learning), and GPT-4o (a prompt fine-tuned pre-trained model), evaluating their effectiveness in video summarization.
arXiv Detail & Related papers (2025-06-26T21:46:48Z)
- SkillMimic: Learning Basketball Interaction Skills from Demonstrations [85.23012579911378]
We introduce SkillMimic, a unified data-driven framework that fundamentally changes how agents learn interaction skills. Our key insight is that a unified HOI imitation reward can effectively capture the essence of diverse interaction patterns from HOI datasets. For evaluation, we collect and introduce two basketball datasets containing approximately 35 minutes of diverse basketball skills.
arXiv Detail & Related papers (2024-08-12T15:19:04Z)
- ExpertAF: Expert Actionable Feedback from Video [81.46431188306397]
We introduce a novel method to generate actionable feedback from video of a person doing a physical activity, such as basketball or soccer. Our method takes a video demonstration and its accompanying 3D body pose and generates expert commentary describing what the person is doing well and what they could improve. We show how to leverage Ego-Exo4D's [29] videos of skilled activity and expert commentary together with a strong language model to create a weakly-supervised training dataset for this task.
arXiv Detail & Related papers (2024-08-01T16:13:07Z)
- Mimicking the Maestro: Exploring the Efficacy of a Virtual AI Teacher in Fine Motor Skill Acquisition [3.07176124710244]
Motor skills, especially fine motor skills like handwriting, play an essential role in academic pursuits and everyday life.
Traditional methods to teach these skills, although effective, can be time-consuming and inconsistent.
We introduce an AI teacher model that captures the distinct characteristics of human instructors.
arXiv Detail & Related papers (2023-10-16T11:11:43Z)
- Choreographer: Learning and Adapting Skills in Imagination [60.09911483010824]
We present Choreographer, a model-based agent that exploits its world model to learn and adapt skills in imagination.
Our method decouples the exploration and skill learning processes, being able to discover skills in the latent state space of the model.
Choreographer is able to learn skills both from offline data and by collecting data simultaneously with an exploration policy.
arXiv Detail & Related papers (2022-11-23T23:31:14Z)
- Graph-based Exercise- and Knowledge-Aware Learning Network for Student Performance Prediction [8.21303828329009]
We propose a Graph-based Exercise- and Knowledge-Aware Learning Network for accurate student score prediction.
We learn students' mastery of exercises and knowledge concepts respectively to model the two-fold effects of exercises and knowledge concepts.
arXiv Detail & Related papers (2021-06-01T06:53:17Z)
- Emergent Real-World Robotic Skills via Unsupervised Off-Policy Reinforcement Learning [81.12201426668894]
We develop efficient reinforcement learning methods that acquire diverse skills without any reward function, and then repurpose these skills for downstream tasks.
We show that our proposed algorithm provides substantial improvement in learning efficiency, making reward-free real-world training feasible.
We also demonstrate that the learned skills can be composed using model predictive control for goal-oriented navigation, without any additional training.
arXiv Detail & Related papers (2020-04-27T17:38:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.