Motion Focus Recognition in Fast-Moving Egocentric Video
- URL: http://arxiv.org/abs/2601.07154v1
- Date: Mon, 12 Jan 2026 02:53:51 GMT
- Title: Motion Focus Recognition in Fast-Moving Egocentric Video
- Authors: Daniel Hong, James Tribble, Hao Wang, Chaoyi Zhou, Ashish Bastola, Siyu Huang, Abolfazl Razi
- Abstract summary: We propose a real-time motion focus recognition method that estimates the subject's intention from any egocentric video. Evaluated on a collected egocentric action dataset, our method achieves real-time performance with manageable memory consumption.
- Score: 17.363244905814756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: From Vision-Language-Action (VLA) systems to robotics, existing egocentric datasets primarily focus on action recognition tasks, while largely overlooking the inherent role of motion analysis in sports and other fast-movement scenarios. To bridge this gap, we propose a real-time motion focus recognition method that estimates the subject's locomotion intention from any egocentric video. Our approach leverages the foundation model for camera pose estimation and introduces system-level optimizations to enable efficient and scalable inference. Evaluated on a collected egocentric action dataset, our method achieves real-time performance with manageable memory consumption through a sliding batch inference strategy. This work makes motion-centric analysis practical for edge deployment and offers a complementary perspective to existing egocentric studies on sports and fast-movement activities.
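The bounded memory footprint above is attributed to a sliding batch inference strategy, but the abstract gives no implementation details. A minimal sketch of the general pattern, assuming a generic `pose_model` callable and hypothetical `BATCH`/`STRIDE` values rather than the authors' actual parameters:

```python
from collections import deque

import numpy as np

BATCH = 16   # frames per inference call (hypothetical value)
STRIDE = 8   # new frames consumed per call; the overlap smooths estimates

def sliding_batch_inference(frame_stream, pose_model):
    """Run a pose model over an arbitrarily long video in fixed-size,
    overlapping batches so peak memory stays bounded by BATCH frames."""
    window = deque(maxlen=BATCH)
    for frame in frame_stream:
        window.append(frame)
        if len(window) == BATCH:
            batch = np.stack(window)      # (BATCH, H, W, 3)
            poses = pose_model(batch)     # one bounded-memory model call
            # Emit only the non-overlapping tail; the head of the window
            # was already covered by the previous call.
            yield poses[-STRIDE:]
            for _ in range(STRIDE):
                window.popleft()
```

Because each call sees BATCH frames but only STRIDE of them are new, per-frame latency and peak memory stay constant regardless of video length.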
Related papers
- GeoMotion: Rethinking Motion Segmentation via Latent 4D Geometry [61.24189040578178]
We propose a fully learning-based approach that directly infers moving objects from latent feature representations via attention mechanisms. Our key insight is to bypass explicit correspondence estimation and instead let the model learn to implicitly disentangle object and camera motion. Our approach achieves state-of-the-art motion segmentation performance with high efficiency.
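The attention mechanism here stands in for explicit correspondence estimation. As a purely illustrative sketch of that idea (not GeoMotion's actual architecture; all names and sizes are hypothetical), a learned motion query can cross-attend to per-pixel latent features, with the attention map read off as a soft moving-object mask:

```python
import torch
import torch.nn as nn

class MotionQuerySegmenter(nn.Module):
    """Toy motion segmenter: one learned query attends over latent pixel
    features; the attention weights serve as a soft moving-object mask."""
    def __init__(self, dim=64):
        super().__init__()
        self.query = nn.Parameter(torch.randn(1, 1, dim))   # one motion query
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, feats):                      # feats: (B, C, H, W)
        b, c, h, w = feats.shape
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H*W, C)
        q = self.query.expand(b, -1, -1)
        _, weights = self.attn(q, tokens, tokens)  # weights: (B, 1, H*W)
        return weights.view(b, 1, h, w)            # soft motion mask

mask = MotionQuerySegmenter()(torch.randn(2, 64, 32, 32))
print(mask.shape)  # torch.Size([2, 1, 32, 32])
```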
arXiv Detail & Related papers (2026-02-25T11:36:33Z)
- Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention [58.05340906967343]
Egocentric Referring Video Object Segmentation (Ego-RVOS) aims to segment the specific object actively involved in a human action, as described by a language query, within first-person videos. Existing methods often struggle, learning spurious correlations from skewed object-action pairings in datasets. We introduce Causal-REferring (CERES), a plug-in causal framework that adapts strong, pre-trained RVOS models to the egocentric domain.
arXiv Detail & Related papers (2025-12-30T16:22:14Z)
- UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation [26.03507721111338]
We introduce Egocentric Motion Generation and Egocentric Motion Forecasting, two novel tasks that utilize first-person images for scene-aware motion synthesis. We propose UniEgoMotion, a unified conditional motion diffusion model with a novel head-centric motion representation tailored for egocentric devices. UniEgoMotion achieves state-of-the-art performance in egocentric motion reconstruction and is the first to generate motion from a single egocentric image.
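The summary calls UniEgoMotion a conditional motion diffusion model without giving its formulation. As a generic illustration only, one deterministic (DDIM-style) reverse step for a motion sequence conditioned on an image embedding could look like this, with `denoiser` a hypothetical noise-prediction network:

```python
import torch

@torch.no_grad()
def reverse_step(x_t, t, denoiser, cond, alphas_cumprod):
    """One DDIM-style reverse-diffusion step for a noisy motion sequence
    x_t of shape (B, T, D), conditioned on an image embedding `cond`."""
    a_t = alphas_cumprod[t]
    a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    eps = denoiser(x_t, t, cond)                      # predicted noise
    x0 = (x_t - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # clean-motion estimate
    return a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
```

Iterating this step from pure noise down to t = 0 would yield a motion sequence consistent with the conditioning image.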
arXiv Detail & Related papers (2025-08-02T00:41:20Z)
- Joint-Motion Mutual Learning for Pose Estimation in Videos [21.77871402339573]
Human pose estimation in videos has long been a compelling yet challenging task within the realm of computer vision.
Recent methods strive to integrate multi-frame visual features generated by a backbone network for pose estimation.
We propose a novel joint-motion mutual learning framework for pose estimation.
arXiv Detail & Related papers (2024-08-05T07:37:55Z)
- Object Aware Egocentric Online Action Detection [23.504280692701272]
We introduce an Object-Aware Module that integrates egocentric-specific priors into existing Online Action Detection frameworks.
Our module can be seamlessly integrated into existing models with minimal overhead and brings consistent performance improvements.
arXiv Detail & Related papers (2024-06-03T07:58:40Z)
- Motor Focus: Fast Ego-Motion Prediction for Assistive Visual Navigation [3.837186701755568]
Motor Focus is an image-based framework that predicts the observer's motion direction from the visual feed.
Our framework demonstrates its advantages in speed (>40 FPS), accuracy (MAE = 60 pixels), and robustness (SNR = 23 dB).
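Motor Focus infers the observer's motion direction from the visual feed alone. A classical geometric core for that kind of estimate is the focus of expansion (FoE) of the optical-flow field; the least-squares sketch below illustrates the general idea and is not the paper's exact pipeline:

```python
import numpy as np

def focus_of_expansion(points, flows):
    """Least-squares FoE: under pure camera translation, each flow vector
    f at pixel p is parallel to (p - foe), so the 2D cross product
    f x (p - foe) = 0 gives one linear constraint per pixel.
    points, flows: (N, 2) arrays of pixel positions and flow vectors."""
    x, y = points[:, 0], points[:, 1]
    u, v = flows[:, 0], flows[:, 1]
    A = np.stack([v, -u], axis=1)      # v*e_x - u*e_y = v*x - u*y
    b = v * x - u * y
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe                         # (e_x, e_y) in pixel coordinates

# Synthetic check: radial flow expanding from (80, 60) recovers the FoE.
pts = np.random.rand(200, 2) * [160, 120]
print(focus_of_expansion(pts, pts - [80, 60]))  # approx. [80. 60.]
```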
arXiv Detail & Related papers (2024-04-25T20:45:39Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
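The learnable motion predictor replaces hand-tuned assumptions such as constant velocity. A toy stand-in, assuming a simple MLP over a short box history (the actual MotionTrack predictor is more elaborate):

```python
import torch
import torch.nn as nn

class BoxMotionPredictor(nn.Module):
    """Toy learnable motion predictor: map the last K boxes of a track,
    each encoded as (cx, cy, w, h), to the offset of the next box."""
    def __init__(self, history=5, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history * 4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),        # predicted (dcx, dcy, dw, dh)
        )

    def forward(self, boxes):            # boxes: (B, K, 4)
        delta = self.net(boxes.flatten(1))
        return boxes[:, -1] + delta      # next-frame box estimate

pred = BoxMotionPredictor()(torch.randn(8, 5, 4))
print(pred.shape)  # torch.Size([8, 4])
```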
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective [13.776455033015216]
We introduce a novel cross-view learning approach to action recognition.
First, we introduce a novel geometric constraint into the self-attention mechanism of the Transformer.
Then, we propose a new cross-view self-attention loss, learned on unpaired cross-view data, that forces the self-attention mechanism to learn to transfer knowledge across views.
arXiv Detail & Related papers (2023-05-25T04:14:49Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) regulates the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame at the current time stamp.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z)
- Self-supervised Video Object Segmentation by Motion Grouping [79.13206959575228]
We develop a computer vision system able to segment objects by exploiting motion cues.
We introduce a simple variant of the Transformer to segment optical flow frames into primary objects and the background.
We evaluate the proposed architecture on public benchmarks (DAVIS2016, SegTrackv2, and FBMS59).
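The Transformer variant above groups flow into a primary object and the background. A stripped-down sketch in the same spirit, with two slots competing for flow tokens (hyperparameters and module names here are hypothetical, not the paper's exact design):

```python
import torch
import torch.nn as nn

class FlowSlotGrouping(nn.Module):
    """Two slots compete for optical-flow tokens via a softmax over slots;
    the resulting assignment is a soft object/background decomposition."""
    def __init__(self, dim=64, n_slots=2, iters=3):
        super().__init__()
        self.slots0 = nn.Parameter(torch.randn(1, n_slots, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.update = nn.GRUCell(dim, dim)
        self.iters = iters
        self.scale = dim ** -0.5

    def forward(self, tokens):             # tokens: (B, N, D) flow features
        b, n, d = tokens.shape
        k, v = self.to_k(tokens), self.to_v(tokens)
        slots = self.slots0.expand(b, -1, -1)
        for _ in range(self.iters):
            q = self.to_q(slots)
            logits = q @ k.transpose(1, 2) * self.scale  # (B, S, N)
            attn = logits.softmax(dim=1)   # slots compete for each token
            attn = attn / attn.sum(-1, keepdim=True).clamp(min=1e-8)
            slots = self.update((attn @ v).reshape(-1, d),
                                slots.reshape(-1, d)).view(b, -1, d)
        return logits.softmax(dim=1)       # soft token-to-slot masks

masks = FlowSlotGrouping()(torch.randn(2, 1024, 64))
print(masks.shape)  # torch.Size([2, 2, 1024])
```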
arXiv Detail & Related papers (2021-04-15T17:59:32Z)
- Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach with a simple yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drift caused by spatial-temporal discontinuity; and (iii) we demonstrate state-of-the-art results among self-supervised approaches on DAVIS-2017 and YouTube-VOS.
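Contribution (i) is a memory mechanism for long-term correspondence matching. A generic sketch of memory-based label propagation in that spirit (not the paper's exact design): query-frame features attend over features stored from past frames and copy their soft mask labels.

```python
import torch
import torch.nn.functional as F

def memory_readout(query_feats, mem_keys, mem_labels, temperature=0.07):
    """Propagate masks via attention over a feature memory: each query
    pixel attends over all stored memory pixels and copies their labels.
    query_feats: (Nq, C); mem_keys: (Nm, C); mem_labels: (Nm, K)."""
    q = F.normalize(query_feats, dim=1)
    k = F.normalize(mem_keys, dim=1)
    affinity = (q @ k.t() / temperature).softmax(dim=1)  # (Nq, Nm)
    return affinity @ mem_labels             # (Nq, K) propagated soft masks

masks = memory_readout(torch.randn(100, 64), torch.randn(500, 64),
                       torch.rand(500, 2))
print(masks.shape)  # torch.Size([100, 2])
```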
arXiv Detail & Related papers (2020-06-22T17:55:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.