YO-CSA-T: A Real-time Badminton Tracking System Utilizing YOLO Based on Contextual and Spatial Attention
- URL: http://arxiv.org/abs/2501.06472v1
- Date: Sat, 11 Jan 2025 08:00:25 GMT
- Title: YO-CSA-T: A Real-time Badminton Tracking System Utilizing YOLO Based on Contextual and Spatial Attention
- Authors: Yuan Lai, Zhiwei Shi, Chengxi Zhu
- Abstract summary: YO-CSA is a real-time 3D shuttlecock trajectory detection system. We map the 2D coordinate sequence extracted by YO-CSA into 3D space using stereo vision. Our system achieves a high accuracy of 90.43% mAP@0.75, surpassing both YOLOv8s and YOLO11s.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The 3D trajectory of a shuttlecock required by a badminton rally robot for human-robot competition demands real-time performance with high accuracy. However, the fast flight speed of the shuttlecock, various visual effects, and its tendency to blend with environmental elements such as court lines and lighting make rapid and accurate 2D detection challenging. In this paper, we first propose the YO-CSA detection network, which optimizes and reconfigures the YOLOv8s model's backbone, neck, and head by incorporating contextual and spatial attention mechanisms to enhance the model's ability to extract and integrate both global and local features. Next, we integrate three major subtasks (detection, prediction, and compensation) into a real-time 3D shuttlecock trajectory detection system. Specifically, our system maps the 2D coordinate sequence extracted by YO-CSA into 3D space using stereo vision, then predicts future 3D coordinates from historical information and re-projects them onto the left and right views to update the position constraints for 2D detection. Additionally, our system includes a compensation module that fills in missing intermediate frames, ensuring a more complete trajectory. We conduct extensive experiments on our own dataset to evaluate both YO-CSA's performance and the system's effectiveness. Experimental results show that YO-CSA achieves a high accuracy of 90.43% mAP@0.75, surpassing both YOLOv8s and YOLO11s. Our system performs excellently, maintaining a speed of over 130 fps across 12 test sequences.
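The stereo-vision loop described in the abstract (triangulate 2D detections into 3D, then re-project a predicted 3D point back onto both views to constrain the next 2D search) can be sketched as follows. This is an illustrative rectified-stereo model, not the paper's implementation; the focal length, principal point, and baseline values are made-up placeholders.

```python
# Hypothetical sketch of the stereo mapping step: left/right 2D detections are
# triangulated into 3D, and a 3D point is re-projected onto both image planes.
# Camera intrinsics and baseline below are illustrative, not from the paper.

def triangulate(xl, yl, xr, f=1000.0, cx=640.0, cy=360.0, baseline=0.2):
    """Rectified-stereo triangulation: depth from disparity."""
    disparity = xl - xr
    Z = f * baseline / disparity      # depth (m)
    X = (xl - cx) * Z / f             # lateral offset
    Y = (yl - cy) * Z / f             # vertical offset
    return X, Y, Z

def reproject(X, Y, Z, f=1000.0, cx=640.0, cy=360.0, baseline=0.2):
    """Project a 3D point back onto the left and right image planes."""
    xl = f * X / Z + cx
    xr = f * (X - baseline) / Z + cx  # right camera is shifted by the baseline
    y = f * Y / Z + cy
    return (xl, y), (xr, y)

# Round trip: a detection pair maps to 3D and re-projects to the same pixels,
# which is what lets a predicted 3D point constrain the next 2D detection.
P = triangulate(700.0, 300.0, 650.0)
left, right = reproject(*P)
```

In the system described above, the re-projected coordinates would bound the region the 2D detector searches in the next frame.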
Related papers
- DA-Occ: Efficient 3D Voxel Occupancy Prediction via Directional 2D for Geometric Structure Preservation [13.792614780020061]
Efficient and high-accuracy 3D occupancy prediction is crucial for ensuring the performance of autonomous driving systems. Our method involves slicing 3D voxel features to preserve complete vertical geometric information. This strategy compensates for the loss of height cues in Bird's-Eye View representations, thereby maintaining the integrity of the 3D geometric structure.
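The contrast the summary draws (BEV pooling loses height cues, vertical slicing keeps them) can be shown with a toy numpy example; this is a conceptual sketch with made-up tensor sizes, not the DA-Occ implementation.

```python
# Illustrative only: compare BEV pooling, which collapses the height axis,
# with directional slicing, which keeps the full vertical structure.
import numpy as np

voxels = np.random.rand(8, 8, 4, 16)   # (X, Y, Z, C) voxel features, toy sizes

# BEV pooling: the Z axis is averaged away, so height information is lost.
bev = voxels.mean(axis=2)              # (X, Y, C)

# Slicing along X: each slice is a 2D (Y, Z) feature map that retains the
# complete vertical geometric information for that column of the scene.
slices = [voxels[i] for i in range(voxels.shape[0])]
```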
arXiv Detail & Related papers (2025-07-31T14:39:31Z) - S3MOT: Monocular 3D Object Tracking with Selective State Space Model [3.5047603107971397]
Multi-object tracking in 3D space is essential for advancing robotics and computer vision applications.
It remains a significant challenge in monocular setups due to the difficulty of mining 3D associations from 2D video streams.
We present three innovative techniques to enhance the fusion of heterogeneous cues for monocular 3D MOT.
arXiv Detail & Related papers (2025-04-25T04:45:35Z) - Dual-Path Enhancements in Event-Based Eye Tracking: Augmented Robustness and Adaptive Temporal Modeling [0.0]
Event-based eye tracking has become a pivotal technology for augmented reality and human-computer interaction.
Existing methods struggle with real-world challenges such as abrupt eye movements and environmental noise.
We introduce two key advancements. First, a robust data augmentation pipeline incorporating temporal shift, spatial flip, and event deletion improves model resilience.
Second, we propose KnightPupil, a hybrid architecture combining an EfficientNet-B3 backbone for spatial feature extraction, a bidirectional GRU for contextual temporal modeling, and a Linear Time-Varying State-Space Module.
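The three augmentations named above (temporal shift, spatial flip, event deletion) can be sketched on a stream of `(t, x, y, polarity)` event tuples. This is a hedged, minimal illustration of the general idea; the paper's actual pipeline may differ in parameters and detail.

```python
# Sketch of an event-stream augmentation pipeline; the function names and
# parameter choices here are illustrative assumptions, not the paper's code.
import random

def temporal_shift(events, dt):
    """Shift every event timestamp by dt."""
    return [(t + dt, x, y, p) for (t, x, y, p) in events]

def spatial_flip(events, width):
    """Mirror event x-coordinates across the sensor width."""
    return [(t, width - 1 - x, y, p) for (t, x, y, p) in events]

def event_deletion(events, drop_prob, rng):
    """Randomly drop events to simulate sensor noise and occlusion."""
    return [e for e in events if rng.random() >= drop_prob]

events = [(0.0, 10, 5, 1), (0.5, 11, 5, -1), (1.0, 12, 6, 1)]
aug = event_deletion(spatial_flip(temporal_shift(events, 2.0), width=64),
                     drop_prob=0.1, rng=random.Random(0))
```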
arXiv Detail & Related papers (2025-04-14T07:57:22Z) - EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis [61.1662426227688]
Existing NeRF and 3DGS-based methods show promising results in achieving photorealistic renderings but require slow, per-scene optimization.
We introduce EVolSplat, an efficient 3D Gaussian Splatting model for urban scenes that works in a feed-forward manner.
arXiv Detail & Related papers (2025-03-26T02:47:27Z) - Fast Occupancy Network [15.759329665907229]
Occupancy Network predicts the category of each voxel in a specified 3D space around the ego vehicle. We present a simple and fast Occupancy Network model, which adopts a deformable 2D convolutional layer to lift BEV features to 3D voxel features. We also present an efficient voxel feature pyramid network (FPN) module that improves performance at little computational cost.
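The paper lifts BEV features to 3D voxels with a deformable 2D convolution; as a simplified stand-in, the general channel-to-height lifting idea can be shown with a plain reshape, where a group of BEV channels is reinterpreted as height bins. All sizes below are toy values.

```python
# Simplified illustration of BEV-to-voxel lifting via channel-to-height
# reshaping; the actual model uses a learned deformable 2D convolution.
import numpy as np

def lift_bev_to_voxel(bev, z_bins):
    """Reshape (H, W, C) BEV features into (H, W, Z, C // Z) voxel features."""
    h, w, c = bev.shape
    assert c % z_bins == 0, "channels must split evenly across height bins"
    return bev.reshape(h, w, z_bins, c // z_bins)

bev = np.random.rand(16, 16, 32)           # toy BEV feature map
voxel = lift_bev_to_voxel(bev, z_bins=8)   # (16, 16, 8, 4) voxel features
```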
arXiv Detail & Related papers (2024-12-10T03:46:03Z) - ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our architecture, named ALOcc, achieves an optimal trade-off between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - VECTOR: Velocity-Enhanced GRU Neural Network for Real-Time 3D UAV Trajectory Prediction [2.1825723033513165]
We propose a new trajectory prediction method using Gated Recurrent Units (GRUs) within sequence-based neural networks.
We employ both synthetic and real-world 3D UAV trajectory data, capturing a wide range of flight patterns, speeds, and agility.
The GRU-based models significantly outperform state-of-the-art RNN approaches, with a mean square error (MSE) as low as 2 x 10^-8.
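As a reminder of the sequence model family this abstract refers to, a single GRU step can be written in a few lines of numpy: two gates decide how much of the hidden state to reset and update at each timestep. The weights and sizes below are arbitrary illustrations, not the paper's model.

```python
# Minimal numpy GRU cell, sketched for illustration; the 3-dim input stands in
# for an (x, y, z) trajectory point, and all weights are random placeholders.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step: gated blend of the old hidden state and a candidate."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate state
    return (1.0 - z) * h + z * h_tilde

rng = np.random.default_rng(0)
d_in, d_h = 3, 8
W = [rng.standard_normal((d_h, d_in)) * 0.1 for _ in range(3)]
U = [rng.standard_normal((d_h, d_h)) * 0.1 for _ in range(3)]
h = np.zeros(d_h)
for x in [np.array([0.0, 0.1, 0.2]), np.array([0.1, 0.2, 0.3])]:
    h = gru_cell(x, h, W[0], U[0], W[1], U[1], W[2], U[2])
```

A trajectory predictor would feed the final hidden state through a linear head to emit the next 3D coordinate.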
arXiv Detail & Related papers (2024-10-24T07:16:42Z) - YOLO-BEV: Generating Bird's-Eye View in the Same Way as 2D Object Detection [8.082514573754954]
YOLO-BEV is an efficient framework that harnesses a unique surrounding-camera setup to generate a 2D bird's-eye view of the vehicular environment.
Preliminary results validate the feasibility of YOLO-BEV in real-time vehicular perception tasks.
arXiv Detail & Related papers (2023-10-26T13:16:27Z) - Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling [13.284947022380404]
We propose a two-stage framework that can obtain accurate and smooth full-body motions with three tracking signals of head and hands only.
Our framework explicitly models the joint-level features in the first stage and utilizes them as spatio-temporal tokens for alternating spatial and temporal transformer blocks to capture joint-level correlations in the second stage.
With extensive experiments on the AMASS motion dataset and real-captured data, we show our proposed method can achieve more accurate and smooth motion compared to existing approaches.
arXiv Detail & Related papers (2023-08-17T08:27:55Z) - Uncertainty-aware State Space Transformer for Egocentric 3D Hand Trajectory Forecasting [79.34357055254239]
Hand trajectory forecasting is crucial for enabling a prompt understanding of human intentions when interacting with AR/VR systems.
Existing methods handle this problem in a 2D image space which is inadequate for 3D real-world applications.
We set up an egocentric 3D hand trajectory forecasting task that aims to predict hand trajectories in a 3D space from early observed RGB videos in a first-person view.
arXiv Detail & Related papers (2023-07-17T04:55:02Z) - A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection.
YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connection, via both bypass and concatenation.
YOLO-S has 87% fewer parameters and nearly half the FLOPs of YOLOv3, making deployment practical for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z) - Workshop on Autonomous Driving at CVPR 2021: Technical Report for Streaming Perception Challenge [57.647371468876116]
We introduce our real-time 2D object detection system for the realistic autonomous driving scenario.
Our detector is built on a newly designed YOLO model, called YOLOX.
On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, surpassing second place by 7.8/6.1 AP on the detection-only and full-stack tracks, respectively.
arXiv Detail & Related papers (2021-07-27T06:36:06Z) - Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.