Automated Detection of Sport Highlights from Audio and Video Sources
- URL: http://arxiv.org/abs/2501.16100v2
- Date: Fri, 31 Jan 2025 09:03:03 GMT
- Title: Automated Detection of Sport Highlights from Audio and Video Sources
- Authors: Francesco Della Santa, Morgana Lalli
- Abstract summary: This study presents a novel Deep Learning-based and lightweight approach for the automated detection of sports highlights (HLs) from audio and video sources.
Our solution leverages Deep Learning (DL) models trained on relatively small datasets of audio Mel-spectrograms and grayscale video frames, achieving promising accuracy rates of 89% and 83% for audio and video detection, respectively.
The proposed methodology offers a scalable solution for automated HL detection across various types of sports video content, reducing the need for manual intervention.
- Abstract: This study presents a novel Deep Learning-based and lightweight approach for the automated detection of sports highlights (HLs) from audio and video sources. HL detection is a key task in sports video analysis, traditionally requiring significant human effort. Our solution leverages Deep Learning (DL) models trained on relatively small datasets of audio Mel-spectrograms and grayscale video frames, achieving promising accuracy rates of 89% and 83% for audio and video detection, respectively. The use of small datasets, combined with simple architectures, demonstrates the practicality of our method for fast and cost-effective deployment. Furthermore, an ensemble model combining both modalities shows improved robustness against false positives and false negatives. The proposed methodology offers a scalable solution for automated HL detection across various types of sports video content, reducing the need for manual intervention. Future work will focus on enhancing model architectures and extending this approach to broader scene-detection tasks in media analysis.
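The abstract notes that an ensemble combining the audio detector (89% accuracy) and the video detector (83% accuracy) improves robustness against false positives and false negatives, but it does not specify the fusion rule. The sketch below shows one plausible late-fusion scheme; the weighted average, the specific weights, and the decision threshold are illustrative assumptions, not details from the paper.

```python
# Hedged sketch: one possible late-fusion rule for combining per-segment
# highlight probabilities from modality-specific audio and video models.
# The weights and threshold are assumptions for illustration only.

def fuse_highlight_scores(audio_probs, video_probs,
                          w_audio=0.6, w_video=0.4, threshold=0.5):
    """Combine per-segment highlight probabilities from two models.

    Weighting audio slightly higher reflects its stronger standalone
    accuracy in the abstract (89% vs 83%); the exact weights are
    assumptions, not values from the paper.
    """
    if len(audio_probs) != len(video_probs):
        raise ValueError("segment counts must match")
    fused = [w_audio * a + w_video * v
             for a, v in zip(audio_probs, video_probs)]
    return [p >= threshold for p in fused]

# A segment flagged strongly by only one modality falls below the
# threshold and is suppressed, which is one way an ensemble can
# reduce false positives.
flags = fuse_highlight_scores([0.9, 0.2, 0.55], [0.8, 0.6, 0.1])
```

Under this rule, only the first segment (strong agreement between modalities) is kept as a highlight; the other two are rejected because a single weak or conflicting signal is not enough to cross the threshold.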
Related papers
- T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs [102.66246727371583]
We develop a method called T2Vid to synthesize video-like samples to enrich the instruction diversity in the training corpus.
We find that the proposed scheme can boost the performance of long video understanding without training with long video samples.
arXiv Detail & Related papers (2024-11-29T18:59:54Z) - Advancing Automated Deception Detection: A Multimodal Approach to Feature Extraction and Analysis [0.0]
This research focuses on the extraction and combination of various features to enhance the accuracy of deception detection models.
By systematically extracting features from visual, audio, and text data, and experimenting with different combinations, we developed a robust model that achieved an impressive 99% accuracy.
arXiv Detail & Related papers (2024-07-08T14:59:10Z) - OSL-ActionSpotting: A Unified Library for Action Spotting in Sports Videos [56.393522913188704]
We introduce OSL-ActionSpotting, a Python library that unifies different action spotting algorithms to streamline research and applications in sports video analytics.
We successfully integrated three cornerstone action spotting methods into OSL-ActionSpotting, achieving performance metrics that match those of the original, disparate implementations.
arXiv Detail & Related papers (2024-07-01T13:17:37Z) - AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection [53.448283629898214]
The recent proliferation of hyper-realistic deepfake videos has drawn attention to the threat of audio and visual forgeries.
Most previous work on detecting AI-generated fake videos only utilize visual modality or audio modality.
We propose an Audio-Visual Transformer-based Ensemble Network (AVTENet) framework that considers both acoustic manipulation and visual manipulation.
arXiv Detail & Related papers (2023-10-19T19:01:26Z) - LoRA-like Calibration for Multimodal Deception Detection using ATSFace Data [1.550120821358415]
We introduce an attention-aware neural network addressing challenges inherent in video data and deception dynamics.
We employ a multimodal fusion strategy that enhances accuracy; our approach yields a 92% accuracy rate on a real-life trial dataset.
arXiv Detail & Related papers (2023-09-04T06:22:25Z) - Towards Active Learning for Action Spotting in Association Football Videos [59.84375958757395]
Analyzing football videos is challenging and requires identifying subtle and diverse spatio-temporal patterns.
Current algorithms face significant challenges when learning from limited annotated data.
We propose an active learning framework that selects the most informative video samples to be annotated next.
arXiv Detail & Related papers (2023-04-09T11:50:41Z) - AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR [79.21857972093332]
We present AVFormer, a method for augmenting audio-only models with visual information, at the same time performing lightweight domain adaptation.
We show that these can be trained on a small amount of weakly labelled video data with minimum additional training time and parameters.
We also introduce a simple curriculum scheme during training which we show is crucial to enable the model to jointly process audio and visual information effectively.
arXiv Detail & Related papers (2023-03-29T07:24:28Z) - Sports Video Analysis on Large-Scale Data [10.24207108909385]
This paper investigates automated machine description of sports videos.
We propose a novel large-scale NBA dataset for Sports Video Analysis (NSVA) with a focus on captioning.
arXiv Detail & Related papers (2022-08-09T16:59:24Z) - Few-Cost Salient Object Detection with Adversarial-Paced Learning [95.0220555274653]
This paper proposes to learn the effective salient object detection model based on the manual annotation on a few training images only.
We name this task as the few-cost salient object detection and propose an adversarial-paced learning (APL)-based framework to facilitate the few-cost learning scenario.
arXiv Detail & Related papers (2021-04-05T14:15:49Z) - A New Action Recognition Framework for Video Highlights Summarization in Sporting Events [9.870478438166288]
We present a framework to automatically clip the sports video stream by using a three-level prediction algorithm based on two classical open-source structures, i.e., YOLO-v3 and OpenPose.
It is found that by using a modest amount of sports video training data, our methodology can perform sports activity highlights clipping accurately.
arXiv Detail & Related papers (2020-12-01T04:14:40Z) - Unsupervised Temporal Feature Aggregation for Event Detection in Unstructured Sports Videos [10.230408415438966]
We study the case of event detection in sports videos for unstructured environments with arbitrary camera angles.
We identify and solve two major problems: unsupervised identification of players in an unstructured setting and generalization of the trained models to pose variations due to arbitrary shooting angles.
arXiv Detail & Related papers (2020-02-19T10:24:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.