LTC-GIF: Attracting More Clicks on Feature-length Sports Videos
- URL: http://arxiv.org/abs/2201.09077v1
- Date: Sat, 22 Jan 2022 15:34:10 GMT
- Title: LTC-GIF: Attracting More Clicks on Feature-length Sports Videos
- Authors: Ghulam Mujtaba, Jaehyuk Choi, and Eun-Seok Ryu
- Abstract summary: This paper proposes a lightweight method to attract users and increase views of the video by presenting personalized artistic media.
It analyzes lightweight thumbnail containers (LTC) using computational resources of the client device to recognize personalized events from full-length sports videos.
Instead of processing the entire video, small video segments are processed to generate artistic media.
- Score: 4.776806621717593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a lightweight method to attract users and increase views
of the video by presenting personalized artistic media -- i.e., static
thumbnails and animated GIFs. This method analyzes lightweight thumbnail
containers (LTC) using computational resources of the client device to
recognize personalized events from full-length sports videos. In addition,
instead of processing the entire video, small video segments are processed to
generate artistic media. This makes the proposed approach more computationally
efficient compared to the baseline approaches that create artistic media using
the entire video. The proposed method retrieves and uses thumbnail containers
and video segments, which reduces the required transmission bandwidth as well
as the amount of locally stored data used during artistic media generation.
In extensive experiments on the Nvidia Jetson TX2, the computational complexity
of the proposed method was 3.57 times lower than that of the state-of-the-art
(SoA) method. In the qualitative assessment, GIFs generated by the proposed
method received overall ratings 1.02 points higher than those of the SoA
method. To the best of our knowledge, this is the first technique that uses LTC
to generate artistic media while providing lightweight and high-performance
services even on resource-constrained devices.
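To make the workflow in the abstract concrete, below is a minimal, hypothetical Python sketch of the client-side steps it describes: scan the thumbnail container for personalized events, map event-positive thumbnails to the small video segments that would be fetched, and assemble an animated GIF from those segments. The function names, the fixed segment duration, and the color-threshold event detector are illustrative assumptions only, not the authors' implementation (which uses a learned lightweight model on the end-user device).

```python
# Hypothetical sketch of a client-side LTC-based GIF pipeline (not the paper's code).
# Assumes Pillow is installed; the "event detector" is a stand-in for a lightweight CNN.
from PIL import Image

SEGMENT_SECONDS = 10   # assumed span of video covered by one thumbnail
GIF_FRAME_MS = 200     # assumed GIF frame duration in milliseconds


def detect_event(thumbnail: Image.Image) -> bool:
    """Placeholder for the personalized event recognizer.

    Flags thumbnails with a high mean red intensity; a real system would run a
    lightweight learned classifier on the client device instead.
    """
    pixels = list(thumbnail.convert("RGB").getdata())
    mean_red = sum(p[0] for p in pixels) / len(pixels)
    return mean_red > 128


def select_segments(thumbnails):
    """Map event-positive thumbnails to (start, end) time ranges.

    Only these small segments would be retrieved and processed,
    rather than the full-length video.
    """
    ranges = []
    for i, thumb in enumerate(thumbnails):
        if detect_event(thumb):
            ranges.append((i * SEGMENT_SECONDS, (i + 1) * SEGMENT_SECONDS))
    return ranges


def build_gif(frames, out_path="highlight.gif"):
    """Assemble an animated GIF from frames of the selected segments."""
    if not frames:
        return None
    frames[0].save(out_path, save_all=True, append_images=frames[1:],
                   duration=GIF_FRAME_MS, loop=0)
    return out_path


if __name__ == "__main__":
    # Synthetic thumbnail container: the red thumbnail stands in for an "event".
    container = [Image.new("RGB", (64, 36), c)
                 for c in [(20, 20, 20), (200, 30, 30), (25, 25, 25)]]
    print("segments to fetch:", select_segments(container))
    print("gif written to:", build_gif([t for t in container if detect_event(t)]))
```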
Related papers
- EdgeVidSum: Real-Time Personalized Video Summarization at the Edge [3.102586911584193]
EdgeVidSum is a method that generates personalized, fast-forward summaries of long-form videos directly on edge devices.
The framework employs a hierarchical analysis approach, where a lightweight 2D CNN model identifies user-preferred content from thumbnails.
Our interactive demo highlights the system's ability to create tailored video summaries for long-form videos, such as movies, sports events, and TV shows, based on individual user preferences.
arXiv Detail & Related papers (2025-05-28T18:59:41Z)
- Magic 1-For-1: Generating One Minute Video Clips within One Minute [53.07214657235465]
We present Magic 1-For-1 (Magic141), an efficient video generation model with optimized memory consumption and inference latency.
By applying a test-time sliding window, we are able to generate a minute-long video within one minute with significantly improved visual quality and motion dynamics.
arXiv Detail & Related papers (2025-02-11T16:58:15Z)
- A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames [54.90226700939778]
We build on the common paradigm of transferring large-scale, image-text models to video via shallow temporal fusion.
We expose two limitations of this approach: (1) decreased spatial capabilities, likely due to poor video-language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed.
arXiv Detail & Related papers (2023-12-12T16:10:19Z)
- Revitalizing Legacy Video Content: Deinterlacing with Bidirectional Information Propagation [14.340811078427553]
We present a deep-learning-based method for deinterlacing animated and live-action video content.
Our proposed method supports bidirectional-temporal information propagation across multiple scales.
Our method can process multiple fields simultaneously, reducing per-frame time, and potentially enabling real-time processing.
arXiv Detail & Related papers (2023-10-30T13:43:19Z)
- Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data [102.0069667710562]
This paper presents Open-VCLIP++, a framework that adapts CLIP to a strong zero-shot video classifier.
We demonstrate that training Open-VCLIP++ is tantamount to continual learning with zero historical data.
Our approach is evaluated on three widely used action recognition datasets.
arXiv Detail & Related papers (2023-10-08T04:46:43Z)
- Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis [123.11530365315677]
Existing automated dubbing methods are usually designed for Professionally Generated Content (PGC) production.
In this paper, we investigate an audio-driven dubbing method that is more feasible for User Generated Content (UGC) production.
arXiv Detail & Related papers (2023-08-31T15:41:40Z)
- Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations [79.87044240860466]
We propose a novel approach that incorporates temporal consistency in dense self-supervised learning.
Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos.
Time-tuning improves the state-of-the-art by 8-10% for unsupervised semantic segmentation on videos and matches it for images.
arXiv Detail & Related papers (2023-08-22T21:28:58Z)
- Unified Perception: Efficient Depth-Aware Video Panoptic Segmentation with Minimal Annotation Costs [2.7920304852537536]
We present a new approach titled Unified Perception that achieves state-of-the-art performance without requiring video-based training.
Our method employs a simple two-stage cascaded tracking algorithm that (re)uses object embeddings computed in an image-based network.
arXiv Detail & Related papers (2023-03-03T15:00:12Z)
- A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval [16.548016892117083]
Text-video retrieval methods have received increased attention over the past few years.
Data augmentation techniques were introduced to increase the performance on unseen test examples.
We propose a multimodal data augmentation technique which works in the feature space and creates new videos and captions by mixing semantically similar samples.
arXiv Detail & Related papers (2022-08-03T14:05:20Z)
- LTC-SUM: Lightweight Client-driven Personalized Video Summarization Framework Using 2D CNN [5.95248889179516]
This paper proposes a novel lightweight thumbnail container-based summarization (LTC-SUM) framework for full feature-length videos.
It generates a personalized keyshot summary for concurrent users by using the computational resource of the end-user device.
arXiv Detail & Related papers (2022-01-22T13:54:13Z)
- Boosting the Performance of Video Compression Artifact Reduction with Reference Frame Proposals and Frequency Domain Information [31.053879834073502]
We propose an effective reference frame proposal strategy to boost the performance of the existing multi-frame approaches.
Experimental results show that our method achieves better fidelity and perceptual performance on MFQE 2.0 dataset than the state-of-the-art methods.
arXiv Detail & Related papers (2021-05-31T13:46:11Z)
- Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning [88.71867887257274]
We show that spatial augmentations such as cropping work well for videos too, but that previous implementations could not apply them at a scale sufficient for them to be effective.
To address this issue, we first introduce Feature Crop, a method to simulate such augmentations much more efficiently directly in feature space.
Second, we show that, as opposed to naive average pooling, the use of transformer-based attention improves performance significantly.
arXiv Detail & Related papers (2021-03-18T12:32:24Z)
- Efficient video integrity analysis through container characterization [77.45740041478743]
We introduce a container-based method to identify the software used to perform a video manipulation.
The proposed method is both efficient and effective and can also provide a simple explanation for its decisions.
It achieves an accuracy of 97.6% in distinguishing pristine from tampered videos and classifying the editing software.
arXiv Detail & Related papers (2021-01-26T14:13:39Z)