LTC-SUM: Lightweight Client-driven Personalized Video Summarization
Framework Using 2D CNN
- URL: http://arxiv.org/abs/2201.09049v1
- Date: Sat, 22 Jan 2022 13:54:13 GMT
- Title: LTC-SUM: Lightweight Client-driven Personalized Video Summarization
Framework Using 2D CNN
- Authors: Ghulam Mujtaba, Adeel Malik, and Eun-Seok Ryu
- Abstract summary: This paper proposes a novel lightweight thumbnail container-based summarization (LTC-SUM) framework for full feature-length videos.
It generates a personalized keyshot summary for concurrent users by using the computational resources of the end-user device.
- Score: 5.95248889179516
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a novel lightweight thumbnail container-based
summarization (LTC-SUM) framework for full feature-length videos. This
framework generates a personalized keyshot summary for concurrent users by
using the computational resources of the end-user device. State-of-the-art
methods that acquire and process entire video data to generate video summaries
are highly computationally intensive. In this regard, the proposed LTC-SUM
method uses lightweight thumbnails to handle the complex process of detecting
events. This significantly reduces computational complexity and improves
communication and storage efficiency by resolving computational and privacy
bottlenecks in resource-constrained end-user devices. These improvements were
achieved by designing a lightweight 2D CNN model to extract features from
thumbnails, which helped select and retrieve only a handful of specific
segments. Extensive quantitative experiments on a set of 18 full feature-length
videos (approximately 32.9 h in total duration) showed that the proposed method is
significantly more computationally efficient than state-of-the-art methods on the
same end-user device configurations. Qualitative assessments involving 56 participants
showed that they gave higher ratings to the summaries generated using the proposed
method. To the best of our knowledge, this is the first attempt at designing a fully
client-driven personalized
keyshot video summarization framework using thumbnail containers for
feature-length videos.
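The client-side pipeline described above lends itself to a short illustration. The sketch below (Python/PyTorch) is a minimal, hypothetical rendering of the idea, not the authors' implementation: a small 2D CNN scores each thumbnail for a user-preferred event, and only the time ranges whose thumbnails score above a threshold would then be requested from the server. The network layout, the two-class setup, the 10-seconds-per-thumbnail mapping, and the 0.5 threshold are illustrative assumptions.

# Minimal sketch (not the authors' code) of thumbnail-based event detection
# followed by segment selection, as outlined in the abstract.
import torch
import torch.nn as nn

class ThumbnailEventCNN(nn.Module):
    """Lightweight 2D CNN that scores a low-resolution thumbnail for a target event."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

def select_segments(thumbs: torch.Tensor, model: nn.Module,
                    seconds_per_thumb: float = 10.0, threshold: float = 0.5):
    """Return (start, end) times of segments whose thumbnails look like the target event."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(thumbs), dim=1)[:, 1]  # probability of the target event
    keep = (probs >= threshold).nonzero(as_tuple=True)[0].tolist()
    return [(i * seconds_per_thumb, (i + 1) * seconds_per_thumb) for i in keep]

if __name__ == "__main__":
    # Placeholder batch standing in for thumbnails extracted from the container,
    # e.g. 120 frames resized to 112x112; only the selected ranges would be fetched.
    thumbs = torch.rand(120, 3, 112, 112)
    segments = select_segments(thumbs, ThumbnailEventCNN())
    print(f"{len(segments)} segments would be fetched for the summary")

In this sketch the expensive work (decoding and summarizing full-resolution video) never happens on the client; only the few segments whose thumbnails match the user's preference are retrieved, which is the communication- and privacy-efficiency argument made in the abstract.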
Related papers
- EdgeVidSum: Real-Time Personalized Video Summarization at the Edge [3.102586911584193]
EdgeVidSum is a method that generates personalized, fast-forward summaries of long-form videos directly on edge devices.
The framework employs a hierarchical analysis approach, where a lightweight 2D CNN model identifies user-preferred content from thumbnails.
Our interactive demo highlights the system's ability to create tailored video summaries for long-form videos, such as movies, sports events, and TV shows, based on individual user preferences.
arXiv Detail & Related papers (2025-05-28T18:59:41Z) - Generating Narrated Lecture Videos from Slides with Synchronized Highlights [55.2480439325792]
We introduce an end-to-end system designed to automate the process of turning static slides into video lectures.
This system synthesizes a video lecture featuring AI-generated narration precisely synchronized with dynamic visual highlights.
We demonstrate the system's effectiveness through a technical evaluation using a manually annotated slide dataset with 1000 samples.
arXiv Detail & Related papers (2025-05-05T18:51:53Z) - Parameter-free Video Segmentation for Vision and Language Understanding [55.20132267309382]
We propose an algorithm for segmenting videos into contiguous chunks, based on the minimum description length principle.
Given feature vectors, the algorithm is entirely parameter-free: it requires neither a preset threshold nor the number or size of chunks to be specified.
arXiv Detail & Related papers (2025-03-03T05:54:37Z) - Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset.
We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them.
Our work also presents a new benchmark dataset that contains 1200 long videos each with high-quality summaries annotated by professionals.
arXiv Detail & Related papers (2024-04-04T11:59:06Z) - LTC-GIF: Attracting More Clicks on Feature-length Sports Videos [4.776806621717593]
This paper proposes a lightweight method to attract users and increase views of the video by presenting personalized artistic media.
It analyzes lightweight thumbnail containers (LTC) using computational resources of the client device to recognize personalized events from full-length sports videos.
Instead of processing the entire video, small video segments are processed to generate artistic media.
arXiv Detail & Related papers (2022-01-22T15:34:10Z) - OCSampler: Compressing Videos to One Clip with Single-step Sampling [82.0417131211353]
We propose a framework named OCSampler to explore a compact yet effective video representation with one short clip.
Our basic motivation is that efficient video recognition lies in processing the whole sequence at once rather than picking up frames sequentially.
arXiv Detail & Related papers (2022-01-12T09:50:38Z) - Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z) - LocFormer: Enabling Transformers to Perform Temporal Moment Localization
on Long Untrimmed Videos With a Feature Sampling Approach [35.93734845932161]
LocFormer is a Transformer-based model for video grounding that operates at a constant memory footprint regardless of the video length.
We propose a modular design that separates functionality, enabling us to learn an inductive bias via supervising the self-attention heads.
arXiv Detail & Related papers (2021-12-19T05:32:14Z) - HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z) - Providing Meaningful Data Summarizations Using Examplar-based Clustering
in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z) - Efficient Video Summarization Framework using EEG and Eye-tracking
Signals [0.92246583941469]
This paper proposes an efficient video summarization framework that will give a gist of the entire video in a few key-frames or video skims.
To understand human attention behavior, we have designed and performed experiments with human participants using electroencephalogram (EEG) and eye-tracking technology.
Using our approach, a video is summarized by 96.5% while maintaining higher precision and high recall factors.
arXiv Detail & Related papers (2021-01-27T08:13:19Z) - CompFeat: Comprehensive Feature Aggregation for Video Instance
Segmentation [67.17625278621134]
Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video.
Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects.
We propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both frame-level and object-level with temporal and spatial context information.
arXiv Detail & Related papers (2020-12-07T00:31:42Z) - Temporal Context Aggregation for Video Retrieval with Contrastive
Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z) - SummaryNet: A Multi-Stage Deep Learning Model for Automatic Video
Summarisation [0.0]
We introduce SummaryNet as a supervised learning framework for automated video summarisation.
It employs a two-stream convolutional network to learn spatial (appearance) and temporal (motion) representations.
arXiv Detail & Related papers (2020-02-19T18:24:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.