LTC-SUM: Lightweight Client-driven Personalized Video Summarization
Framework Using 2D CNN
- URL: http://arxiv.org/abs/2201.09049v1
- Date: Sat, 22 Jan 2022 13:54:13 GMT
- Title: LTC-SUM: Lightweight Client-driven Personalized Video Summarization
Framework Using 2D CNN
- Authors: Ghulam Mujtaba, Adeel Malik, and Eun-Seok Ryu
- Abstract summary: This paper proposes a novel lightweight thumbnail container-based summarization (LTC-SUM) framework for full feature-length videos.
It generates a personalized keyshot summary for concurrent users by using the computational resources of the end-user device.
- Score: 5.95248889179516
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes a novel lightweight thumbnail container-based
summarization (LTC-SUM) framework for full feature-length videos. This
framework generates a personalized keyshot summary for concurrent users by
using the computational resources of the end-user device. State-of-the-art
methods that acquire and process entire video data to generate video summaries
are highly computationally intensive. In this regard, the proposed LTC-SUM
method uses lightweight thumbnails to handle the complex process of detecting
events. This significantly reduces computational complexity and improves
communication and storage efficiency by resolving computational and privacy
bottlenecks in resource-constrained end-user devices. These improvements were
achieved by designing a lightweight 2D CNN model to extract features from
thumbnails, which helped select and retrieve only a handful of specific
segments. Extensive quantitative experiments on a set of 18 full feature-length
videos (approximately 32.9 h in total duration) showed that the proposed method is
significantly more computationally efficient than state-of-the-art methods on the
same end-user device configurations. Qualitative assessments involving 56 participants
showed that they gave higher ratings to the summaries generated using the proposed
method. To the best of our knowledge, this is the first attempt at designing a fully
client-driven personalized
keyshot video summarization framework using thumbnail containers for
feature-length videos.
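The client-side pipeline described above lends itself to a short illustration. The sketch below (Python/PyTorch) is a minimal, hypothetical rendering of the idea, not the authors' implementation: a small 2D CNN scores each thumbnail for a user-preferred event, and only the time ranges whose thumbnails score above a threshold would then be requested from the server. The network layout, the two-class setup, the 10-seconds-per-thumbnail mapping, and the 0.5 threshold are illustrative assumptions.

# Minimal sketch (not the authors' code) of thumbnail-based event detection
# followed by segment selection, as outlined in the abstract.
import torch
import torch.nn as nn

class ThumbnailEventCNN(nn.Module):
    """Lightweight 2D CNN that scores a low-resolution thumbnail for a target event."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

def select_segments(thumbs: torch.Tensor, model: nn.Module,
                    seconds_per_thumb: float = 10.0, threshold: float = 0.5):
    """Return (start, end) times of segments whose thumbnails look like the target event."""
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(thumbs), dim=1)[:, 1]  # probability of the target event
    keep = (probs >= threshold).nonzero(as_tuple=True)[0].tolist()
    return [(i * seconds_per_thumb, (i + 1) * seconds_per_thumb) for i in keep]

if __name__ == "__main__":
    # Placeholder batch standing in for thumbnails extracted from the container,
    # e.g. 120 frames resized to 112x112; only the selected ranges would be fetched.
    thumbs = torch.rand(120, 3, 112, 112)
    segments = select_segments(thumbs, ThumbnailEventCNN())
    print(f"{len(segments)} segments would be fetched for the summary")

In this sketch the expensive work (decoding and summarizing full-resolution video) never happens on the client; only the few segments whose thumbnails match the user's preference are retrieved, which is the communication- and privacy-efficiency argument made in the abstract.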
Related papers
- EdgeVidSum: Real-Time Personalized Video Summarization at the Edge [3.102586911584193]
EdgeVidSum is a method that generates personalized, fast-forward summaries of long-form videos directly on edge devices.
The framework employs a hierarchical analysis approach, where a lightweight 2D CNN model identifies user-preferred content from thumbnails.
Our interactive demo highlights the system's ability to create tailored video summaries for long-form videos, such as movies, sports events, and TV shows, based on individual user preferences.
arXiv Detail & Related papers (2025-05-28T18:59:41Z) - Generating Narrated Lecture Videos from Slides with Synchronized Highlights [55.2480439325792]
We introduce an end-to-end system designed to automate the process of turning static slides into video lectures.
This system synthesizes a video lecture featuring AI-generated narration precisely synchronized with dynamic visual highlights.
We demonstrate the system's effectiveness through a technical evaluation using a manually annotated slide dataset with 1000 samples.
arXiv Detail & Related papers (2025-05-05T18:51:53Z) - Parameter-free Video Segmentation for Vision and Language Understanding [55.20132267309382]
We propose an algorithm for segmenting videos into contiguous chunks, based on the minimum description length principle.
Given feature vectors, the algorithm is entirely parameter-free: it requires neither a preset threshold nor the number or size of chunks to be specified.
arXiv Detail & Related papers (2025-03-03T05:54:37Z) - Scaling Up Video Summarization Pretraining with Large Language Models [73.74662411006426]
We introduce an automated and scalable pipeline for generating a large-scale video summarization dataset.
We analyze the limitations of existing approaches and propose a new video summarization model that effectively addresses them.
Our work also presents a new benchmark dataset that contains 1200 long videos each with high-quality summaries annotated by professionals.
arXiv Detail & Related papers (2024-04-04T11:59:06Z) - LTC-GIF: Attracting More Clicks on Feature-length Sports Videos [4.776806621717593]
This paper proposes a lightweight method to attract users and increase views of the video by presenting personalized artistic media.
It analyzes lightweight thumbnail containers (LTC) using computational resources of the client device to recognize personalized events from full-length sports videos.
Instead of processing the entire video, small video segments are processed to generate artistic media.
arXiv Detail & Related papers (2022-01-22T15:34:10Z) - OCSampler: Compressing Videos to One Clip with Single-step Sampling [82.0417131211353]
We propose a framework named OCSampler to explore a compact yet effective video representation with one short clip.
Our basic motivation is that efficient video recognition lies in processing the whole sequence at once rather than picking up frames sequentially.
arXiv Detail & Related papers (2022-01-12T09:50:38Z) - Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z) - LocFormer: Enabling Transformers to Perform Temporal Moment Localization
on Long Untrimmed Videos With a Feature Sampling Approach [35.93734845932161]
LocFormer is a Transformer-based model for video grounding that operates at a constant memory footprint regardless of the video length.
We propose a modular design that separates functionality, enabling us to learn an inductive bias via supervising the self-attention heads.
arXiv Detail & Related papers (2021-12-19T05:32:14Z) - HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z) - Providing Meaningful Data Summarizations Using Examplar-based Clustering
in Industry 4.0 [67.80123919697971]
We show that our GPU implementation provides speedups of up to 72x using single-precision and up to 452x using half-precision compared to conventional CPU algorithms.
We apply our algorithm to real-world data from injection molding manufacturing processes and discuss how found summaries help with steering this specific process to cut costs and reduce the manufacturing of bad parts.
arXiv Detail & Related papers (2021-05-25T15:55:14Z) - Efficient Video Summarization Framework using EEG and Eye-tracking
Signals [0.92246583941469]
This paper proposes an efficient video summarization framework that will give a gist of the entire video in a few key-frames or video skims.
To understand human attention behavior, we have designed and performed experiments with human participants using electroencephalogram (EEG) and eye-tracking technology.
Using our approach, a video is summarized by 96.5% while maintaining higher precision and high recall factors.
arXiv Detail & Related papers (2021-01-27T08:13:19Z) - CompFeat: Comprehensive Feature Aggregation for Video Instance
Segmentation [67.17625278621134]
Video instance segmentation is a complex task in which we need to detect, segment, and track each object for any given video.
Previous approaches only utilize single-frame features for the detection, segmentation, and tracking of objects.
We propose a novel comprehensive feature aggregation approach (CompFeat) to refine features at both frame-level and object-level with temporal and spatial context information.
arXiv Detail & Related papers (2020-12-07T00:31:42Z) - Temporal Context Aggregation for Video Retrieval with Contrastive
Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z) - SummaryNet: A Multi-Stage Deep Learning Model for Automatic Video
Summarisation [0.0]
We introduce SummaryNet as a supervised learning framework for automated video summarisation.
It employs a two-stream convolutional network to learn spatial (appearance) and temporal (motion) representations.
arXiv Detail & Related papers (2020-02-19T18:24:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.