Efficient Video Summarization Framework using EEG and Eye-tracking Signals
- URL: http://arxiv.org/abs/2101.11249v1
- Date: Wed, 27 Jan 2021 08:13:19 GMT
- Title: Efficient Video Summarization Framework using EEG and Eye-tracking Signals
- Authors: Sai Sukruth Bezugam, Swatilekha Majumdar, Chetan Ralekar and Tapan Kumar Gandhi
- Abstract summary: This paper proposes an efficient video summarization framework that will give a gist of the entire video in a few key-frames or video skims.
To understand human attention behavior, we have designed and performed experiments with human participants using electroencephalogram (EEG) and eye-tracking technology.
Using our approach, the video volume is reduced by 96.5% while maintaining high precision and recall.
- Score: 0.92246583941469
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes an efficient video summarization framework that gives
a gist of an entire video in a few key-frames or video skims. Existing video
summarization frameworks are based on algorithms that perform low-level
computer-vision feature extraction or high-level domain-specific extraction.
However, humans, the ultimate users of the summarized video, remain the most
neglected aspect. This paper therefore considers the human's role in
summarization and introduces summarization techniques based on human visual
attention. To understand human attention behavior, we designed and performed
experiments with human participants using electroencephalography (EEG) and
eye-tracking technology. The EEG and eye-tracking data obtained from these
experiments are processed simultaneously and used to segment the frames that
contain useful information out of a considerable video volume; the frame
segmentation therefore relies primarily on the cognitive judgments of human
participants. Using our approach, the video volume is reduced by 96.5% while
maintaining high precision and recall. Comparison with state-of-the-art
techniques demonstrates that the proposed approach yields ceiling-level
performance at a reduced computational cost.
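As an illustration of the attention-driven frame selection the abstract describes, here is a minimal Python sketch. The per-frame EEG engagement scores, gaze fixation densities, equal fusion weights, and keep_ratio threshold are all hypothetical stand-ins for illustration; this is not the authors' actual signal-processing pipeline.

```python
# Minimal illustrative sketch (not the authors' pipeline): fuse per-frame
# EEG attention with gaze fixation density and keep the top-scoring frames.
import numpy as np

def select_keyframes(eeg_scores, gaze_density, keep_ratio=0.035):
    """Return sorted indices of the highest-attention frames.

    keep_ratio=0.035 mirrors the paper's reported 96.5% volume reduction;
    the equal 0.5/0.5 fusion weights are an assumption for illustration.
    """
    # Min-max normalize each modality so neither dominates the fusion.
    eeg = (eeg_scores - eeg_scores.min()) / (np.ptp(eeg_scores) + 1e-8)
    gaze = (gaze_density - gaze_density.min()) / (np.ptp(gaze_density) + 1e-8)

    # Simple late fusion: average the two per-frame attention signals.
    fused = 0.5 * eeg + 0.5 * gaze

    # Keep the top keep_ratio fraction of frames as the summary.
    n_keep = max(1, int(len(fused) * keep_ratio))
    return np.sort(np.argsort(fused)[-n_keep:])

# Synthetic example: a 1000-frame video with random attention signals.
rng = np.random.default_rng(0)
print(select_keyframes(rng.random(1000), rng.random(1000)))
```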
Related papers
- Enhancing Video Summarization with Context Awareness [9.861215740353247]
Video summarization automatically generates concise summaries by selecting representative frames, shots, or segments that capture the video's essence.
Despite the importance of video summarization, there is a lack of diverse and representative datasets.
We propose an unsupervised approach that leverages video data structure and information for generating informative summaries.
arXiv Detail & Related papers (2024-04-06T09:08:34Z) - Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z) - Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and query.
Evaluation on an existing multi-modal video summarization dataset shows that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z) - Deep Unsupervised Key Frame Extraction for Efficient Video Classification [63.25852915237032]
This work presents an unsupervised method to retrieve the key frames, which combines a Convolutional Neural Network (CNN) and Temporal Segment Density Peaks Clustering (TSDPC).
The proposed TSDPC is a generic and powerful framework with two advantages over previous works; one is that it can calculate the number of key frames automatically (a sketch of the underlying density-peaks idea appears after this list).
Furthermore, a Long Short-Term Memory network (LSTM) is added on top of the CNN to further improve classification performance.
arXiv Detail & Related papers (2022-11-12T20:45:35Z) - Part-level Action Parsing via a Pose-guided Coarse-to-Fine Framework [108.70949305791201]
Part-level Action Parsing (PAP) aims to not only predict the video-level action but also recognize the frame-level fine-grained actions or interactions of body parts for each person in the video.
In particular, our framework first predicts the video-level class of the input video, then localizes the body parts and predicts the part-level action.
Our framework achieves state-of-the-art performance, outperforming existing methods with a 31.10% ROC score.
arXiv Detail & Related papers (2022-03-09T01:30:57Z) - Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z) - HighlightMe: Detecting Highlights from Human-Centric Videos [62.265410865423]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z) - Use of Affective Visual Information for Summarization of Human-Centric Videos [13.273989782771556]
We investigate the affective-information enriched supervised video summarization task for human-centric videos.
First, we train a visual input-driven state-of-the-art continuous emotion recognition model (CER-NET) on the RECOLA dataset to estimate emotional attributes.
Then, we integrate the estimated emotional attributes and the high-level representations from the CER-NET with the visual information to define the proposed affective video summarization architectures (AVSUM).
arXiv Detail & Related papers (2021-07-08T11:46:04Z) - Unsupervised Video Summarization with a Convolutional Attentive Adversarial Network [32.90753137435032]
We propose a convolutional attentive adversarial network (CAAN) to build a deep summarizer in an unsupervised way.
Specifically, the generator employs a fully convolutional sequence network to extract a global representation of a video, and an attention-based network to output normalized importance scores.
The results show the superiority of our proposed method against other state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2021-05-24T07:24:39Z) - Temporal Context Aggregation for Video Retrieval with Contrastive Learning [81.12514007044456]
We propose TCA, a video representation learning framework that incorporates long-range temporal information between frame-level features.
The proposed method shows a significant performance advantage (17% mAP on FIVR-200K) over state-of-the-art methods with video-level features.
arXiv Detail & Related papers (2020-08-04T05:24:20Z)
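The Deep Unsupervised Key Frame Extraction entry above builds on density peaks clustering; as a rough illustration, the sketch below applies the underlying rho-delta criterion to frame features. The random stand-in features, cutoff d_c, and hand-fixed key-frame count are assumptions; the actual TSDPC operates on CNN features, adds temporal constraints, and determines the number of key frames automatically.

```python
# Illustrative density-peaks key-frame selection (not the TSDPC authors'
# code). Frames whose features have both high local density (rho) and high
# separation from denser frames (delta) are taken as key frames.
import numpy as np

def density_peaks_keyframes(features, d_c, n_keyframes):
    # Pairwise Euclidean distances between frame feature vectors.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)

    # Local density: number of other frames closer than the cutoff d_c.
    rho = (dists < d_c).sum(axis=1) - 1  # exclude the frame itself

    # delta: distance to the nearest frame of strictly higher density.
    delta = np.empty(len(features))
    for i in range(len(features)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = dists[i, higher].min() if len(higher) else dists[i].max()

    # Key frames maximize the product rho * delta; count fixed here,
    # whereas TSDPC chooses it automatically.
    return np.argsort(rho * delta)[-n_keyframes:]

feats = np.random.default_rng(1).random((200, 64))  # stand-in CNN features
print(sorted(density_peaks_keyframes(feats, d_c=2.5, n_keyframes=5)))
```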
This list is automatically generated from the titles and abstracts of the papers in this site.