A gaze driven fast-forward method for first-person videos
- URL: http://arxiv.org/abs/2006.05569v1
- Date: Wed, 10 Jun 2020 00:08:42 GMT
- Title: A gaze driven fast-forward method for first-person videos
- Authors: Alan Carvalho Neves, Michel Melo Silva, Mario Fernando Montenegro Campos, Erickson Rangel Nascimento
- Abstract summary: We address the problem of accessing relevant information in First-Person Videos by creating an accelerated version of the input video and emphasizing the important moments to the recorder.
Our method is based on an attention model driven by gaze and visual scene analysis that provides a semantic score of each frame of the input video.
- Score: 2.362412515574206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing data sharing and life-logging cultures are driving an
unprecedented increase in the amount of unedited First-Person Videos. In this
paper, we address the problem of accessing relevant information in First-Person
Videos by creating an accelerated version of the input video and emphasizing
the important moments to the recorder. Our method is based on an attention
model driven by gaze and visual scene analysis that provides a semantic score
of each frame of the input video. We performed several experimental evaluations
on publicly available First-Person Videos datasets. The results show that our
methodology can fast-forward videos emphasizing moments when the recorder
visually interacts with scene components while not including monotonous clips.
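The idea of turning a per-frame semantic score into an accelerated video can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not describe how frames are selected, so the threshold rule, the function name `fast_forward`, and the parameters `threshold`, `slow_step`, and `fast_step` are all assumptions made for illustration. The sketch simply advances slowly through frames whose score is high and skips ahead quickly through the rest.

```python
from typing import List, Sequence


def fast_forward(scores: Sequence[float], threshold: float,
                 slow_step: int = 2, fast_step: int = 8) -> List[int]:
    """Return the indices of the frames kept in the accelerated video.

    Hypothetical rule (an assumption, not taken from the paper): a frame
    whose semantic score is at least `threshold` is treated as relevant.
    """
    if slow_step < 1 or fast_step < 1:
        raise ValueError("slow_step and fast_step must be >= 1")
    kept: List[int] = []
    i = 0
    while i < len(scores):
        kept.append(i)
        # Relevant frame: advance by the small step so the moment is
        # emphasized; otherwise jump ahead by the large step.
        i += slow_step if scores[i] >= threshold else fast_step
    return kept
```

With scores that are low, then high, then low again, the kept indices are sparse in the low-score stretches and dense in the high-score stretch, which is the qualitative behavior the abstract describes.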
Related papers
- Personalized Video Summarization by Multimodal Video Understanding [2.1372652192505703]
We present a pipeline called Video Summarization with Language (VSL) for user-preferred video summarization.
VSL is based on pre-trained visual language models (VLMs) to avoid the need to train a video summarization system on a large training dataset.
We show that our method is more adaptable across different datasets compared to supervised query-based video summarization models.
arXiv Detail & Related papers (2024-11-05T22:14:35Z)
- Unsupervised Video Highlight Detection by Learning from Audio and Visual Recurrence [13.2968942989609]
We focus on unsupervised video highlight detection, eliminating the need for manual annotations.
Through a clustering technique, we identify pseudo-categories of videos and compute audio pseudo-highlight scores for each video.
We also compute visual pseudo-highlight scores for each video using visual features.
arXiv Detail & Related papers (2024-07-18T23:09:14Z)
- VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation [87.13210748484217]
VideoCutLER is a simple method for unsupervised multi-instance video segmentation without using motion-based learning signals like optical flow or training on natural videos.
We show the first competitive unsupervised learning results on the challenging YouTubeVIS 2019 benchmark, achieving 50.7% AP^video_50.
VideoCutLER can also serve as a strong pretrained model for supervised video instance segmentation tasks, exceeding DINO by 15.9% on YouTubeVIS 2019 in terms of AP^video.
arXiv Detail & Related papers (2023-08-28T17:10:12Z)
- Causal Video Summarizer for Video Exploration [74.27487067877047]
Causal Video Summarizer (CVS) is proposed to capture the interactive information between the video and query.
Based on the evaluation of the existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective.
arXiv Detail & Related papers (2023-07-04T22:52:16Z)
- Few-shot Action Recognition via Intra- and Inter-Video Information Maximization [28.31541961943443]
We propose a novel framework, Video Information Maximization (VIM), for few-shot action recognition.
VIM is equipped with an adaptive spatial-temporal video sampler and a temporal action alignment model.
VIM acts to maximize the distinctiveness of video information from limited video data.
arXiv Detail & Related papers (2023-05-10T13:05:43Z)
- Weakly-Supervised Action Detection Guided by Audio Narration [50.4318060593995]
We propose a model to learn from the narration supervision and utilize multimodal features, including RGB, motion flow, and ambient sound.
Our experiments show that noisy audio narration suffices to learn a good action detection model, thus reducing annotation expenses.
arXiv Detail & Related papers (2022-05-12T06:33:24Z)
- QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries [89.24431389933703]
We present the Query-based Video Highlights (QVHighlights) dataset.
It consists of over 10,000 YouTube videos, covering a wide range of topics.
Each video in the dataset is annotated with: (1) a human-written free-form NL query, (2) relevant moments in the video w.r.t. the query, and (3) five-point scale saliency scores for all query-relevant clips.
arXiv Detail & Related papers (2021-07-20T16:42:58Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
- Video Exploration via Video-Specific Autoencoders [60.256055890647595]
We present video-specific autoencoders that enable human-controllable video exploration.
We observe that a simple autoencoder trained on multiple frames of a specific video enables one to perform a large variety of video processing and editing tasks.
arXiv Detail & Related papers (2021-03-31T17:56:13Z)
- A Sparse Sampling-based framework for Semantic Fast-Forward of First-Person Videos [2.362412515574206]
Most uploaded videos are doomed to be forgotten and unwatched, stashed away in some computer folder or website.
We present a new adaptive frame selection formulated as a weighted minimum reconstruction problem.
Our method is able to retain as much relevant information and smoothness as the state-of-the-art techniques, but in less processing time.
arXiv Detail & Related papers (2020-09-21T18:36:17Z)
- Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data [1.004766879203303]
We present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos.
Our approach can adaptively select frames that are not relevant to convey the information without creating gaps in the final video.
We propose a novel network, called Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space.
arXiv Detail & Related papers (2020-03-31T14:07:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.