A Sparse Sampling-based framework for Semantic Fast-Forward of
First-Person Videos
- URL: http://arxiv.org/abs/2009.11063v1
- Date: Mon, 21 Sep 2020 18:36:17 GMT
- Title: A Sparse Sampling-based framework for Semantic Fast-Forward of
First-Person Videos
- Authors: Michel Melo Silva, Washington Luis Souza Ramos, Mario Fernando
Montenegro Campos, Erickson Rangel Nascimento
- Abstract summary: Most uploaded videos are doomed to be forgotten and unwatched, stashed away in some computer folder or website.
We present a new adaptive frame selection formulated as a weighted minimum reconstruction problem.
Our method is able to retain as much relevant information and smoothness as the state-of-the-art techniques, but in less processing time.
- Score: 2.362412515574206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Technological advances in sensors have paved the way for digital cameras to
become increasingly ubiquitous, which, in turn, led to the popularity of the
self-recording culture. As a result, the amount of visual data on the Internet
is moving in the opposite direction of the available time and patience of the
users. Thus, most of the uploaded videos are doomed to be forgotten and
unwatched, stashed away in some computer folder or website. In this paper, we
address the problem of creating smooth fast-forward videos without losing the
relevant content. We present a new adaptive frame selection formulated as a
weighted minimum reconstruction problem. By smoothing frame transitions and
filling visual gaps between segments, our approach accelerates first-person
videos, emphasizing the relevant segments and avoiding visual discontinuities.
Experiments conducted on controlled videos and also on an unconstrained dataset
of First-Person Videos (FPVs) show that, when creating fast-forward videos, our
method is able to retain as much relevant information and smoothness as the
state-of-the-art techniques, but in less processing time.
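The abstract frames selection as a weighted minimum reconstruction problem but does not spell out the formulation here, so the snippet below is only a rough sketch of that idea under assumptions of our own: each frame is a feature column, per-frame semantic weights scale the reconstruction error, and a greedy column-selection loop keeps the frames whose span best reconstructs the weighted sequence. The descriptors, the weights, and the greedy least-squares solver are illustrative stand-ins, not the authors' algorithm.
```python
# Illustrative sketch only: a greedy, weighted column-selection heuristic in the
# spirit of "weighted minimum reconstruction" frame selection. The features,
# semantic weights, and solver are assumptions, not the paper's exact method.
import numpy as np

def select_frames(features, weights, budget):
    """Pick `budget` frames whose features best reconstruct all frames.

    features: (d, n) array, one descriptor column per frame (assumed given).
    weights:  (n,) semantic importance per frame (assumed given, e.g. from a
              semantic scorer); higher weight -> larger penalty if poorly rebuilt.
    budget:   number of frames to keep in the fast-forward video.
    """
    d, n = features.shape
    W = np.sqrt(weights)                  # weight the reconstruction error per frame
    target = features * W                 # columns scaled by their importance
    selected = []
    residual = target.copy()
    for _ in range(budget):
        # Greedily add the frame whose column explains the most residual energy.
        scores = np.abs(features.T @ residual).sum(axis=1)
        scores[selected] = -np.inf        # never pick the same frame twice
        j = int(np.argmax(scores))
        selected.append(j)
        # Re-fit: project the weighted targets onto the span of the chosen columns.
        D = features[:, selected]
        coeffs, *_ = np.linalg.lstsq(D, target, rcond=None)
        residual = target - D @ coeffs
    return sorted(selected)

# Toy run: 300 frames with 64-D descriptors, keep 30 (roughly a 10x fast-forward).
rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 300))
sem_weights = rng.uniform(0.1, 1.0, 300)  # stand-in for semantic scores
print(select_frames(feats, sem_weights, 30)[:10])
```
In a real pipeline the descriptors would be computed from the video frames and the weights from a semantic scorer; here both are random placeholders used only to show the selection loop.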
Related papers
- Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the modeling of its video dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z) - VidToMe: Video Token Merging for Zero-Shot Video Editing [100.79999871424931]
We propose a novel approach to enhance temporal consistency in generated videos by merging self-attention tokens across frames.
Our method improves temporal coherence and reduces memory consumption in self-attention computations.
arXiv Detail & Related papers (2023-12-17T09:05:56Z) - A Simple Recipe for Contrastively Pre-training Video-First Encoders
Beyond 16 Frames [54.90226700939778]
We build on the common paradigm of transferring large-scale, image-text models to video via shallow temporal fusion.
We expose two limitations of this approach: (1) decreased spatial capabilities, likely due to poor video-language alignment in standard video datasets, and (2) higher memory consumption, bottlenecking the number of frames that can be processed.
arXiv Detail & Related papers (2023-12-12T16:10:19Z) - Retargeting video with an end-to-end framework [14.270721529264929]
We present an end-to-end RETVI method to retarget videos to arbitrary ratios.
Our system outperforms previous work in quality and running time.
arXiv Detail & Related papers (2023-11-08T04:56:41Z) - Blurry Video Compression: A Trade-off between Visual Enhancement and
Data Compression [65.8148169700705]
Existing video compression (VC) methods primarily aim to reduce the spatial and temporal redundancies between consecutive frames in a video.
Previous works have achieved remarkable results on videos acquired under specific settings such as instant (known) exposure time and shutter speed.
In this work, we tackle the VC problem in a general scenario where a given video can be blurry due to predefined camera settings or dynamics in the scene.
arXiv Detail & Related papers (2023-11-08T02:17:54Z) - You Can Ground Earlier than See: An Effective and Efficient Pipeline for
Temporal Sentence Grounding in Compressed Videos [56.676761067861236]
Given an untrimmed video, temporal sentence grounding aims to locate a target moment semantically according to a sentence query.
Previous works have achieved decent success, but they focus only on high-level visual features extracted from decoded frames.
We propose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input.
arXiv Detail & Related papers (2023-03-14T12:53:27Z) - Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moire patterns, appearing as color distortions, severely degrade image and video qualities when filming a screen with digital cameras.
We study how to remove such undesirable moire patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z) - A gaze driven fast-forward method for first-person videos [2.362412515574206]
We address the problem of accessing relevant information in First-Person Videos by creating an accelerated version of the input video and emphasizing the important moments to the recorder.
Our method is based on an attention model driven by gaze and visual scene analysis that provides a semantic score of each frame of the input video.
arXiv Detail & Related papers (2020-06-10T00:08:42Z) - Straight to the Point: Fast-forwarding Videos via Reinforcement Learning
Using Textual Data [1.004766879203303]
We present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos.
Our approach can adaptively discard frames that are not relevant to conveying the information, without creating gaps in the final video.
We propose a novel network, called Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space.
arXiv Detail & Related papers (2020-03-31T14:07:45Z)