Learning Pixel-Level Distinctions for Video Highlight Detection
- URL: http://arxiv.org/abs/2204.04615v1
- Date: Sun, 10 Apr 2022 06:41:16 GMT
- Title: Learning Pixel-Level Distinctions for Video Highlight Detection
- Authors: Fanyue Wei, Biao Wang, Tiezheng Ge, Yuning Jiang, Wen Li, Lixin Duan
- Abstract summary: We propose to learn pixel-level distinctions to improve video highlight detection.
This pixel-level distinction indicates whether or not each pixel in one video belongs to an interesting section.
We design an encoder-decoder network to estimate the pixel-level distinction.
- Score: 39.23271866827123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of video highlight detection is to select the most attractive
segments from a long video to depict the most interesting parts of the video.
Existing methods typically focus on modeling the relationship between different
video segments in order to learn a model that can assign highlight scores to
these segments; however, these approaches do not explicitly consider the
contextual dependency within individual segments. To this end, we propose to
learn pixel-level distinctions to improve video highlight detection. This
pixel-level distinction indicates whether or not each pixel in one video
belongs to an interesting section. The advantages of modeling such fine-level
distinctions are two-fold. First, it allows us to exploit the temporal and
spatial relations of the content in one video, since the distinction of a pixel
in one frame is highly dependent on both the content before this frame and the
content around this pixel in this frame. Second, learning the pixel-level
distinction also gives a good explanation to the video highlight task regarding
what contents in a highlight segment will be attractive to people. We design an
encoder-decoder network to estimate the pixel-level distinction, in which we
leverage the 3D convolutional neural networks to exploit the temporal context
information, and further take advantage of the visual saliency to model the
spatial distinction. State-of-the-art performance on three public benchmarks
clearly validates the effectiveness of our framework for video highlight
detection.
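As a rough illustration of the architecture described above, here is a minimal PyTorch sketch (not the authors' released code): a 3D-convolutional encoder captures temporal context, a decoder upsamples back to a per-pixel distinction map, and an assumed saliency-weighted pooling step turns that map into a segment-level highlight score. Layer widths, depths, and the fusion rule are illustrative assumptions.
```python
# Minimal sketch, assuming a small 3D-conv encoder-decoder; not the paper's
# exact architecture. Input clip: (B, C, T, H, W) with H, W divisible by 4.
import torch
import torch.nn as nn

class PixelDistinctionNet(nn.Module):
    def __init__(self, in_channels=3, base=16):
        super().__init__()
        # Encoder: 3D convolutions exploit temporal context across frames.
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, base, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(base, base * 2, kernel_size=3, stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
        )
        # Decoder: upsample back to frame resolution, one distinction logit per pixel.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(base * 2, base, kernel_size=(3, 4, 4), stride=(1, 2, 2), padding=1),
            nn.ReLU(inplace=True),
            nn.ConvTranspose3d(base, 1, kernel_size=(3, 4, 4), stride=(1, 2, 2), padding=1),
        )

    def forward(self, clip, saliency=None):
        # clip: (B, C, T, H, W); saliency: optional (B, 1, T, H, W) map in [0, 1].
        distinction = torch.sigmoid(self.decoder(self.encoder(clip)))
        if saliency is not None:
            # Assumed fusion: weight pixel distinctions by spatial saliency
            # before pooling into a per-segment highlight score.
            weights = saliency / (saliency.sum(dim=(2, 3, 4), keepdim=True) + 1e-6)
            score = (distinction * weights).sum(dim=(2, 3, 4))
        else:
            score = distinction.mean(dim=(2, 3, 4))
        return distinction, score
```
Training would supervise the distinction map and/or the pooled segment score against highlight annotations; the choice of loss is left open here.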
Related papers
- Learning Fine-Grained Features for Pixel-wise Video Correspondences [13.456993858078514]
We address the problem of learning features for establishing pixel-wise correspondences.
Motivated by optical flow as well as self-supervised feature learning, we propose to use not only labeled synthetic videos but also unlabeled real-world videos.
Our experimental results on a series of correspondence-based tasks demonstrate that the proposed method outperforms state-of-the-art rivals in both accuracy and efficiency.
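As context for how such per-pixel features are typically used, here is a generic matching sketch (the paper's own model is not reproduced here): two feature maps are L2-normalised and each pixel in one frame is matched to its most similar pixel in the other.
```python
# Generic sketch of feature-based pixel-wise correspondence; the feature
# extractor producing feat_a/feat_b is a placeholder assumption.
import torch
import torch.nn.functional as F

def pixel_correspondences(feat_a, feat_b):
    """feat_a, feat_b: (C, H, W) feature maps from the two frames."""
    c, h, w = feat_a.shape
    a = F.normalize(feat_a.reshape(c, -1), dim=0)   # (C, H*W)
    b = F.normalize(feat_b.reshape(c, -1), dim=0)   # (C, H*W)
    sim = a.t() @ b                                  # (H*W, H*W) cosine similarities
    match = sim.argmax(dim=1)                        # best match in B for each pixel of A
    ys, xs = match // w, match % w
    return torch.stack([ys, xs], dim=1).reshape(h, w, 2)  # (H, W, 2) target coords
```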
arXiv Detail & Related papers (2023-08-06T07:27:17Z)
- Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple, end-to-end trainable bottom-up approach that achieves instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets, and has the lowest run-time among contemporary state-of-the-art methods.
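A hedged sketch of the bottom-up idea in general terms: predict a per-pixel embedding ("tag") and group pixels whose tags lie close together into the same instance mask, rather than classifying region proposals. The naive grouping below is an illustrative assumption, not the paper's exact design.
```python
# Naive O(N^2) tag grouping, kept simple for clarity; the tag head producing
# the (D, H, W) embeddings and the threshold are assumptions.
import torch

def group_pixels_by_tag(tags, seed_threshold=0.5):
    """tags: (D, H, W) per-pixel embeddings; returns (H, W) instance id map."""
    d, h, w = tags.shape
    flat = tags.reshape(d, -1).t()                 # (H*W, D)
    instance_ids = torch.zeros(h * w, dtype=torch.long)
    next_id = 1
    for i in range(flat.size(0)):
        if instance_ids[i] != 0:
            continue
        # Assign all yet-unlabeled pixels close to this seed to one instance.
        dist = (flat - flat[i]).norm(dim=1)
        mask = (dist < seed_threshold) & (instance_ids == 0)
        instance_ids[mask] = next_id
        next_id += 1
    return instance_ids.reshape(h, w)
```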
arXiv Detail & Related papers (2022-04-22T15:32:46Z)
- Video Salient Object Detection via Contrastive Features and Attention Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
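One common way to realise such a co-attention fusion of low-level and high-level features is sketched below; the paper's exact formulation may differ, and the projection dimension is an assumption.
```python
# Hedged sketch of cross-level co-attention; both feature maps are assumed
# already resized to the same spatial resolution.
import torch
import torch.nn as nn

class CoAttentionFusion(nn.Module):
    def __init__(self, low_ch, high_ch, dim=64):
        super().__init__()
        self.proj_low = nn.Conv2d(low_ch, dim, kernel_size=1)
        self.proj_high = nn.Conv2d(high_ch, dim, kernel_size=1)

    def forward(self, low, high):
        # low: (B, Cl, H, W); high: (B, Ch, H, W).
        b, _, h, w = low.shape
        q = self.proj_low(low).flatten(2)            # (B, D, HW)
        k = self.proj_high(high).flatten(2)          # (B, D, HW)
        affinity = torch.bmm(q.transpose(1, 2), k)   # (B, HW, HW) cross-level affinity
        attn = affinity.softmax(dim=-1)
        # Each low-level location aggregates the high-level context it attends to.
        fused = torch.bmm(k, attn.transpose(1, 2)).reshape(b, -1, h, w)
        return torch.cat([self.proj_low(low), fused], dim=1)  # (B, 2D, H, W)
```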
arXiv Detail & Related papers (2021-11-03T17:40:32Z)
- Cross-category Video Highlight Detection via Set-based Learning [55.49267044910344]
We propose a Dual-Learner-based Video Highlight Detection (DL-VHD) framework.
It learns both the distinction among target-category videos and the characteristics of highlight moments from the source video category.
It outperforms five typical Unsupervised Domain Adaptation (UDA) algorithms on various cross-category highlight detection tasks.
arXiv Detail & Related papers (2021-08-26T13:06:47Z)
- Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together.
We show that our model, which requires only per-frame computation, achieves new state-of-the-art results on the Cityscapes-VPS and VIPER datasets.
arXiv Detail & Related papers (2021-06-17T13:06:24Z)
- Semi-Supervised Action Recognition with Temporal Contrastive Learning [50.08957096801457]
We learn a two-pathway temporal contrastive model using unlabeled videos at two different speeds.
We considerably outperform video extensions of sophisticated state-of-the-art semi-supervised image recognition methods.
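The two-speed contrastive idea can be sketched as follows, assuming two encoders and a 2x frame-subsampling factor (both assumptions for illustration): the slow and fast views of the same unlabeled clip are positives in an InfoNCE-style loss, while other clips in the batch serve as negatives.
```python
# Minimal two-speed contrastive sketch; encoder_slow/encoder_fast are assumed
# to map a clip to a (B, D) embedding.
import torch
import torch.nn.functional as F

def two_speed_contrastive_loss(encoder_slow, encoder_fast, clips, temperature=0.1):
    """clips: (B, C, T, H, W) batch of unlabeled video clips."""
    fast_clips = clips[:, :, ::2]                   # 2x speed: keep every other frame
    z_slow = F.normalize(encoder_slow(clips), dim=1)       # (B, D)
    z_fast = F.normalize(encoder_fast(fast_clips), dim=1)  # (B, D)
    logits = z_slow @ z_fast.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(clips.size(0), device=clips.device)
    # Diagonal pairs (same clip, two speeds) are positives; the rest negatives.
    return F.cross_entropy(logits, targets)
```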
arXiv Detail & Related papers (2021-02-04T17:28:35Z)
- A Multi-modal Deep Learning Model for Video Thumbnail Selection [0.0]
A good thumbnail should be a frame that best represents the content of a video while at the same time capturing viewers' attention.
In this paper, we expand the definition of content to include title, description, and audio of a video and utilize information provided by these modalities in our selection model.
To the best of our knowledge, we are the first to propose a multi-modal deep learning model for video thumbnail selection, and it outperforms previous state-of-the-art models.
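A hedged sketch of such a multi-modal scorer, with illustrative embedding dimensions and fusion MLP (not the paper's architecture): each candidate frame is scored jointly with title, description, and audio embeddings, and the top-scoring frame becomes the thumbnail.
```python
# Sketch of multi-modal thumbnail scoring; all dimensions are assumptions.
import torch
import torch.nn as nn

class ThumbnailScorer(nn.Module):
    def __init__(self, frame_dim=512, text_dim=256, audio_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(frame_dim + 2 * text_dim + audio_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, frame_feats, title_emb, desc_emb, audio_emb):
        # frame_feats: (N, frame_dim) for N candidate frames; the other
        # embeddings are single vectors describing the whole video.
        n = frame_feats.size(0)
        context = torch.cat([title_emb, desc_emb, audio_emb], dim=-1)
        context = context.unsqueeze(0).expand(n, -1)      # share context per frame
        scores = self.mlp(torch.cat([frame_feats, context], dim=-1)).squeeze(-1)
        return scores.argmax(), scores                    # index of best frame
```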
arXiv Detail & Related papers (2020-12-31T21:10:09Z)