Feature Re-Learning with Data Augmentation for Video Relevance Prediction
- URL: http://arxiv.org/abs/2004.03815v1
- Date: Wed, 8 Apr 2020 05:22:41 GMT
- Title: Feature Re-Learning with Data Augmentation for Video Relevance Prediction
- Authors: Jianfeng Dong, Xun Wang, Leimin Zhang, Chaoxi Xu, Gang Yang, Xirong Li
- Abstract summary: Re-learning is realized by projecting a given deep feature into a new space by an affine transformation.
We propose a new data augmentation strategy which works directly on frame-level and video-level features.
- Score: 35.87597969685573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting the relevance between two given videos with respect to their
visual content is a key component for content-based video recommendation and
retrieval. Thanks to the increasing availability of pre-trained image and video
convolutional neural network models, deep visual features are widely used for
video content representation. However, because how two videos are relevant is
task-dependent, such off-the-shelf features are not always optimal for every
task. Moreover, due to varied concerns including copyright, privacy, and
security, one might have access only to pre-computed video features rather than
the original videos. In this paper we propose feature re-learning to improve
video relevance prediction without revisiting the original video
content. In particular, re-learning is realized by projecting a given deep
feature into a new space by an affine transformation. We optimize the
re-learning process by a novel negative-enhanced triplet ranking loss. In order
to generate more training data, we propose a new data augmentation strategy
which works directly on frame-level and video-level features. Extensive
experiments in the context of the Hulu Content-based Video Relevance Prediction
Challenge 2018 demonstrate the effectiveness of the proposed method and its
state-of-the-art performance for content-based video relevance prediction.
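To make the core ideas concrete, below is a minimal PyTorch sketch of feature re-learning: a single affine (linear) layer projects a pre-computed feature into a new space, a triplet ranking loss with an extra penalty on the negative pair stands in for the negative-enhanced loss, and stride/offset sub-sampling of frame-level features stands in for the feature-level data augmentation. The layer sizes, the exact form of the loss, and the sampling scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureReLearning(nn.Module):
    """Project a pre-computed deep feature into a new space by an affine
    transformation (Wx + b), then L2-normalise for cosine similarity."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.affine = nn.Linear(in_dim, out_dim, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.affine(x), dim=-1)


def negative_enhanced_triplet_loss(anchor, positive, negative,
                                   margin: float = 0.2, neg_weight: float = 1.0):
    """Triplet ranking loss plus an extra push on the negative pair.
    The exact weighting used in the paper may differ; this form is assumed."""
    pos_sim = (anchor * positive).sum(dim=-1)        # cosine similarity (unit-norm features)
    neg_sim = (anchor * negative).sum(dim=-1)
    ranking = F.relu(margin - pos_sim + neg_sim)     # standard margin-based ranking term
    neg_push = neg_weight * F.relu(neg_sim)          # assumed negative-enhancing term
    return (ranking + neg_push).mean()


def augment_frame_features(frame_feats: torch.Tensor, stride: int, offset: int) -> torch.Tensor:
    """Feature-level augmentation: sub-sample the frame-level features with a
    given stride and offset, then mean-pool into one video-level feature."""
    return frame_feats[offset::stride].mean(dim=0)


# Toy usage with random stand-ins for pre-computed features (dimensions are placeholders).
model = FeatureReLearning(in_dim=2048, out_dim=512)
frames = torch.randn(120, 2048)                      # frame-level features of one video
anchor = model(augment_frame_features(frames, stride=2, offset=0).unsqueeze(0))
positive = model(torch.randn(1, 2048))               # feature of a relevant video
negative = model(torch.randn(1, 2048))               # feature of an irrelevant video
loss = negative_enhanced_triplet_loss(anchor, positive, negative)
loss.backward()
```

Sub-sampling the frame-level features with different strides and offsets yields multiple video-level features per video, increasing the number of training pairs without ever touching the raw videos, which matches the constraint that only pre-computed features are available.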
Related papers
- EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval [52.375143786641196]
EgoCVR is an evaluation benchmark for fine-grained Composed Video Retrieval.
EgoCVR consists of 2,295 queries that specifically focus on high-quality temporal video understanding.
arXiv Detail & Related papers (2024-07-23T17:19:23Z)
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- Retargeting video with an end-to-end framework [14.270721529264929]
We present an end-to-end RETVI method to retarget videos to arbitrary ratios.
Our system outperforms previous work in quality and running time.
arXiv Detail & Related papers (2023-11-08T04:56:41Z)
- InternVideo: General Video Foundation Models via Generative and Discriminative Learning [52.69422763715118]
We present general video foundation models, InternVideo, for dynamic and complex video-level understanding tasks.
InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives.
InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications.
arXiv Detail & Related papers (2022-12-06T18:09:49Z)
- Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a single pair of original and processed videos rather than on a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z)
- Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z)
- Video Content Classification using Deep Learning [0.0]
This paper presents a model that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN).
The model can identify the type of video content and classify it into categories such as "Animation, Gaming, natural content, flat content, etc."
arXiv Detail & Related papers (2021-11-27T04:36:17Z)
- VPN: Video Provenance Network for Robust Content Attribution [72.12494245048504]
We present VPN - a content attribution method for recovering provenance information from videos shared online.
We learn a robust search embedding for matching such videos, using full-length or truncated video queries.
Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user.
arXiv Detail & Related papers (2021-09-21T09:07:05Z)
- Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z)
- The complementarity of a diverse range of deep learning features extracted from video content for video recommendation [2.092922495279074]
We explore the potential of various deep learning features to provide video recommendations.
Experiments on a real-world video dataset for movie recommendations show that deep learning features outperform hand-crafted features.
In particular, recommendations generated with deep learning audio features and action-centric deep learning features are superior to MFCC and state-of-the-art iDT features.
arXiv Detail & Related papers (2020-11-21T18:00:28Z)
- Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data [1.004766879203303]
We present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos.
Our approach can adaptively select frames that are not relevant to convey the information without creating gaps in the final video.
We propose a novel network, called Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space.
arXiv Detail & Related papers (2020-03-31T14:07:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.