Feature Re-Learning with Data Augmentation for Video Relevance Prediction
- URL: http://arxiv.org/abs/2004.03815v1
- Date: Wed, 8 Apr 2020 05:22:41 GMT
- Title: Feature Re-Learning with Data Augmentation for Video Relevance Prediction
- Authors: Jianfeng Dong, Xun Wang, Leimin Zhang, Chaoxi Xu, Gang Yang, Xirong Li
- Abstract summary: Re-learning is realized by projecting a given deep feature into a new space by an affine transformation.
We propose a new data augmentation strategy which works directly on frame-level and video-level features.
- Score: 35.87597969685573
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Predicting the relevance between two given videos with respect to their
visual content is a key component for content-based video recommendation and
retrieval. Thanks to the increasing availability of pre-trained image and video
convolutional neural network models, deep visual features are widely used for
video content representation. However, because how two videos are relevant is
task-dependent, such off-the-shelf features are not always optimal for every
task. Moreover, due to varied concerns including copyright, privacy, and
security, one might have access only to pre-computed video features rather than
the original videos. In this paper we propose feature re-learning to improve
video relevance prediction without revisiting the original video
content. In particular, re-learning is realized by projecting a given deep
feature into a new space by an affine transformation. We optimize the
re-learning process by a novel negative-enhanced triplet ranking loss. In order
to generate more training data, we propose a new data augmentation strategy
which works directly on frame-level and video-level features. Extensive
experiments in the context of the Hulu Content-based Video Relevance Prediction
Challenge 2018 demonstrate the effectiveness of the proposed method and its
state-of-the-art performance for content-based video relevance prediction.
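To make the core ideas concrete, below is a minimal PyTorch sketch of feature re-learning: a single affine (linear) layer projects a pre-computed feature into a new space, a triplet ranking loss with an extra penalty on the negative pair stands in for the negative-enhanced loss, and stride/offset sub-sampling of frame-level features stands in for the feature-level data augmentation. The layer sizes, the exact form of the loss, and the sampling scheme are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureReLearning(nn.Module):
    """Project a pre-computed deep feature into a new space by an affine
    transformation (Wx + b), then L2-normalise for cosine similarity."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.affine = nn.Linear(in_dim, out_dim, bias=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.affine(x), dim=-1)


def negative_enhanced_triplet_loss(anchor, positive, negative,
                                   margin: float = 0.2, neg_weight: float = 1.0):
    """Triplet ranking loss plus an extra push on the negative pair.
    The exact weighting used in the paper may differ; this form is assumed."""
    pos_sim = (anchor * positive).sum(dim=-1)        # cosine similarity (unit-norm features)
    neg_sim = (anchor * negative).sum(dim=-1)
    ranking = F.relu(margin - pos_sim + neg_sim)     # standard margin-based ranking term
    neg_push = neg_weight * F.relu(neg_sim)          # assumed negative-enhancing term
    return (ranking + neg_push).mean()


def augment_frame_features(frame_feats: torch.Tensor, stride: int, offset: int) -> torch.Tensor:
    """Feature-level augmentation: sub-sample the frame-level features with a
    given stride and offset, then mean-pool into one video-level feature."""
    return frame_feats[offset::stride].mean(dim=0)


# Toy usage with random stand-ins for pre-computed features (dimensions are placeholders).
model = FeatureReLearning(in_dim=2048, out_dim=512)
frames = torch.randn(120, 2048)                      # frame-level features of one video
anchor = model(augment_frame_features(frames, stride=2, offset=0).unsqueeze(0))
positive = model(torch.randn(1, 2048))               # feature of a relevant video
negative = model(torch.randn(1, 2048))               # feature of an irrelevant video
loss = negative_enhanced_triplet_loss(anchor, positive, negative)
loss.backward()
```

Sub-sampling the frame-level features with different strides and offsets yields multiple video-level features per video, increasing the number of training pairs without ever touching the raw videos, which matches the constraint that only pre-computed features are available.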
Related papers
- EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval [52.375143786641196]
EgoCVR is an evaluation benchmark for fine-grained Composed Video Retrieval.
EgoCVR consists of 2,295 queries that specifically focus on high-quality temporal video understanding.
arXiv Detail & Related papers (2024-07-23T17:19:23Z)
- Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story.
Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video.
A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
- Retargeting video with an end-to-end framework [14.270721529264929]
We present an end-to-end RETVI method to retarget videos to arbitrary ratios.
Our system outperforms previous work in quality and running time.
arXiv Detail & Related papers (2023-11-08T04:56:41Z)
- InternVideo: General Video Foundation Models via Generative and Discriminative Learning [52.69422763715118]
We present general video foundation models, InternVideo, for dynamic and complex video-level understanding tasks.
InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives.
InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications.
arXiv Detail & Related papers (2022-12-06T18:09:49Z)
- Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a single pair of original and processed videos rather than on a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z)
- Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
arXiv Detail & Related papers (2022-01-07T15:21:46Z)
- Video Content Classification using Deep Learning [0.0]
This paper presents a model that combines a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN).
The model can identify the type of video content and classify it into categories such as "Animation, Gaming, natural content, flat content, etc."
arXiv Detail & Related papers (2021-11-27T04:36:17Z)
- VPN: Video Provenance Network for Robust Content Attribution [72.12494245048504]
We present VPN - a content attribution method for recovering provenance information from videos shared online.
We learn a robust search embedding for matching such videos, using full-length or truncated video queries.
Once matched against a trusted database of video clips, associated information on the provenance of the clip is presented to the user.
arXiv Detail & Related papers (2021-09-21T09:07:05Z)
- Few-Shot Video Object Detection [70.43402912344327]
We introduce Few-Shot Video Object Detection (FSVOD) with three important contributions.
FSVOD-500 comprises 500 classes with class-balanced videos in each category for few-shot learning.
Our TPN and TMN+ are jointly and end-to-end trained.
arXiv Detail & Related papers (2021-04-30T07:38:04Z)
- The complementarity of a diverse range of deep learning features extracted from video content for video recommendation [2.092922495279074]
We explore the potential of various deep learning features to provide video recommendations.
Experiments on a real-world video dataset for movie recommendations show that deep learning features outperform hand-crafted features.
In particular, recommendations generated with deep learning audio features and action-centric deep learning features are superior to MFCC and state-of-the-art iDT features.
arXiv Detail & Related papers (2020-11-21T18:00:28Z)
- Straight to the Point: Fast-forwarding Videos via Reinforcement Learning Using Textual Data [1.004766879203303]
We present a novel methodology based on a reinforcement learning formulation to accelerate instructional videos.
Our approach can adaptively select frames that are not relevant to convey the information without creating gaps in the final video.
We propose a novel network, called Visually-guided Document Attention Network (VDAN), able to generate a highly discriminative embedding space.
arXiv Detail & Related papers (2020-03-31T14:07:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.