The complementarity of a diverse range of deep learning features
extracted from video content for video recommendation
- URL: http://arxiv.org/abs/2011.10834v2
- Date: Sat, 1 Jan 2022 00:46:04 GMT
- Title: The complementarity of a diverse range of deep learning features
extracted from video content for video recommendation
- Authors: Adolfo Almeida, Johan Pieter de Villiers, Allan De Freitas, Mergandran
Velayudan
- Abstract summary: We explore the potential of various deep learning features to provide video recommendations.
Experiments on a real-world video dataset for movie recommendations show that deep learning features outperform hand-crafted features.
In particular, recommendations generated with deep learning audio features and action-centric deep learning features are superior to MFCC and state-of-the-art iDT features.
- Score: 2.092922495279074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Following the popularisation of media streaming, video streaming
services continuously acquire new video content to mine its potential profit.
As such, newly added content has to be handled well so that it can be
recommended to suitable users. In this paper, we address the new item
cold-start problem by exploring the potential of various deep learning features
to provide video recommendations. The deep learning features investigated
include features that capture the visual-appearance, audio and motion
information from video content. We also explore different fusion methods to
evaluate how well these feature modalities can be combined to fully exploit the
complementary information captured by them. Experiments on a real-world video
dataset for movie recommendations show that deep learning features outperform
hand-crafted features. In particular, recommendations generated with deep
learning audio features and action-centric deep learning features are superior
to MFCC and state-of-the-art iDT features. In addition, the combination of
various deep learning features with hand-crafted features and textual metadata
yields significant improvement in recommendations compared to combining only
the former.
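The abstract describes the general recipe rather than a specific implementation. As a rough illustration only, the sketch below shows one common content-based baseline consistent with it: deep visual, audio and motion features are L2-normalised and concatenated (feature-level fusion), and newly added items with no interaction history are ranked for a user by similarity to the items they have already watched. All names, dimensions and the mean-pooled user profile are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def l2_normalise(x, eps=1e-12):
    """Row-wise L2 normalisation so similarities reduce to dot products."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def fuse_features(visual, audio, motion):
    """Feature-level fusion: normalise each modality, then concatenate."""
    return np.concatenate(
        [l2_normalise(visual), l2_normalise(audio), l2_normalise(motion)], axis=1)

def recommend_cold_start(history_feats, candidate_feats, top_k=10):
    """Score unseen (cold-start) items by similarity to the user's watched items."""
    profile = l2_normalise(history_feats).mean(axis=0)   # simple mean-pooled user profile
    scores = l2_normalise(candidate_feats) @ profile      # relevance score per candidate
    return np.argsort(-scores)[:top_k], scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative dimensions: 2048-d visual, 128-d audio, 1024-d motion features.
    visual = rng.normal(size=(50, 2048))
    audio = rng.normal(size=(50, 128))
    motion = rng.normal(size=(50, 1024))
    items = fuse_features(visual, audio, motion)
    watched = items[:5]        # items already seen by the user
    new_items = items[5:]      # newly added content with no interaction data
    top, _ = recommend_cold_start(watched, new_items)
    print("recommended new-item indices:", top)
```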
Related papers
- EVC-MF: End-to-end Video Captioning Network with Multi-scale Features [13.85795110061781]
We propose an end-to-end encoder-decoder-based network (EVC-MF) for video captioning.
It efficiently utilizes multi-scale visual and textual features to generate video descriptions.
The results demonstrate that EVC-MF yields competitive performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2024-10-22T02:16:02Z) - Realizing Video Summarization from the Path of Language-based Semantic Understanding [19.825666473712197]
We propose a novel video summarization framework inspired by the Mixture of Experts (MoE) paradigm.
Our approach integrates multiple VideoLLMs to generate comprehensive and coherent textual summaries.
arXiv Detail & Related papers (2024-10-06T15:03:22Z) - Multimodal Language Models for Domain-Specific Procedural Video Summarization [0.0]
We study the use of multimodal models to enhance video summarization and step-by-step instruction generation within specific domains.
Our approach focuses on fine-tuning TimeChat to improve its performance in specific domains: cooking and medical procedures.
Our findings indicate that when finetuned on domain-specific procedural data, TimeChat can significantly improve the extraction and summarization of key instructional steps in long-format videos.
arXiv Detail & Related papers (2024-07-07T15:50:46Z) - Video Infringement Detection via Feature Disentanglement and Mutual
Information Maximization [51.206398602941405]
We propose to disentangle an original high-dimensional feature into multiple sub-features.
On top of the disentangled sub-features, we learn an auxiliary feature to enhance the sub-features.
Our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset.
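As a rough, hypothetical illustration of the disentanglement idea summarised above (not the paper's actual architecture or training losses), one can project a single high-dimensional video descriptor into several sub-features plus an auxiliary feature that enhances them:

```python
import torch
import torch.nn as nn

class DisentangleHead(nn.Module):
    """Project one high-dimensional feature into several sub-features plus an auxiliary feature."""
    def __init__(self, in_dim=2048, sub_dim=256, num_sub=4):
        super().__init__()
        self.sub_projs = nn.ModuleList([nn.Linear(in_dim, sub_dim) for _ in range(num_sub)])
        self.aux_proj = nn.Linear(in_dim, sub_dim)

    def forward(self, x):
        subs = [proj(x) for proj in self.sub_projs]   # disentangled sub-features
        aux = self.aux_proj(x)                        # auxiliary feature shared by all sub-features
        enhanced = [s + aux for s in subs]            # enhance each sub-feature with the auxiliary one
        return torch.stack(enhanced, dim=1)           # (batch, num_sub, sub_dim)

if __name__ == "__main__":
    feats = torch.randn(8, 2048)                      # e.g. pooled per-video descriptors
    print(DisentangleHead()(feats).shape)             # torch.Size([8, 4, 256])
```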
arXiv Detail & Related papers (2023-09-13T10:53:12Z) - Self-Supervised Learning for Videos: A Survey [70.37277191524755]
Self-supervised learning has shown promise in both image and video domains.
In this survey, we provide a review of existing approaches on self-supervised learning focusing on the video domain.
arXiv Detail & Related papers (2022-06-18T00:26:52Z) - Distilling Audio-Visual Knowledge by Compositional Contrastive Learning [51.20935362463473]
We learn a compositional embedding that closes the cross-modal semantic gap.
We establish a new, comprehensive multi-modal distillation benchmark on three video datasets.
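A minimal sketch of the kind of cross-modal contrastive objective such distillation work builds on, assuming paired audio and visual embeddings of the same clips; the paper's compositional formulation is more involved than this symmetric InfoNCE-style loss:

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(audio_emb, visual_emb, temperature=0.07):
    """Symmetric InfoNCE: matching audio/visual pairs (same clip) attract, others repel."""
    a = F.normalize(audio_emb, dim=1)
    v = F.normalize(visual_emb, dim=1)
    logits = a @ v.t() / temperature                      # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)    # diagonal entries are the true pairs
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    audio, visual = torch.randn(16, 128), torch.randn(16, 128)
    print(cross_modal_contrastive_loss(audio, visual).item())
```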
arXiv Detail & Related papers (2021-04-22T09:31:20Z) - Video Summarization Using Deep Neural Networks: A Survey [72.98424352264904]
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content.
This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization.
arXiv Detail & Related papers (2021-01-15T11:41:29Z) - A Clustering-Based Method for Automatic Educational Video Recommendation
Using Deep Face-Features of Lecturers [0.0]
This paper presents a method for generating educational video recommendations using deep face-features of lecturers without identifying them.
We use an unsupervised face clustering mechanism to create relations among the videos based on the lecturer's presence.
We rank these recommended videos based on the amount of time the referenced lecturers were present.
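A simplified sketch of that pipeline, assuming per-video face embeddings with on-screen durations are already available; the greedy cosine clustering below is only a stand-in for the paper's unsupervised clustering step:

```python
import numpy as np
from collections import defaultdict

def cluster_faces(embeddings, threshold=0.6):
    """Greedy cosine-similarity clustering of face embeddings (a stand-in for the
    paper's unsupervised clustering; the exact algorithm may differ)."""
    embs = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    labels, centroids = [], []
    for e in embs:
        sims = [float(e @ c) for c in centroids]
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            centroids.append(e)
            labels.append(len(centroids) - 1)
    return labels

def recommend_by_lecturer(query_video, face_records):
    """face_records: list of (video_id, face_embedding, seconds_on_screen).
    Videos sharing a face cluster with the query are recommended,
    ranked by how long the shared lecturer is present."""
    labels = cluster_faces(np.array([r[1] for r in face_records]))
    query_clusters = {l for l, r in zip(labels, face_records) if r[0] == query_video}
    presence = defaultdict(float)
    for l, (vid, _, secs) in zip(labels, face_records):
        if vid != query_video and l in query_clusters:
            presence[vid] += secs
    return sorted(presence, key=presence.get, reverse=True)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    face = rng.normal(size=128)
    records = [("query", face, 300.0), ("v1", face + 0.01, 600.0), ("v2", rng.normal(size=128), 50.0)]
    print(recommend_by_lecturer("query", records))   # likely ['v1']: shares the lecturer, 600 s on screen
```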
arXiv Detail & Related papers (2020-10-09T16:53:16Z) - Hybrid Dynamic-static Context-aware Attention Network for Action
Assessment in Long Videos [96.45804577283563]
We present a novel hybrid dynAmic-static Context-aware attenTION NETwork (ACTION-NET) for action assessment in long videos.
We learn not only the video dynamic information but also the static postures of the detected athletes in specific frames.
We combine the features of the two streams to regress the final video score, supervised by ground-truth scores given by experts.
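As an illustrative sketch only (the actual ACTION-NET uses context-aware attention modules), the two-stream idea reduces to fusing dynamic and static features and regressing the expert score:

```python
import torch
import torch.nn as nn

class TwoStreamScorer(nn.Module):
    """Fuse a dynamic (motion) stream and a static (pose/frame) stream, then regress a score."""
    def __init__(self, dyn_dim=1024, stat_dim=512, hidden=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(dyn_dim + stat_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, dyn_feat, stat_feat):
        return self.head(torch.cat([dyn_feat, stat_feat], dim=1)).squeeze(1)

if __name__ == "__main__":
    model, loss_fn = TwoStreamScorer(), nn.MSELoss()
    dyn, stat = torch.randn(4, 1024), torch.randn(4, 512)
    expert_scores = torch.tensor([7.5, 8.2, 6.9, 9.0])   # ground-truth scores given by judges
    loss = loss_fn(model(dyn, stat), expert_scores)       # supervised regression loss
    loss.backward()
    print(float(loss))
```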
arXiv Detail & Related papers (2020-08-13T15:51:42Z) - Feature Re-Learning with Data Augmentation for Video Relevance
Prediction [35.87597969685573]
Re-learning is realized by projecting a given deep feature into a new space by an affine transformation.
We propose a new data augmentation strategy which works directly on frame-level and video-level features.
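A minimal sketch of the two ingredients named above, under assumed dimensions: an affine re-learning layer trained with a ranking loss, and a simple feature-level augmentation that drops random frame features before pooling. It is not the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AffineReLearner(nn.Module):
    """Re-learn a deep feature by an affine map into a new space (a single linear layer)."""
    def __init__(self, in_dim=2048, out_dim=512):
        super().__init__()
        self.affine = nn.Linear(in_dim, out_dim)   # y = Wx + b

    def forward(self, x):
        return F.normalize(self.affine(x), dim=1)

def augment_frame_features(frame_feats, keep_prob=0.8):
    """Feature-level augmentation: randomly drop frame features, then mean-pool to a video feature."""
    mask = torch.rand(frame_feats.size(0)) < keep_prob
    kept = frame_feats[mask] if mask.any() else frame_feats
    return kept.mean(dim=0)

def triplet_relevance_loss(anchor, positive, negative, margin=0.2):
    """Push a relevant video closer to the anchor than an irrelevant one."""
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)

if __name__ == "__main__":
    frame_feats = torch.randn(30, 2048)                # per-frame CNN features of one video
    video_feat = augment_frame_features(frame_feats)   # augmented video-level feature
    model = AffineReLearner()
    a, p, n = model(torch.randn(3, 2048)).split(1)     # toy anchor / relevant / irrelevant videos
    print(triplet_relevance_loss(a, p, n).item(), video_feat.shape)
```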
arXiv Detail & Related papers (2020-04-08T05:22:41Z) - Non-Adversarial Video Synthesis with Learned Priors [53.26777815740381]
We focus on the problem of generating videos from latent noise vectors, without any reference input frames.
We develop a novel approach that jointly optimizes the input latent space, the weights of a recurrent neural network and a generator through non-adversarial learning.
Our approach generates superior quality videos compared to the existing state-of-the-art methods.
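A toy, hypothetical sketch of the non-adversarial setup described above: per-video latent codes, a recurrent network and a frame generator are optimised jointly with a plain reconstruction loss instead of a discriminator. Module sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class NonAdvVideoGen(nn.Module):
    """Latent codes, a recurrent core and a frame generator, trained only with a reconstruction loss."""
    def __init__(self, num_videos=100, latent_dim=64, hidden=128, frame_dim=32 * 32):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_videos, latent_dim))  # learnable per-video latents
        self.rnn = nn.GRU(latent_dim, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, frame_dim)                        # toy frame generator

    def forward(self, video_ids, num_frames=16):
        z = self.latents[video_ids].unsqueeze(1).expand(-1, num_frames, -1)
        h, _ = self.rnn(z)                         # temporal dynamics from the recurrent network
        return torch.sigmoid(self.decoder(h))      # (batch, num_frames, frame_dim)

if __name__ == "__main__":
    model = NonAdvVideoGen()
    target = torch.rand(4, 16, 32 * 32)                 # toy target clips
    pred = model(torch.tensor([0, 1, 2, 3]))
    loss = nn.functional.mse_loss(pred, target)          # non-adversarial: no discriminator
    loss.backward()                                       # gradients flow to latents, GRU and decoder
    print(float(loss))
```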
arXiv Detail & Related papers (2020-03-21T02:57:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.