Multimodal Topic Learning for Video Recommendation
- URL: http://arxiv.org/abs/2010.13373v1
- Date: Mon, 26 Oct 2020 07:02:47 GMT
- Title: Multimodal Topic Learning for Video Recommendation
- Authors: Shi Pu and Yijiang He and Zheng Li and Mao Zheng
- Abstract summary: We propose a multimodal topic learning algorithm for generating video topics offline.
The topics generated serve as semantic topic features to facilitate preference scope determination and recommendation generation.
Our proposed algorithm has been deployed in the Kuaibao information streaming platform.
- Score: 5.458980400688099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Facilitated by deep neural networks, video recommendation systems have made
significant advances. Existing video recommendation systems directly feed
features from different modalities (e.g., user personal data, user behavior
data, video titles, video tags, and visual contents) into deep neural
networks, expecting the networks to implicitly mine user-preferred topics
from these features online. However, because these features lack semantic topic
information, accurate recommendation generation is limited. In addition, feature
crosses involving visual content features produce high-dimensional features
that heavily degrade the networks' online computational efficiency. In this
paper, we explicitly separate topic generation from recommendation generation
and propose a multimodal topic learning algorithm that exploits three modalities
(i.e., tags, titles, and cover images) to generate video topics offline. The
topics generated by the proposed algorithm serve as semantic topic features to
facilitate preference scope determination and recommendation generation.
Furthermore, we use the semantic topic features instead of visual content
features to effectively reduce online computational cost. Our proposed
algorithm has been deployed in the Kuaibao information streaming platform.
Online and offline evaluation results show that our proposed algorithm performs
favorably.
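The paper itself provides no reference implementation; the following is a minimal sketch of the general idea, with all module and feature names (e.g., OfflineTopicGenerator, OnlineRanker, the feature dimensions) being hypothetical. Topics are produced offline from tag, title, and cover-image embeddings, and the online ranking model consumes a low-dimensional topic embedding instead of raw visual content features:

```python
import torch
import torch.nn as nn

class OfflineTopicGenerator(nn.Module):
    """Hypothetical sketch: fuse tag, title, and cover-image embeddings
    into a soft assignment over K semantic topics (run offline)."""
    def __init__(self, tag_dim, title_dim, image_dim, num_topics=512, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(tag_dim + title_dim + image_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_topics),
        )

    def forward(self, tag_emb, title_emb, image_emb):
        x = torch.cat([tag_emb, title_emb, image_emb], dim=-1)
        return self.fuse(x).softmax(dim=-1)  # soft topic distribution

class OnlineRanker(nn.Module):
    """Online model: consumes user features plus the precomputed topic id,
    avoiding high-dimensional visual features at serving time."""
    def __init__(self, user_dim, num_topics=512, topic_dim=32):
        super().__init__()
        self.topic_emb = nn.Embedding(num_topics, topic_dim)
        self.mlp = nn.Sequential(
            nn.Linear(user_dim + topic_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, user_feat, topic_id):
        t = self.topic_emb(topic_id)
        return self.mlp(torch.cat([user_feat, t], dim=-1)).squeeze(-1)

# Offline: assign each video its most likely topic once.
gen = OfflineTopicGenerator(tag_dim=64, title_dim=128, image_dim=2048)
topic_id = gen(torch.randn(1, 64), torch.randn(1, 128), torch.randn(1, 2048)).argmax(-1)

# Online: score a (user, video) pair from the cheap topic feature.
ranker = OnlineRanker(user_dim=32)
score = ranker(torch.randn(1, 32), topic_id)
```

The design point this illustrates is the one the abstract argues for: the expensive multimodal fusion happens once offline, so the serving-time model only crosses low-dimensional features.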
Related papers
- You Need to Read Again: Multi-granularity Perception Network for Moment Retrieval in Videos [19.711703590063976]
We propose a novel Multi-Granularity Perception Network (MGPN) that perceives intra-modality and inter-modality information at a multi-granularity level.
Specifically, we formulate moment retrieval as a multi-choice reading comprehension task and integrate human reading strategies into our framework.
arXiv Detail & Related papers (2022-05-25T16:15:46Z)
- Video Summarization Based on Video-text Modelling [0.0]
We propose a multimodal self-supervised learning framework to obtain semantic representations of videos.
We also introduce a progressive video summarization method, where the important content in a video is pinpointed progressively to generate better summaries.
An objective evaluation framework is proposed to measure the quality of video summaries based on video classification.
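The entry does not spell out the training objective; a common choice for multimodal self-supervised learning of this kind is a symmetric contrastive (InfoNCE-style) loss between video and text embeddings. A minimal sketch under that assumption, with all names hypothetical:

```python
import torch
import torch.nn.functional as F

def contrastive_video_text_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired video/text embeddings.
    Matched pairs sit on the diagonal of the similarity matrix."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature                 # (B, B) similarities
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = contrastive_video_text_loss(torch.randn(8, 256), torch.randn(8, 256))
```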
arXiv Detail & Related papers (2022-01-07T15:21:46Z)
- Video Content Classification using Deep Learning [0.0]
This paper presents a model that combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN).
The model can identify the type of video content and classify it into categories such as animation, gaming, natural content, and flat content.
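As an illustration only (the paper's exact architecture is not given here), a CNN-RNN video classifier typically applies a 2D CNN to each frame and feeds the per-frame features to a recurrent layer; everything below is a hypothetical minimal sketch:

```python
import torch
import torch.nn as nn

class CnnRnnClassifier(nn.Module):
    """Hypothetical sketch: per-frame CNN features aggregated by a GRU."""
    def __init__(self, num_classes=4, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                       # tiny per-frame CNN
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, clip):                            # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        _, h = self.rnn(feats)                          # h: (1, B, feat_dim)
        return self.head(h[-1])                         # class logits

# e.g. 4 logits for animation / gaming / natural / flat content
logits = CnnRnnClassifier()(torch.randn(2, 16, 3, 64, 64))
```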
arXiv Detail & Related papers (2021-11-27T04:36:17Z)
- Efficient Modelling Across Time of Human Actions and Interactions [92.39082696657874]
We argue that the current fixed-size temporal kernels in 3D convolutional neural networks (CNNs) can be improved to better deal with temporal variations in the input.
We study how we can better distinguish between classes of actions by enhancing their feature differences across different layers of the architecture.
The proposed approaches are evaluated on several benchmark action recognition datasets and show competitive results.
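The summary does not give the exact mechanism; one standard way to relax a fixed temporal kernel size, shown purely as a hypothetical sketch (the branch sizes are assumptions), is to run parallel 3D convolutions with different temporal extents and concatenate them:

```python
import torch
import torch.nn as nn

class MultiTemporalConv(nn.Module):
    """Hypothetical sketch: parallel 3D convs with temporal kernels of
    different sizes, so the block sees several temporal extents at once."""
    def __init__(self, in_ch, out_ch, temporal_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(in_ch, out_ch // len(temporal_sizes),
                      kernel_size=(t, 3, 3), padding=(t // 2, 1, 1))
            for t in temporal_sizes
        ])

    def forward(self, x):                # x: (B, C, T, H, W)
        return torch.cat([b(x) for b in self.branches], dim=1)

y = MultiTemporalConv(16, 48)(torch.randn(2, 16, 8, 32, 32))  # -> (2, 48, 8, 32, 32)
```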
arXiv Detail & Related papers (2021-10-05T15:39:11Z)
- A Survey on Deep Learning Technique for Video Segmentation [147.0767454918527]
Video segmentation plays a critical role in a broad range of practical applications.
Deep learning based approaches have been dedicated to video segmentation and delivered compelling performance.
arXiv Detail & Related papers (2021-07-02T15:51:07Z)
- The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
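The exact losses are not reproduced here; the sketch below illustrates the general recipe of optimizing an input image directly under an activation term and a feature-distance term. The function name, hook mechanics, and loss weights are all hypothetical:

```python
import torch

def visualize_features(model, layer, ref_img, steps=200, lr=0.05, alpha=1.0, beta=0.1):
    """Hypothetical sketch: gradient-ascend an image so that it (a) strongly
    activates a chosen layer and (b) stays close to a reference image's
    features at that layer -- no generator network involved."""
    feats = {}
    hook = layer.register_forward_hook(lambda m, i, o: feats.update(out=o))
    with torch.no_grad():
        model(ref_img)
        ref_feat = feats["out"].detach()      # features of the reference image
    x = ref_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        model(x)
        act = feats["out"]
        # maximize activation strength, stay near the reference features
        loss = -alpha * act.norm() + beta * (act - ref_feat).pow(2).mean()
        loss.backward()
        opt.step()
    hook.remove()
    return x.detach()
```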
arXiv Detail & Related papers (2021-01-29T07:46:39Z)
- Video Summarization Using Deep Neural Networks: A Survey [72.98424352264904]
Video summarization technologies aim to create a concise and complete synopsis by selecting the most informative parts of the video content.
This work focuses on the recent advances in the area and provides a comprehensive survey of the existing deep-learning-based methods for generic video summarization.
arXiv Detail & Related papers (2021-01-15T11:41:29Z)
- Feature Re-Learning with Data Augmentation for Video Relevance Prediction [35.87597969685573]
Re-learning is realized by projecting a given deep feature into a new space via an affine transformation.
We propose a new data augmentation strategy which works directly on frame-level and video-level features.
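Both ingredients are simple to sketch. The snippet below shows an affine re-learning layer and a feature-level augmentation; the additive Gaussian noise is an assumption for illustration, not necessarily the paper's actual augmentation strategy:

```python
import torch
import torch.nn as nn

class AffineReLearner(nn.Module):
    """Re-learning as an affine map: project a pretrained deep feature
    into a new space, y = Wx + b (sketch)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)  # affine: weight + bias

    def forward(self, feat):
        return self.proj(feat)

def augment_features(feat, noise_std=0.01):
    """Hypothetical feature-level augmentation: perturb frame/video-level
    features directly instead of re-decoding raw video."""
    return feat + noise_std * torch.randn_like(feat)

relearn = AffineReLearner(2048, 512)
new_feat = relearn(augment_features(torch.randn(4, 2048)))  # (4, 512)
```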
arXiv Detail & Related papers (2020-04-08T05:22:41Z)
- Convolutional Hierarchical Attention Network for Query-Focused Video Summarization [74.48782934264094]
This paper addresses the task of query-focused video summarization, which takes a user's query and a long video as inputs.
We propose a method, named Convolutional Hierarchical Attention Network (CHAN), which consists of two parts: feature encoding network and query-relevance computing module.
In the encoding network, we employ a convolutional network with a local self-attention mechanism and a query-aware global attention mechanism to learn the visual information of each shot.
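The full CHAN architecture is not reproduced here; as a hypothetical sketch, a query-aware global attention can score each shot against the query embedding and pool shot features by those scores (function and tensor names are assumptions):

```python
import torch
import torch.nn.functional as F

def query_aware_attention(query_emb, shot_feats):
    """Hypothetical sketch: weight each video shot by its relevance to the
    query, then pool. query_emb: (B, D); shot_feats: (B, N, D)."""
    scores = torch.einsum("bd,bnd->bn", query_emb, shot_feats)
    weights = F.softmax(scores / shot_feats.size(-1) ** 0.5, dim=-1)
    return torch.einsum("bn,bnd->bd", weights, shot_feats)

pooled = query_aware_attention(torch.randn(2, 128), torch.randn(2, 20, 128))
```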
arXiv Detail & Related papers (2020-01-31T04:30:14Z)
- Video Captioning with Guidance of Multimodal Latent Topics [123.5255241103578]
We propose a unified caption framework, M&M TGM, which mines multimodal topics from data in an unsupervised fashion.
Compared to pre-defined topics, the mined multimodal topics are more semantically and visually coherent.
The results from extensive experiments conducted on the MSR-VTT and Youtube2Text datasets demonstrate the effectiveness of our proposed model.
arXiv Detail & Related papers (2017-08-31T11:18:28Z)