Related papers: Smoothed Gaussian Mixture Models for Video Classification and Recommendation

URL: http://arxiv.org/abs/2012.11673v1
Date: Thu, 17 Dec 2020 06:52:41 GMT
Title: Smoothed Gaussian Mixture Models for Video Classification and Recommendation
Authors: Sirjan Kafle, Aman Gupta, Xue Xia, Ananth Sankar, Xi Chen, Di Wen, Liang Zhang
Abstract summary: We propose a new cluster-and-aggregate method, which we call smoothed Gaussian mixture model (SGMM). We show, through extensive experiments on the YouTube-8M classification task, that SGMM/DSGMM is consistently better than VLAD/NetVLAD by a small but statistically significant margin.
Score: 10.119117405418868
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cluster-and-aggregate techniques such as Vector of Locally Aggregated Descriptors (VLAD), and their end-to-end discriminatively trained equivalents such as NetVLAD, have recently been popular for video classification and action recognition tasks. These techniques operate by assigning video frames to clusters and then representing the video by aggregating residuals of frames with respect to the mean of each cluster. Since some clusters may see very little video-specific data, these features can be noisy. In this paper, we propose a new cluster-and-aggregate method, which we call smoothed Gaussian mixture model (SGMM), and its end-to-end discriminatively trained equivalent, which we call deep smoothed Gaussian mixture model (DSGMM). SGMM represents each video by the parameters of a Gaussian mixture model (GMM) trained for that video. Low-count clusters are addressed by smoothing the video-specific estimates with a universal background model (UBM) trained on a large number of videos. The primary benefit of SGMM over VLAD is smoothing, which makes it less sensitive to a small number of training samples. We show, through extensive experiments on the YouTube-8M classification task, that SGMM/DSGMM is consistently better than VLAD/NetVLAD by a small but statistically significant margin. We also show results using a dataset created at LinkedIn to predict if a member will watch an uploaded video.
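Illustrative sketch (not taken from the paper): the abstract does not state the smoothing formula, so the code below assumes the classic relevance-factor MAP adaptation of component means toward a UBM, a standard technique for this kind of smoothing. The function names `soft_assign` and `smoothed_means`, the diagonal covariances, and the default `relevance` value are all hypothetical choices made for illustration. It shows how a cluster that sees few frames falls back to the UBM mean, while a well-populated cluster moves toward the video-specific mean.

```python
import numpy as np

def soft_assign(frames, means, variances, weights):
    """Posterior responsibility of each diagonal-Gaussian component per frame.

    frames: (T, D); means, variances: (K, D); weights: (K,). Returns (T, K).
    """
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)
    log_prob = -0.5 * np.sum(
        diff ** 2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )                                                        # (T, K)
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def smoothed_means(frames, ubm_means, ubm_vars, ubm_weights, relevance=16.0):
    """Assumed smoothing rule: interpolate video-specific means with UBM means.

    alpha_k = n_k / (n_k + relevance), so low-count clusters stay near the UBM.
    Returns the smoothed means (K, D) and the soft counts n_k (K,).
    """
    post = soft_assign(frames, ubm_means, ubm_vars, ubm_weights)
    counts = post.sum(axis=0)                                # soft count per cluster
    first_order = post.T @ frames                            # (K, D) weighted sums
    video_means = first_order / np.maximum(counts, 1e-12)[:, None]
    alpha = (counts / (counts + relevance))[:, None]
    return alpha * video_means + (1.0 - alpha) * ubm_means, counts
```

For comparison, a VLAD-style feature would aggregate the unsmoothed residuals of frames with respect to each cluster mean, which is where the noise for low-count clusters described in the abstract comes from.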

Related papers

Performance of Gaussian Mixture Model Classifiers on Embedded Feature Spaces [1.3241991482253108]
Data embeddings with CLIP and ImageBind provide powerful features for the analysis of multimedia and/or multimodal data. We assess their performance here for classification using a Gaussian mixture model (GMM) based layer as an alternative to the standard Softmax layer. Our findings are that, in most cases, one Gaussian component in the GMMs is enough for capturing each class, and we hypothesize that this may be due to the contrastive loss used for training these embedded spaces.
arXiv Detail & Related papers (2024-10-17T10:43:43Z)
SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling (SIGMA) is a novel video pretraining method. We distribute features of space-time tubes evenly across a limited number of learnable clusters. Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z)
GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval [59.47258928867802]
Given a text query, partially relevant video retrieval (PRVR) seeks to find videos containing pertinent moments in a database. This paper proposes GMMFormer, a Gaussian-Mixture-Model based Transformer which models clip representations implicitly. Experiments on three large-scale video datasets demonstrate the superiority and efficiency of GMMFormer.
arXiv Detail & Related papers (2023-10-08T15:04:50Z)
GMMSeg: Gaussian Mixture based Generative Semantic Segmentation Models [74.0430727476634]
We propose a new family of segmentation models that rely on a dense generative classifier for the joint distribution p(pixel feature, class). With a variety of segmentation architectures and backbones, GMMSeg outperforms its discriminative counterparts on closed-set datasets. GMMSeg even performs well on open-world datasets.
arXiv Detail & Related papers (2022-10-05T05:20:49Z)
Bayesian Nonparametric Submodular Video Partition for Robust Anomaly Detection [9.145168943972067]
Multiple-instance learning (MIL) provides an effective way to tackle the video anomaly detection problem. We propose to conduct novel Bayesian non-parametric submodular video partition (BN-SVP) to significantly improve MIL model training. Our theoretical analysis ensures a strong performance guarantee of the proposed algorithm.
arXiv Detail & Related papers (2022-03-24T04:00:49Z)
A new perspective on probabilistic image modeling [92.89846887298852]
We present a new probabilistic approach for image modeling capable of density estimation, sampling and tractable inference. DCGMMs can be trained end-to-end by SGD from random initial conditions, much like CNNs. We show that DCGMMs compare favorably to several recent PC and SPN models in terms of inference, classification and sampling.
arXiv Detail & Related papers (2022-03-21T14:53:57Z)
Image Modeling with Deep Convolutional Gaussian Mixture Models [79.0660895390689]
We present a new formulation of deep hierarchical Gaussian Mixture Models (GMMs) that is suitable for describing and generating images. DCGMMs avoid this by a stacked architecture of multiple GMM layers, linked by convolution and pooling operations. For generating sharp images with DCGMMs, we introduce a new gradient-based technique for sampling through non-invertible operations like convolution and pooling. Based on the MNIST and FashionMNIST datasets, we validate the DCGMM model by demonstrating its superiority over flat GMMs for clustering, sampling and outlier detection.
arXiv Detail & Related papers (2021-04-19T12:08:53Z)
EGMM: an Evidential Version of the Gaussian Mixture Model for Clustering [22.586481334904793]
We propose a new model-based clustering algorithm, called EGMM (evidential GMM), in the theoretical framework of belief functions. The parameters in EGMM are estimated by a specially designed Expectation-Maximization (EM) algorithm. The proposed EGMM is as simple as the classical GMM, but can generate a more informative evidential partition for the considered dataset.
arXiv Detail & Related papers (2020-10-03T11:59:07Z)
Semi-Supervised Learning with Normalizing Flows [54.376602201489995]
FlowGMM is an end-to-end approach to generative semi-supervised learning with normalizing flows. We show promising results on a wide range of applications, including AG-News and Yahoo Answers text data.
arXiv Detail & Related papers (2019-12-30T17:36:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.