Variational Autoencoder with Embedded Student-$t$ Mixture Model for
Authorship Attribution
- URL: http://arxiv.org/abs/2005.13930v1
- Date: Thu, 28 May 2020 11:52:32 GMT
- Title: Variational Autoencoder with Embedded Student-$t$ Mixture Model for
Authorship Attribution
- Authors: Benedikt Boenninghoff, Steffen Zeiler, Robert M. Nickel, Dorothea
Kolossa
- Abstract summary: Given a finite set of candidate authors and corresponding labeled texts, the objective is to determine which of the authors has written another set of anonymous or disputed texts.
We propose a probabilistic autoencoding framework to deal with this supervised classification task.
Experiments over an Amazon review dataset indicate superior performance of the proposed method.
- Score: 13.196225569878761
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional computational authorship attribution describes a classification
task in a closed-set scenario. Given a finite set of candidate authors and
corresponding labeled texts, the objective is to determine which of the authors
has written another set of anonymous or disputed texts. In this work, we
propose a probabilistic autoencoding framework to deal with this supervised
classification task. More precisely, we extend a variational autoencoder
(VAE) with an embedded Gaussian mixture model to a Student-$t$ mixture model.
Autoencoders have had tremendous success in learning latent representations.
However, existing VAEs are still bound by the limitations imposed by the
assumed Gaussianity of the underlying probability distributions in the latent
space. Here, we extend the Gaussian model for the VAE to a
Student-$t$ model, which allows for independent control of the "heaviness"
of the respective tails of the implied probability densities. Experiments over
an Amazon review dataset indicate superior performance of the proposed method.
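The key modeling change described above is to replace the Gaussian mixture embedded in the VAE's latent space with a Student-$t$ mixture, whose per-component degrees of freedom govern how heavy the tails of the latent densities are. The sketch below illustrates that idea with PyTorch's built-in distributions; it is a minimal example under stated assumptions, not the authors' implementation, and the network sizes, parameterization, and toy objective are chosen purely for illustration.

```python
# Minimal sketch (not the authors' released code): a VAE encoder paired with a
# Student-t mixture prior over the latent space. The per-component degrees of
# freedom `nu` control tail heaviness; letting nu -> infinity recovers the
# Gaussian mixture case. Dimensions, architecture, and the toy objective below
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.distributions as D


class StudentTMixturePrior(nn.Module):
    def __init__(self, n_components: int, latent_dim: int):
        super().__init__()
        self.means = nn.Parameter(torch.randn(n_components, latent_dim))
        self.log_scales = nn.Parameter(torch.zeros(n_components, latent_dim))
        self.log_df = nn.Parameter(torch.zeros(n_components, latent_dim))  # log(nu)
        self.logits = nn.Parameter(torch.zeros(n_components))

    def log_prob(self, z: torch.Tensor) -> torch.Tensor:
        """log p(z) under the Student-t mixture; z has shape (batch, latent_dim)."""
        components = D.Independent(
            D.StudentT(df=self.log_df.exp(),
                       loc=self.means,
                       scale=self.log_scales.exp()),
            1,
        )
        weights = D.Categorical(logits=self.logits)
        return D.MixtureSameFamily(weights, components).log_prob(z)


class Encoder(nn.Module):
    def __init__(self, input_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)
        self.log_var = nn.Linear(128, latent_dim)

    def forward(self, x):
        h = self.net(x)
        mu, log_var = self.mu(h), self.log_var(h)
        # standard Gaussian reparameterization for the approximate posterior q(z|x)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()
        return z, mu, log_var


# Toy usage: push encoded document features toward high prior likelihood.
# The full training objective in the paper also involves reconstruction and
# regularization terms; this only demonstrates the t-mixture latent model.
encoder = Encoder(input_dim=300, latent_dim=16)
prior = StudentTMixturePrior(n_components=10, latent_dim=16)
x = torch.randn(8, 300)                 # stand-in for per-document feature vectors
z, mu, log_var = encoder(x)
loss = -prior.log_prob(z).mean()
loss.backward()
```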
Related papers
- Disentangled Latent Spaces for Reduced Order Models using Deterministic Autoencoders [0.0]
More interpretability can be gained by disentangling the latent variables and analyzing the resulting modes.
Probabilistic autoencoders ($\beta$-VAEs) are frequently used in computational fluid dynamics.
We show that competitive results can be achieved using non-probabilistic autoencoder approaches.
arXiv Detail & Related papers (2025-02-20T16:09:57Z)
- Gaussian Mixture Vector Quantization with Aggregated Categorical Posterior [5.862123282894087]
We introduce the Vector Quantized Variational Autoencoder (VQ-VAE).
VQ-VAE is a type of variational autoencoder that uses discrete embeddings as latents.
We show that GM-VQ improves codebook utilization and reduces information loss without relying on handcrafted heuristics.
arXiv Detail & Related papers (2024-10-14T05:58:11Z)
- SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning [49.94607673097326]
We propose a highly adaptable framework, designated as SimPro, which does not rely on any predefined assumptions about the distribution of unlabeled data.
Our framework, grounded in a probabilistic model, innovatively refines the expectation-maximization algorithm.
Our method showcases consistent state-of-the-art performance across diverse benchmarks and data distribution scenarios.
arXiv Detail & Related papers (2024-02-21T03:39:04Z)
- Delta-AI: Local objectives for amortized inference in sparse graphical models [64.5938437823851]
We present a new algorithm for amortized inference in sparse probabilistic graphical models (PGMs).
Our approach is based on the observation that when the sampling of variables in a PGM is seen as a sequence of actions taken by an agent, sparsity of the PGM enables local credit assignment in the agent's policy learning objective.
We illustrate $\Delta$-AI's effectiveness for sampling from synthetic PGMs and training latent variable models with sparse factor structure.
arXiv Detail & Related papers (2023-10-03T20:37:03Z)
- Variational Diffusion Auto-encoder: Latent Space Extraction from Pre-trained Diffusion Models [0.0]
Variational Auto-Encoders (VAEs) face challenges with the quality of generated images, often presenting noticeable blurriness.
This issue stems from the unrealistic assumption that approximates the conditional data distribution, $p(\mathbf{x} \mid \mathbf{z})$, as an isotropic Gaussian.
We illustrate how one can extract a latent space from a pre-existing diffusion model by optimizing an encoder to maximize the marginal data log-likelihood.
arXiv Detail & Related papers (2023-04-24T14:44:47Z)
- BRIO: Bringing Order to Abstractive Summarization [107.97378285293507]
We propose a novel training paradigm which assumes a non-deterministic distribution.
Our method achieves a new state-of-the-art result on the CNN/DailyMail (47.78 ROUGE-1) and XSum (49.07 ROUGE-1) datasets.
arXiv Detail & Related papers (2022-03-31T05:19:38Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Identification of Probability weighted ARX models with arbitrary domains [75.91002178647165]
PieceWise Affine models guarantee universal approximation, local linearity, and equivalence to other classes of hybrid systems.
In this work, we focus on the identification of PieceWise AutoRegressive models with eXogenous inputs and arbitrary regions (NPWARX).
The architecture is conceived following the Mixture of Experts concept, developed within the machine learning field.
arXiv Detail & Related papers (2020-09-29T12:50:33Z)
- Improve Variational Autoencoder for Text Generation with Discrete Latent Bottleneck [52.08901549360262]
Variational autoencoders (VAEs) are essential tools in end-to-end representation learning.
VAEs with a strong auto-regressive decoder tend to ignore the latent variables.
We propose a principled approach to enforce an implicit latent feature matching in a more compact latent space.
arXiv Detail & Related papers (2020-04-22T14:41:37Z)
- Regularized Autoencoders via Relaxed Injective Probability Flow [35.39933775720789]
Invertible flow-based generative models are an effective method for learning to generate samples, while allowing for tractable likelihood computation and inference.
We propose a generative model based on probability flows that does away with the bijectivity requirement on the model and only assumes injectivity.
arXiv Detail & Related papers (2020-02-20T18:22:46Z)