A Bayesian approach to modeling topic-metadata relationships
- URL: http://arxiv.org/abs/2104.02496v2
- Date: Mon, 28 Apr 2025 07:49:29 GMT
- Title: A Bayesian approach to modeling topic-metadata relationships
- Authors: P. Schulze, S. Wiegrebe, P. W. Thurner, C. Heumann, M. Aßenmacher,
- Abstract summary: Methods used to estimate relationships between topics and metadata must take into account that the topical structure is not directly observed.<n>A frequently used procedure to achieve this is the method of composition performing multiple repeated linear regressions of sampled topic proportions.<n>We propose two modifications to this approach: First, we substantially refine the existing implementation of the method of composition from the R package stm by replacing linear regression with the more appropriate Beta regression.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The objective of advanced topic modeling is not only to explore latent topical structures, but also to estimate relationships between the discovered topics and theoretically relevant metadata. Methods used to estimate such relationships must take into account that the topical structure is not directly observed, but instead being estimated itself in an unsupervised fashion, usually by common topic models. A frequently used procedure to achieve this is the method of composition, a Monte Carlo sampling technique performing multiple repeated linear regressions of sampled topic proportions on metadata covariates. In this paper, we propose two modifications of this approach: First, we substantially refine the existing implementation of the method of composition from the R package stm by replacing linear regression with the more appropriate Beta regression. Second, we provide a fundamental enhancement of the entire estimation framework by substituting the current blending of frequentist and Bayesian methods with a fully Bayesian approach. This allows for a more appropriate quantification of uncertainty. We illustrate our improved methodology by investigating relationships between Twitter posts by German parliamentarians and different metadata covariates related to their electoral districts, using the Structural Topic Model to estimate topic proportions.
Related papers
- Amortized In-Context Bayesian Posterior Estimation [15.714462115687096]
Amortization, through conditional estimation, is a viable strategy to alleviate such difficulties.
We conduct a thorough comparative analysis of amortized in-context Bayesian posterior estimation methods.
We highlight the superiority of the reverse KL estimator for predictive problems, especially when combined with the transformer architecture and normalizing flows.
arXiv Detail & Related papers (2025-02-10T16:00:48Z) - Graph Topic Modeling for Documents with Spatial or Covariate Dependencies [0.9208007322096533]
We address the challenge of incorporating document-level metadata into topic modeling.
We propose a new estimator based on a fast graph-regularized iterative singular value decomposition.
We validate our model through comprehensive experiments on synthetic datasets and three real-world corpora.
arXiv Detail & Related papers (2024-12-19T03:00:26Z) - Weakly supervised covariance matrices alignment through Stiefel matrices
estimation for MEG applications [64.20396555814513]
This paper introduces a novel domain adaptation technique for time series data, called Mixing model Stiefel Adaptation (MSA)
We exploit abundant unlabeled data in the target domain to ensure effective prediction by establishing pairwise correspondence with equivalent signal variances between domains.
MSA outperforms recent methods in brain-age regression with task variations using magnetoencephalography (MEG) signals from the Cam-CAN dataset.
arXiv Detail & Related papers (2024-01-24T19:04:49Z) - A Bayesian Methodology for Estimation for Sparse Canonical Correlation [0.0]
Canonical Correlation Analysis (CCA) is a statistical procedure for identifying relationships between data sets.
ScSCCA is a rapidly emerging methodological area that aims for robust modeling of the interrelations between the different data modalities.
We propose a novel ScSCCA approach where we employ a Bayesian infinite factor model and aim to achieve robust estimation.
arXiv Detail & Related papers (2023-10-30T15:14:25Z) - Functional Generalized Canonical Correlation Analysis for studying
multiple longitudinal variables [0.9208007322096533]
Functional Generalized Canonical Correlation Analysis (FGCCA) is a new framework for exploring associations between multiple random processes observed jointly.
We establish the monotonic property of the solving procedure and introduce a Bayesian approach for estimating canonical components.
We present a use case on a longitudinal dataset and evaluate the method's efficiency in simulation studies.
arXiv Detail & Related papers (2023-10-11T09:21:31Z) - Monte Carlo inference for semiparametric Bayesian regression [5.488491124945426]
This paper introduces a simple, general, and efficient strategy for joint posterior inference of an unknown transformation and all regression model parameters.
It delivers (1) joint posterior consistency under general conditions, including multiple model misspecifications, and (2) efficient Monte Carlo (not Markov chain Monte Carlo) inference for the transformation and all parameters for important special cases.
arXiv Detail & Related papers (2023-06-08T18:42:42Z) - Are Neural Topic Models Broken? [81.15470302729638]
We study the relationship between automated and human evaluation of topic models.
We find that neural topic models fare worse in both respects compared to an established classical method.
arXiv Detail & Related papers (2022-10-28T14:38:50Z) - Mining Relations among Cross-Frame Affinities for Video Semantic
Segmentation [87.4854250338374]
We explore relations among affinities in two aspects: single-scale intrinsic correlations and multi-scale relations.
Our experiments demonstrate that the proposed method performs favorably against state-of-the-art VSS methods.
arXiv Detail & Related papers (2022-07-21T12:12:36Z) - Scalable Semi-Modular Inference with Variational Meta-Posteriors [0.0]
The Cut posterior and Semi-Modular Inference are Generalised Bayes methods for Modular Bayesian evidence combination.
We show that analysis of models with multiple cuts is feasible using a new Variational Meta-Posterior.
This approximates a family of SMI posteriors indexed by $eta$ using a single set of variational parameters.
arXiv Detail & Related papers (2022-04-01T09:01:20Z) - Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z) - Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z) - MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood
Inference from Sampled Trajectories [61.3299263929289]
Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice.
One class of methods uses data simulated with different parameters to infer an amortized estimator for the likelihood-to-evidence ratio.
We show that this approach can be formulated in terms of mutual information between model parameters and simulated data.
arXiv Detail & Related papers (2021-06-03T12:59:16Z) - TFPose: Direct Human Pose Estimation with Transformers [83.03424247905869]
We formulate the pose estimation task into a sequence prediction problem that can effectively be solved by transformers.
Our framework is simple and direct, bypassing the drawbacks of the heatmap-based pose estimation.
Experiments on the MS-COCO and MPII datasets demonstrate that our method can significantly improve the state-of-the-art of regression-based pose estimation.
arXiv Detail & Related papers (2021-03-29T04:18:54Z) - Hyperparameter Optimization with Differentiable Metafeatures [5.586191108738563]
We propose a cross dataset surrogate model called Differentiable Metafeature-based Surrogate (DMFBS)
In contrast to existing models, DMFBS i) integrates a differentiable metafeature extractor and ii) is optimized using a novel multi-task loss.
We compare DMFBS against several recent models for HPO on three large meta-datasets and show that it consistently outperforms all of them with an average 10% improvement.
arXiv Detail & Related papers (2021-02-07T11:06:31Z) - Improved Semantic Role Labeling using Parameterized Neighborhood Memory
Adaptation [22.064890647610348]
We propose a parameterized neighborhood memory adaptive (PNMA) method that uses a parameterized representation of the nearest neighbors of tokens in a memory of activations.
We empirically show that PNMA consistently improves the SRL performance of the base model irrespective of types of word embeddings.
arXiv Detail & Related papers (2020-11-29T22:51:25Z) - Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling [54.94763543386523]
Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the ( aggregate) posterior to encourage statistical independence of the latent factors.
We present a novel multi-stage modeling approach where the disentangled factors are first learned using a penalty-based disentangled representation learning method.
Then, the low-quality reconstruction is improved with another deep generative model that is trained to model the missing correlated latent variables.
arXiv Detail & Related papers (2020-10-25T18:51:15Z) - Revisiting LSTM Networks for Semi-Supervised Text Classification via
Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z) - Evaluating the Disentanglement of Deep Generative Models through
Manifold Topology [66.06153115971732]
We present a method for quantifying disentanglement that only uses the generative model.
We empirically evaluate several state-of-the-art models across multiple datasets.
arXiv Detail & Related papers (2020-06-05T20:54:11Z) - GPT-too: A language-model-first approach for AMR-to-text generation [22.65728041544785]
We propose an approach that combines a strong pre-trained language model with cycle consistency-based re-scoring.
Despite the simplicity of the approach, our experimental results show these models outperform all previous techniques.
arXiv Detail & Related papers (2020-05-18T22:50:26Z) - Adaptive Correlated Monte Carlo for Contextual Categorical Sequence
Generation [77.7420231319632]
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
arXiv Detail & Related papers (2019-12-31T03:01:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.