An Image is Worth $K$ Topics: A Visual Structural Topic Model with Pretrained Image Embeddings
- URL: http://arxiv.org/abs/2504.10004v1
- Date: Mon, 14 Apr 2025 09:07:11 GMT
- Title: An Image is Worth $K$ Topics: A Visual Structural Topic Model with Pretrained Image Embeddings
- Authors: Matías Piqueras, Alexandra Segerberg, Matteo Magnani, Måns Magnusson, Nataša Sladoje,
- Abstract summary: We introduce a visual Structural Topic Model (vSTM) that combines pretrained image embeddings with a structural topic model.<n>We show that the vSTM is able to identify topics that are interpretable, coherent, and substantively relevant to the study of online political communication.
- Score: 40.701557052493214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Political scientists are increasingly interested in analyzing visual content at scale. However, the existing computational toolbox is still in need of methods and models attuned to the specific challenges and goals of social and political inquiry. In this article, we introduce a visual Structural Topic Model (vSTM) that combines pretrained image embeddings with a structural topic model. This has important advantages compared to existing approaches. First, pretrained embeddings allow the model to capture the semantic complexity of images relevant to political contexts. Second, the structural topic model provides the ability to analyze how topics and covariates are related, while maintaining a nuanced representation of images as a mixture of multiple topics. In our empirical application, we show that the vSTM is able to identify topics that are interpretable, coherent, and substantively relevant to the study of online political communication.
Related papers
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, as an approach for label name supervised topic modeling.
EdTM models topic modeling as an assignment problem while leveraging LM/LLM based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z) - GINopic: Topic Modeling with Graph Isomorphism Network [0.8962460460173959]
We introduce GINopic, a topic modeling framework based on graph isomorphism networks to capture the correlation between words.<n>We demonstrate the effectiveness of GINopic compared to existing topic models and highlight its potential for advancing topic modeling.
arXiv Detail & Related papers (2024-04-02T17:18:48Z) - Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z) - Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z) - Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog [12.034554338597067]
We propose a novel model by Reasoning with Multi-structure Commonsense Knowledge (RMK)
In our model, the external knowledge is represented with sentence-level facts and graph-level facts.
On top of these multi-structure representations, our model can capture relevant knowledge and incorporate them into the vision and semantic features.
arXiv Detail & Related papers (2022-04-10T13:12:10Z) - Query-Driven Topic Model [23.07260625816975]
One desirable property of topic models is to allow users to find topics describing a specific aspect of the corpus.
We propose a novel query-driven topic model that allows users to specify a simple query in words or phrases and return query-related topics.
arXiv Detail & Related papers (2021-05-28T22:49:42Z) - Matching Visual Features to Hierarchical Semantic Topics for Image
Paragraph Captioning [50.08729005865331]
This paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework.
To capture the correlations between the image and text at multiple levels of abstraction, we design a variational inference network.
To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model.
arXiv Detail & Related papers (2021-05-10T06:55:39Z) - How Far are We from Effective Context Modeling? An Exploratory Study on
Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parsing and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.