A Multiscale Geometric Method for Capturing Relational Topic Alignment
- URL: http://arxiv.org/abs/2511.21741v1
- Date: Fri, 21 Nov 2025 22:45:16 GMT
- Title: A Multiscale Geometric Method for Capturing Relational Topic Alignment
- Authors: Conrad D. Hougen, Karl T. Pazdernik, Alfred O. Hero
- Abstract summary: Interpretable topic modeling is essential for tracking how research interests evolve within co-author communities. We propose a method that integrates multimodal text and co-author network data, using Hellinger distances and Ward's linkage to construct a hierarchical topic dendrogram. Our method effectively identifies rare-topic structure and visualizes smooth topic drift over time.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpretable topic modeling is essential for tracking how research interests evolve within co-author communities. In scientific corpora, where novelty is prized, identifying underrepresented niche topics is particularly important. However, contemporary models built from dense transformer embeddings tend to miss rare topics and therefore also fail to capture smooth temporal alignment. We propose a geometric method that integrates multimodal text and co-author network data, using Hellinger distances and Ward's linkage to construct a hierarchical topic dendrogram. This approach captures both local and global structure, supporting multiscale learning across semantic and temporal dimensions. Our method effectively identifies rare-topic structure and visualizes smooth topic drift over time. Experiments highlight the strength of interpretable bag-of-words models when paired with principled geometric alignment.
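The core geometric step named in the abstract, Hellinger distances between topics followed by Ward's linkage, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: the four toy topic-word distributions, the function names `hellinger` and `ward_linkage`, and the merge-tuple layout are all assumptions made for the example, and the co-author network component is not modeled. Because the Hellinger distance equals the Euclidean distance between square-rooted probability vectors scaled by 1/sqrt(2), Ward's criterion (which assumes Euclidean geometry) is a natural fit.

```python
import math
from itertools import combinations


def hellinger(p, q):
    """Hellinger distance between two discrete distributions over the same vocabulary."""
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0
    )


def ward_linkage(dists, n):
    """Agglomerative clustering with Ward's criterion (Lance-Williams update).

    dists maps frozenset({i, j}) -> distance between leaves i and j (0 <= i, j < n).
    Returns merges as (cluster_a, cluster_b, height, new_size); merged clusters
    receive ids n, n + 1, ... in merge order.
    """
    d2 = {pair: d * d for pair, d in dists.items()}  # squared distances
    size = {i: 1 for i in range(n)}
    merges = []
    next_id = n
    while len(size) > 1:
        pair = min(d2, key=d2.get)  # closest pair of active clusters
        i, j = sorted(pair)
        dij = d2.pop(pair)
        ni, nj = size.pop(i), size.pop(j)
        # Update squared distances from every remaining cluster k to the merged one.
        for k, nk in size.items():
            dik = d2.pop(frozenset((i, k)))
            djk = d2.pop(frozenset((j, k)))
            d2[frozenset((next_id, k))] = (
                (ni + nk) * dik + (nj + nk) * djk - nk * dij
            ) / (ni + nj + nk)
        size[next_id] = ni + nj
        merges.append((i, j, math.sqrt(dij), ni + nj))
        next_id += 1
    return merges


# Hypothetical toy topics: each row is a word distribution over a 4-word vocabulary.
topics = [
    [0.7, 0.2, 0.1, 0.0],
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
    [0.0, 0.2, 0.2, 0.6],
]

pairwise = {
    frozenset((i, j)): hellinger(topics[i], topics[j])
    for i, j in combinations(range(len(topics)), 2)
}

for a, b, height, new_size in ward_linkage(pairwise, len(topics)):
    print(f"merge {a} + {b} at height {height:.3f} (size {new_size})")
```

The list of merges is the dendrogram in tabular form: low merge heights join near-duplicate topics, while the final high merge separates the two broad themes, which is the sense in which a single tree exposes both local and global structure.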
Related papers
- GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs [59.61242815508687]
Graph neural networks (GNNs) on text-attributed graphs (TAGs) encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. This work introduces a local PCA-based metric that measures the degree of semantic drift and provides the first quantitative framework to analyze how different aggregation mechanisms affect manifold structure.
arXiv Detail & Related papers (2025-11-12T06:48:43Z)
- Dynamic Topic Evolution with Temporal Decay and Attention in Large Language Models [3.4219049032524804]
This paper proposes a modeling framework for dynamic topic evolution based on temporal large language models. The proposed method provides a systematic solution for understanding dynamic semantic patterns in large-scale text.
arXiv Detail & Related papers (2025-10-12T13:50:41Z)
- Visualizing Temporal Topic Embeddings with a Compass [1.5184974790808403]
This paper proposes an expansion of the compass-aligned temporal Word2Vec methodology into dynamic topic modeling.
Such a method allows for the direct comparison of word and document embeddings across time in dynamic topics.
arXiv Detail & Related papers (2024-09-16T18:29:19Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics [1.854328133293073]
This paper presents an algorithmic family of dynamic topic models called Aligned Neural Topic Models (ANTM).
ANTM combines novel data mining algorithms to provide a modular framework for discovering evolving topics.
A Python package is developed for researchers and scientists who wish to study the trends and evolving patterns of topics in large-scale textual data.
arXiv Detail & Related papers (2023-02-03T02:31:12Z)
- Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- TopicNet: Semantic Graph-Guided Topic Discovery [51.71374479354178]
Existing deep hierarchical topic models are able to extract semantically meaningful topics from a text corpus in an unsupervised manner.
We introduce TopicNet as a deep hierarchical topic model that can inject prior structural knowledge as an inductive bias to influence learning.
arXiv Detail & Related papers (2021-10-27T09:07:14Z)
- Recurrent Coupled Topic Modeling over Sequential Documents [33.35324412209806]
We show that a current topic evolves from all prior topics with corresponding coupling weights, forming the multi-topic-thread evolution.
A new solution with a set of novel data augmentation techniques is proposed, which successfully decomposes the multi-couplings between evolving topics.
A novel Gibbs sampler with a backward-forward filter algorithm efficiently learns latent time-evolving parameters in closed form.
arXiv Detail & Related papers (2021-06-23T08:58:13Z)
- Topic Scaling: A Joint Document Scaling -- Topic Model Approach To Learn Time-Specific Topics [0.0]
This paper proposes a new methodology to study sequential corpora by implementing a two-stage algorithm that learns time-based topics with respect to a scale of document positions.
The first stage ranks documents using Wordfish to estimate document positions that serve as a dependent variable to learn relevant topics.
The second stage ranks the inferred topics on the document scale to match their occurrences within the corpus and track their evolution.
arXiv Detail & Related papers (2021-03-31T12:35:36Z)
- Quadric hypersurface intersection for manifold learning in feature space [52.83976795260532]
This paper proposes a manifold learning technique suitable for moderately high dimension and large datasets.
The technique is learned from the training data in the form of an intersection of quadric hypersurfaces.
At test time, this manifold can be used to introduce an outlier score for arbitrary new points.
arXiv Detail & Related papers (2021-02-11T18:52:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.