Recurrent Coupled Topic Modeling over Sequential Documents
- URL: http://arxiv.org/abs/2106.13732v1
- Date: Wed, 23 Jun 2021 08:58:13 GMT
- Title: Recurrent Coupled Topic Modeling over Sequential Documents
- Authors: Jinjin Guo, Longbing Cao and Zhiguo Gong
- Abstract summary: We show that a current topic evolves from all prior topics with corresponding coupling weights, forming the multi-topic-thread evolution.
A new solution with a set of novel data augmentation techniques is proposed, which successfully decouples the multi-couplings between evolving topics.
A novel Gibbs sampler with a backward-forward filter algorithm efficiently learns the latent time-evolving parameters in closed form.
- Score: 33.35324412209806
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Abundant sequential documents such as online archives, social media and
news feeds are updated in a streaming manner, where each chunk of documents
carries smoothly evolving yet dependent topics. Such digital texts
have attracted extensive research on dynamic topic modeling to infer hidden
evolving topics and their temporal dependencies. However, most of the existing
approaches focus on single-topic-thread evolution and ignore the fact that a
current topic may be coupled with multiple relevant prior topics. In addition,
these approaches suffer from intractable inference when estimating the latent
parameters, resulting in high computational cost and performance
degradation. In this work, we assume that a current topic evolves from all
prior topics with corresponding coupling weights, forming the
multi-topic-thread evolution. Our method models the dependencies between
evolving topics and thoroughly encodes their complex multi-couplings across
time steps. To address the intractable inference challenge, a new solution with
a set of novel data augmentation techniques is proposed, which successfully
decouples the multi-couplings between evolving topics. A fully conjugate
model is thus obtained to guarantee the effectiveness and efficiency of the
inference technique. A novel Gibbs sampler with a backward-forward filter
algorithm efficiently learns the latent time-evolving parameters in closed form.
In addition, the latent Indian Buffet Process (IBP) compound distribution is
exploited to automatically infer the overall topic number and customize the
sparse topic proportions for each sequential document without bias. The
proposed method is evaluated on both synthetic and real-world datasets against
competitive baselines, demonstrating its superiority in terms of lower
per-word perplexity, more coherent topics, and better document time prediction.
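The multi-topic-thread assumption can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' released code; the names `K`, `V`, `coupling`, `phi`, and `concentration` are assumptions made here. Each topic at time t is drawn around a coupling-weighted mixture of all topics at time t-1, rather than around a single parent topic:

```python
import numpy as np

# Minimal sketch (not the paper's released code) of multi-topic-thread
# evolution: every topic at time t is coupled to ALL topics at time t-1
# through a row-stochastic matrix of coupling weights.
rng = np.random.default_rng(0)

K, V, T = 5, 1000, 4      # number of topics, vocabulary size, time steps
concentration = 50.0      # pseudo-count: how tightly a topic follows its
                          # coupling-weighted predecessors (assumed value)

# initial topic-word distributions, shape (K, V)
phi = [rng.dirichlet(np.ones(V), size=K)]

for t in range(1, T):
    # coupling weights: row k says how much current topic k inherits from
    # each prior topic; each row sums to 1
    coupling = rng.dirichlet(np.ones(K), size=K)            # (K, K)
    # expected word distribution of each current topic is a mixture of all
    # prior topics -- multi-topic-thread evolution, not a single thread
    prior_mix = coupling @ phi[-1]                           # (K, V)
    # draw the new topics around that mixture
    phi.append(np.array([rng.dirichlet(concentration * prior_mix[k])
                         for k in range(K)]))
```

In a single-topic-thread model the coupling matrix would effectively be the identity; letting every row be a full distribution over prior topics is what creates the multi-coupling structure that the paper decouples via data augmentation.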
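Similarly, the role of the latent IBP compound distribution, which infers the overall topic number and per-document sparse topic proportions, can be sketched with a generic stick-breaking IBP simulation. This is again a hedged illustration under assumed names (`alpha`, `K_max`, `b`, `theta`), not the paper's exact construction:

```python
import numpy as np

# Hedged sketch of an Indian Buffet Process (IBP) compound construction for
# sparse per-document topic proportions; a generic stick-breaking simulation,
# not the paper's exact latent IBP compound distribution.
rng = np.random.default_rng(1)

alpha, K_max, D = 3.0, 50, 10     # IBP mass, truncation level, documents

# stick-breaking weights: inclusion probability of topic k decays with k
nu = rng.beta(alpha, 1.0, size=K_max)
pi = np.cumprod(nu)

# binary selection matrix: document d uses topic k iff b[d, k] == 1
b = rng.binomial(1, pi, size=(D, K_max))

# compound step: Dirichlet proportions restricted to the selected topics
theta = np.zeros((D, K_max))
for d in range(D):
    active = np.flatnonzero(b[d])
    if active.size:
        theta[d, active] = rng.dirichlet(np.ones(active.size))

# the overall topic number is inferred from usage, not fixed a priori
print("effective number of topics:", int(b.any(axis=0).sum()))
```

Because the inclusion probabilities decay geometrically, only a finite number of topics are ever activated across documents, so the effective topic count is read off the binary selection matrix rather than being fixed in advance.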
Related papers
- Bundle Fragments into a Whole: Mining More Complete Clusters via Submodular Selection of Interesting webpages for Web Topic Detection [49.8035161337388]
A state-of-the-art solution first organizes webpages into a large volume of multi-granularity topic candidates.
Hot topics are further identified by estimating their interestingness.
This paper proposes a bundling-refining approach to mine more complete hot topics from fragments.
arXiv Detail & Related papers (2024-09-19T00:46:31Z) - Iterative Improvement of an Additively Regularized Topic Model [0.0]
We present a method for iterative training of a topic model.
Experiments conducted on several collections of natural language texts show that the proposed ITAR model performs better than other popular topic models.
arXiv Detail & Related papers (2024-08-11T18:22:12Z) - Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation [49.36436704082436]
How-to questions are integral to decision-making processes and require dynamic, step-by-step answers.
We propose Thread, a novel data organization paradigm aimed at enabling current systems to handle how-to questions more effectively.
arXiv Detail & Related papers (2024-06-19T09:14:41Z) - FASTopic: Pretrained Transformer is a Fast, Adaptive, Stable, and Transferable Topic Model [76.509837704596]
We propose FASTopic, a fast, adaptive, stable, and transferable topic model.
We use Dual Semantic-relation Reconstruction (DSR) to model latent topics.
We also propose Embedding Transport Plan (ETP) to regularize semantic relations as optimal transport plans.
arXiv Detail & Related papers (2024-05-28T09:06:38Z) - Peek Across: Improving Multi-Document Modeling via Cross-Document
Question-Answering [49.85790367128085]
We pre-train a generic multi-document model with a novel cross-document question answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z) - How Does Generative Retrieval Scale to Millions of Passages? [68.98628807288972]
We conduct the first empirical study of generative retrieval techniques across various corpus scales.
We scale generative retrieval to millions of passages with a corpus of 8.8M passages and evaluate model sizes up to 11B parameters.
While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
arXiv Detail & Related papers (2023-05-19T17:33:38Z) - ANTM: An Aligned Neural Topic Model for Exploring Evolving Topics [1.854328133293073]
This paper presents an algorithmic family of dynamic topic models called Aligned Neural Topic Models (ANTM).
ANTM combines novel data mining algorithms to provide a modular framework for discovering evolving topics.
A Python package is developed for researchers and scientists who wish to study the trends and evolving patterns of topics in large-scale textual data.
arXiv Detail & Related papers (2023-02-03T02:31:12Z) - Neural Dynamic Focused Topic Model [2.9005223064604078]
We leverage recent advances in neural variational inference and present an alternative neural approach to the dynamic Focused Topic Model.
We develop a neural model for topic evolution which exploits sequences of Bernoulli random variables in order to track the appearances of topics.
arXiv Detail & Related papers (2023-01-26T08:37:34Z) - Sequential Topic Selection Model with Latent Variable for Topic-Grounded
Dialogue [21.1427816176227]
We propose a novel approach, named Sequential Global Topic Attention (SGTA) to exploit topic transition over all conversations.
Our model outperforms competitive baselines on prediction and generation tasks.
arXiv Detail & Related papers (2022-10-17T07:34:14Z) - $\textit{latent}$-GLAT: Glancing at Latent Variables for Parallel Text
Generation [65.29170569821093]
Parallel text generation has received widespread attention due to its high generation efficiency.
In this paper, we propose $\textit{latent}$-GLAT, which employs discrete latent variables to capture word categorical information.
Experimental results show that our method outperforms strong baselines without the help of an autoregressive model.
arXiv Detail & Related papers (2022-04-05T07:34:12Z) - Topic Discovery via Latent Space Clustering of Pretrained Language Model
Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings.
Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.