Enhance Topics Analysis based on Keywords Properties
- URL: http://arxiv.org/abs/2203.04786v1
- Date: Wed, 9 Mar 2022 15:10:12 GMT
- Title: Enhance Topics Analysis based on Keywords Properties
- Authors: Antonio Penta
- Abstract summary: We present a specificity score based on keyword properties that is able to select the most informative topics.
In the experiments, we show that we are able to compress the state-of-the-art topic modelling results by different factors with an information loss that is much lower than that of the solution based on the recent coherence score presented in the literature.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topic modelling is one of the most prevalent text analysis techniques used to
explore and retrieve collections of documents. Evaluating topic model
algorithms is still a very challenging task because a gold-standard list of
topics to compare against is not available for every corpus. In this work,
we present a specificity score based on keyword properties that is able to
select the most informative topics, helping the user focus on them. In the
experiments, we show that we are able to compress the state-of-the-art topic
modelling results by different factors with an information loss that is much
lower than that of the solution based on the recent coherence score presented
in the literature.
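The abstract does not state the formula behind the specificity score. Purely as an illustration of the general idea (score each topic from properties of its keywords, then keep only the top-scoring topics), the sketch below assumes that the relevant keyword property is inverse document frequency (IDF) and that a topic's specificity is the mean IDF of its keywords. This is a swapped-in stand-in, not the author's method; `keyword_idf`, `topic_specificity` and `select_topics` are hypothetical names, and the toy corpus is invented.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def keyword_idf(docs: Sequence[Set[str]]) -> Dict[str, float]:
    """Inverse document frequency log(N / df) of every word in the corpus."""
    n_docs = len(docs)
    df: Dict[str, int] = {}
    for doc in docs:
        for word in doc:  # each doc is a set, so a word counts once per doc
            df[word] = df.get(word, 0) + 1
    return {word: math.log(n_docs / count) for word, count in df.items()}


def topic_specificity(keywords: Sequence[str], idf: Dict[str, float]) -> float:
    """Hypothetical specificity score: mean IDF of a topic's keywords.

    Assumption for illustration only, not the paper's formula. Keywords
    that never occur in the corpus contribute 0.
    """
    if not keywords:
        return 0.0
    return sum(idf.get(word, 0.0) for word in keywords) / len(keywords)


def select_topics(
    topics: List[List[str]], docs: List[List[str]], k: int
) -> List[Tuple[int, float]]:
    """Rank topics by the hypothetical score and keep the k most specific."""
    idf = keyword_idf([set(doc) for doc in docs])
    scored = [(i, topic_specificity(kw, idf)) for i, kw in enumerate(topics)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most specific first
    return scored[:k]


# Toy corpus and topics, invented for this example.
docs = [
    ["the", "market", "stock", "price"],
    ["the", "stock", "trade", "price"],
    ["the", "goal", "match", "team"],
    ["the", "team", "coach", "match"],
]
topics = [
    ["the", "price", "team"],  # generic: its keywords occur in many documents
    ["stock", "trade", "market"],
    ["goal", "match", "team"],
]

for index, score in select_topics(topics, docs, k=2):
    print(index, round(score, 3))
```

In this toy run the generic topic 0 ranks last and is dropped, so three topics are compressed to two. A faithful evaluation would also measure how much information the pruning loses, which this sketch does not attempt.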
Related papers
- CAST: Corpus-Aware Self-similarity Enhanced Topic modelling
We introduce CAST: Corpus-Aware Self-similarity Enhanced Topic modelling, a novel topic modelling method.
We find self-similarity to be an effective metric to prevent functional words from acting as candidate topic words.
Our approach significantly enhances the coherence and diversity of generated topics, as well as the topic model's ability to handle noisy data.
Date: 2024-10-19T15:27:11Z
- Interactive Topic Models with Optimal Transport
We present EdTM, an approach for label-name-supervised topic modeling.
EdTM frames topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
Date: 2024-06-28T13:57:27Z
- Prompting Large Language Models for Topic Modeling
We propose PromptTopic, a novel topic modeling approach that harnesses the advanced language understanding of large language models (LLMs).
It extracts topics at the sentence level from individual documents, then aggregates and condenses them into a predefined number of topics, ultimately providing coherent topics for texts of varying lengths.
We benchmark PromptTopic against state-of-the-art baselines on three vastly diverse datasets, establishing its proficiency in discovering meaningful topics.
Date: 2023-12-15T11:15:05Z
- TopicAdapt- An Inter-Corpora Topics Adaptation Approach
This paper proposes a neural topic model, TopicAdapt, that can adapt relevant topics from a related source corpus and also discover new topics in a target corpus that are absent in the source corpus.
Experiments over multiple datasets from diverse domains show the superiority of the proposed model against the state-of-the-art topic models.
Date: 2023-10-08T02:56:44Z
- Knowledge-Aware Bayesian Deep Topic Model
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
Date: 2022-09-20T09:16:05Z
- Author Clustering and Topic Estimation for Short Texts
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as or better than traditional approaches to problems arising in short text.
Date: 2021-06-15T20:55:55Z
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining
We crowdsource four new datasets on diverse online conversation forms of news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
Date: 2021-06-01T22:17:13Z
- Unsupervised Summarization for Chat Logs with Topic-Oriented Ranking and Context-Aware Auto-Encoders
We propose a novel framework called RankAE to perform chat summarization without employing manually labeled data.
RankAE consists of a topic-oriented ranking strategy that selects topic utterances according to centrality and diversity simultaneously.
A denoising auto-encoder is designed to generate succinct but context-informative summaries based on the selected utterances.
Date: 2020-12-14T07:31:17Z
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
Date: 2020-10-13T21:33:24Z
- BATS: A Spectral Biclustering Approach to Single Document Topic Modeling and Segmentation
Existing topic modeling and text segmentation methodologies generally require large datasets for training, limiting their capabilities when only small collections of text are available.
In developing a methodology to handle single documents, we face two major challenges.
First is sparse information: with access to only one document, we cannot train traditional topic models or deep learning algorithms.
Second is significant noise: a considerable portion of words in any single document will produce only noise and not help discern topics or segments.
Date: 2020-08-05T16:34:33Z
- Keyword Assisted Topic Models
We show that providing a small number of keywords can substantially enhance the measurement performance of topic models.
KeyATM provides more interpretable results, has better document classification performance, and is less sensitive to the number of topics than the standard topic models.
Date: 2020-04-13T14:35:28Z
This list is automatically generated from the titles and abstracts of the papers on this site.