Community-Detection via Hashtag-Graphs for Semi-Supervised NMF Topic Models
- URL: http://arxiv.org/abs/2111.10401v1
- Date: Wed, 17 Nov 2021 12:52:16 GMT
- Title: Community-Detection via Hashtag-Graphs for Semi-Supervised NMF Topic Models
- Authors: Mattias Luber, Anton Thielmann, Christoph Weisser and Benjamin Säfken
- Abstract summary: This paper outlines a novel approach for integrating the topic structures of hashtag graphs into the estimation of topic models.
Applied to recently streamed Twitter data, this procedure leads to more intuitive and human-interpretable topics.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Extracting topics from large collections of unstructured text documents has
become a central task in current NLP applications, and algorithms such as NMF and LDA,
as well as their generalizations, are the well-established state of the
art. However, especially for short text documents such as Tweets,
these approaches often produce unsatisfying results because the
document-feature matrices are sparse.
Although several approaches have been proposed to overcome this sparsity
by taking additional information into account, they focus solely on the
aggregation of similar documents and the estimation of word co-occurrences.
This entirely neglects the fact that much topical information
can be retrieved from so-called hashtag graphs by applying common
community detection algorithms. This paper therefore outlines a novel approach
for integrating the topic structures of hashtag graphs into the estimation of
topic models by connecting graph-based community detection with semi-supervised
NMF.
Applying this approach to recently streamed Twitter data shows that this
procedure leads to more intuitive and human-interpretable
topics.
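The abstract describes the pipeline only in words. The following is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation: it builds a weighted hashtag co-occurrence graph, finds communities with weighted label propagation (a stand-in; the paper may use a different community detection algorithm), labels each tweet with the majority community of its hashtags, and then runs multiplicative-update NMF in which a labeled tweet's weights on all other topics are fixed at zero. This zero-masking is one simple form of semi-supervision and is assumed here; the paper's semi-supervised NMF formulation may differ. The toy corpus and all function names (`hashtag_graph`, `label_propagation`, `document_labels`, `masked_nmf`, `top_words`) are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus of (words, hashtags) per tweet; not data from the paper.
TWEETS = [
    (["goal", "match", "striker"], ["#football"]),
    (["match", "league", "goal"], ["#football", "#soccer"]),
    (["league", "striker", "transfer"], ["#soccer"]),
    (["vote", "election", "poll"], ["#election"]),
    (["poll", "candidate", "vote"], ["#election", "#politics"]),
    (["candidate", "debate", "election"], ["#politics"]),
    (["goal", "league", "match"], []),   # no hashtags -> stays unlabeled
    (["debate", "vote", "poll"], []),    # no hashtags -> stays unlabeled
]


def hashtag_graph(tweets):
    """Weighted co-occurrence graph: edge weight = number of tweets sharing both tags."""
    graph = {}
    for _, tags in tweets:
        for tag in tags:
            graph.setdefault(tag, Counter())
        for a, b in combinations(sorted(set(tags)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def label_propagation(graph, max_iter=20):
    """Asynchronous weighted label propagation; ties go to the smallest label."""
    labels = {node: node for node in graph}
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):
            scores = Counter()
            for nbr, weight in graph[node].items():
                scores[labels[nbr]] += weight
            if not scores:
                continue  # an isolated hashtag keeps its own label
            best = max(scores.values())
            new = min(lab for lab, s in scores.items() if s == best)
            if new != labels[node]:
                labels[node], changed = new, True
        if not changed:
            break
    return labels


def document_labels(tweets, communities):
    """Label each tweet with the majority community of its hashtags (None if it has none)."""
    topic_of = {c: i for i, c in enumerate(sorted(set(communities.values())))}
    labels = []
    for _, tags in tweets:
        votes = Counter(topic_of[communities[t]] for t in tags)
        labels.append(min(votes, key=lambda t: (-votes[t], t)) if votes else None)
    return labels, len(topic_of)


def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def masked_nmf(X, k, doc_labels, iters=200, seed=0, eps=1e-9):
    """Lee-Seung multiplicative updates for X ~ W H (Frobenius loss). Assumed
    semi-supervision: a labeled tweet's weights on other topics start at 0 and stay 0."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 if lab is None or lab == j else 0.0 for j in range(k)]
         for lab in doc_labels]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = list(zip(*W))  # H <- H * (W^T X) / (W^T W H)
        num, den = matmul(Wt, X), matmul(Wt, matmul(W, H))
        H = [[H[a][b] * num[a][b] / (den[a][b] + eps) for b in range(m)] for a in range(k)]
        Ht = list(zip(*H))  # W <- W * (X H^T) / (W H H^T)
        num, den = matmul(X, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H


def top_words(H, vocab, n=3):
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in H]


communities = label_propagation(hashtag_graph(TWEETS))
doc_labels, k = document_labels(TWEETS, communities)
vocab = sorted({w for words, _ in TWEETS for w in words})
X = [[words.count(w) for w in vocab] for words, _ in TWEETS]
W, H = masked_nmf(X, k, doc_labels)
for topic, words in enumerate(top_words(H, vocab)):
    print(topic, words)
```

Tweets without hashtags (the last two) receive no label, so their topic weights are estimated freely; the mask only steers documents that the hashtag graph can place in a community. Because multiplicative updates preserve zeros, the constraint adds no extra cost, but a wrong community assignment can never be corrected by the factorization.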
Related papers
- GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization
We propose a lightweight yet effective unsupervised approach called GLIMMER: a Graph and LexIcal features based unsupervised Multi-docuMEnt summaRization approach.
It first constructs a sentence graph from the source documents, then automatically identifies semantic clusters by mining low-level features from raw texts.
Experiments conducted on Multi-News, Multi-XScience and DUC-2004 demonstrate that our approach outperforms existing unsupervised approaches.
Date: 2024-08-19T16:01:48Z
- Scientific Paper Extractive Summarization Enhanced by Citation Graphs
We focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings.
Preliminary results demonstrate that citation graphs are helpful even in a simple unsupervised framework.
Motivated by this, we propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available.
Date: 2022-12-08T11:53:12Z
- A Proposition-Level Clustering Approach for Multi-Document Summarization
We revisit the clustering approach, grouping together propositions for more precise information alignment.
Our method detects salient propositions, clusters them into paraphrastic clusters, and generates a representative sentence for each cluster by fusing its propositions.
Our summarization method improves over the previous state-of-the-art MDS method on the DUC 2004 and TAC 2011 datasets.
Date: 2021-12-16T10:34:22Z
- Author Clustering and Topic Estimation for Short Texts
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as or better than traditional approaches to problems arising in short text.
Date: 2021-06-15T20:55:55Z
- Integrating Semantics and Neighborhood Information with Graph-Driven Generative Models for Document Retrieval
In this paper, we encode the neighborhood information with a graph-induced Gaussian distribution, and propose to integrate the two types of information with a graph-driven generative model.
Under the approximation, we prove that the training objective can be decomposed into terms involving only singleton or pairwise documents, enabling the model to be trained as efficiently as uncorrelated ones.
Date: 2021-05-27T11:29:03Z
- Enhancing Extractive Text Summarization with Topic-Aware Graph Neural Networks
This paper proposes a graph neural network (GNN)-based extractive summarization model.
Our model integrates a joint neural topic model (NTM) to discover latent topics, which can provide document-level features for sentence selection.
The experimental results demonstrate that our model achieves state-of-the-art results on the CNN/DM and NYT datasets.
Date: 2020-10-13T09:30:04Z
- SummPip: Unsupervised Multi-Document Summarization with Sentence Graph Compression
SummPip is an unsupervised method for multi-document summarization.
We convert the original documents to a sentence graph, taking both linguistic and deep representations into account.
We then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary.
Date: 2020-07-17T13:01:15Z
- A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal
Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries.
This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters.
Date: 2020-05-20T14:33:33Z
- Leveraging Graph to Improve Abstractive Multi-Document Summarization
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
Date: 2020-05-20T13:39:47Z
- Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!
We propose an alternative way to obtain topics: clustering pre-trained word embeddings while incorporating document information for weighted clustering and reranking top words.
The best-performing combination of our approach performs as well as classical topic models, but with lower runtime and computational complexity.
Date: 2020-04-30T16:18:18Z