Moving beyond word lists: towards abstractive topic labels for
human-like topics of scientific documents
- URL: http://arxiv.org/abs/2211.05599v1
- Date: Fri, 28 Oct 2022 17:47:12 GMT
- Title: Moving beyond word lists: towards abstractive topic labels for
human-like topics of scientific documents
- Authors: Domenic Rosati
- Abstract summary: We present an approach to generating human-like topic labels using abstractive multi-document summarization (MDS).
We model topics in citation sentences in order to understand what further research needs to be done to fully operationalize MDS for topic labeling.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topic models represent groups of documents as a list of words (the topic
labels). This work asks whether an alternative approach to topic labeling can
be developed that is closer to a natural language description of a topic than a
word list. To this end, we present an approach to generating human-like topic
labels using abstractive multi-document summarization (MDS). We investigate our
approach with an exploratory case study. We model topics in citation sentences
in order to understand what further research needs to be done to fully
operationalize MDS for topic labeling. Our case study shows that, in addition to
producing more human-like topics, the approach offers advantages for evaluation:
clustering and summarization measures can be used instead of topic model measures. However,
we find that there are several developments needed before we can design a
well-powered study to evaluate MDS for topic modeling fully. Namely, improving
cluster cohesion, improving the factuality and faithfulness of MDS, and
increasing the number of documents that might be supported by MDS. We present a
number of ideas on how these can be tackled and conclude with some thoughts on
how topic modeling can also be used to improve MDS in general.
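The abstract describes the pipeline only at a high level: group citation sentences into topics, then run abstractive multi-document summarization over each group to produce a natural-language label, and judge quality with clustering and summarization measures rather than topic model measures. The sketch below is a minimal illustration of that idea, not the paper's implementation; the specific models (`all-MiniLM-L6-v2`, `facebook/bart-large-cnn`), the use of k-means, and the cluster count are assumptions for the example only.

```python
# Hedged sketch: human-like topic labels via clustering + abstractive
# multi-document summarization (MDS). Models and hyperparameters are
# illustrative assumptions, not the paper's configuration.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from transformers import pipeline

def abstractive_topic_labels(citation_sentences, n_topics=5):
    # 1) Embed each citation sentence with a pretrained sentence encoder.
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder
    embeddings = encoder.encode(citation_sentences)

    # 2) Cluster the embeddings; each cluster plays the role of a topic.
    clusterer = KMeans(n_clusters=n_topics, random_state=0)
    assignments = clusterer.fit_predict(embeddings)

    # Cluster-cohesion measure used in place of topic-model coherence,
    # in line with the evaluation direction the abstract proposes.
    cohesion = silhouette_score(embeddings, assignments)

    # 3) Summarize each cluster's sentences into one natural-language label.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")  # assumed model
    labels = {}
    for topic_id in range(n_topics):
        members = [s for s, a in zip(citation_sentences, assignments) if a == topic_id]
        joined = " ".join(members)[:3000]  # crude truncation to respect the model's input limit
        labels[topic_id] = summarizer(joined, max_length=40, min_length=8)[0]["summary_text"]
    return labels, cohesion
```

The generated labels could then also be checked with summarization measures (for example, faithfulness to the cluster's own sentences) alongside the cohesion score, which is where the abstract locates the remaining open problems.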
Related papers
- Interactive Topic Models with Optimal Transport [75.26555710661908]
We present EdTM, an approach for label-name-supervised topic modeling.
EdTM casts topic modeling as an assignment problem while leveraging LM/LLM-based document-topic affinities.
arXiv Detail & Related papers (2024-06-28T13:57:27Z)
- Large Language Models Offer an Alternative to the Traditional Approach of Topic Modelling [0.9095496510579351]
We investigate the untapped potential of large language models (LLMs) as an alternative for uncovering the underlying topics within extensive text corpora.
Our findings indicate that LLMs with appropriate prompts can stand out as a viable alternative, capable of generating relevant topic titles and adhering to human guidelines to refine and merge topics.
arXiv Detail & Related papers (2024-03-24T17:39:51Z)
- TopicGPT: A Prompt-based Topic Modeling Framework [77.72072691307811]
We introduce TopicGPT, a prompt-based framework that uses large language models to uncover latent topics in a text collection.
It produces topics that align better with human categorizations compared to competing methods.
Its topics are also interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions (a minimal prompt-based sketch of this style of labeling appears after this list).
arXiv Detail & Related papers (2023-11-02T17:57:10Z)
- Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes.
This allows our model to detect latent topics that may include uncommon words or neologisms.
We present correlation coefficients with human identification of intruder words and achieve near-human-level results on the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings.
Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
- Keyword Assisted Embedded Topic Model [1.9000421840914223]
Probabilistic topic models describe how words in documents are generated via a set of latent distributions called topics.
Recently, the Embedded Topic Model (ETM) has extended LDA to utilize the semantic information in word embeddings to derive semantically richer topics.
We propose the Keyword Assisted Embedded Topic Model (KeyETM), which equips ETM with the ability to incorporate user knowledge in the form of informative topic-level priors.
arXiv Detail & Related papers (2021-11-22T07:27:17Z)
- Topic-Guided Abstractive Multi-Document Summarization [21.856615677793243]
A critical point of multi-document summarization (MDS) is to learn the relations among various documents.
We propose a novel abstractive MDS model, in which we represent multiple documents as a heterogeneous graph.
We employ a neural topic model to jointly discover latent topics that can act as cross-document semantic units.
arXiv Detail & Related papers (2021-10-21T15:32:30Z)
- ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining [61.82562838486632]
We crowdsource four new datasets covering diverse online conversation forms: news comments, discussion forums, community question answering forums, and email threads.
We benchmark state-of-the-art models on our datasets and analyze characteristics associated with the data.
arXiv Detail & Related papers (2021-06-01T22:17:13Z)
- Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology [68.8836704199096]
Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents.
With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses.
Previous studies on the topic of recognizing and classifying inappropriate content are mostly focused on a certain category of malevolence.
arXiv Detail & Related papers (2020-08-21T22:43:27Z)
- Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic-adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that topic adaptation and the prototype encoding structure mutually benefit the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)
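Several of the entries above (the LLM-as-alternative paper and TopicGPT) describe prompt-based topic labeling, where a large language model is asked directly for a natural-language topic label rather than a word list. The sketch below is a generic illustration of that idea only, not the method of either paper; the prompt wording, the use of the `openai` client, and the model name are assumptions.

```python
# Hedged sketch of prompt-based topic labeling in the spirit of the
# LLM-based entries above. Prompt wording and model name are illustrative
# assumptions, not taken from TopicGPT or the other papers.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def llm_topic_label(documents, max_docs=10):
    """Ask an LLM for a short natural-language label for one group of documents."""
    snippet = "\n".join(f"- {d[:300]}" for d in documents[:max_docs])  # truncate long docs
    prompt = (
        "The following documents all belong to one topic. "
        "Write a single short natural-language label for that topic, "
        "then one sentence describing it.\n\n" + snippet
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any instruction-following LLM could be substituted
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```

The papers above pair this kind of prompting with additional machinery (document-topic assignment, topic refinement and merging, human guidelines); the sketch covers only the label-generation step.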
This list is automatically generated from the titles and abstracts of the papers on this site.