Term-community-based topic detection with variable resolution
- URL: http://arxiv.org/abs/2103.13550v1
- Date: Thu, 25 Mar 2021 01:29:39 GMT
- Title: Term-community-based topic detection with variable resolution
- Authors: Andreas Hamm and Simon Odrowski (German Aerospace Center DLR)
- Abstract summary: Network-based procedures for topic detection in huge text collections offer an intuitive alternative to probabilistic topic models.
We present a method that is especially designed with the requirements of domain experts in mind.
We demonstrate the application of our method with a widely used corpus of general news articles and show the results of detailed social-sciences expert evaluations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Network-based procedures for topic detection in huge text collections offer
an intuitive alternative to probabilistic topic models. We present in detail a
method that is especially designed with the requirements of domain experts in
mind. Like similar methods, it employs community detection in term
co-occurrence graphs, but it is enhanced by including a resolution parameter
that can be used for changing the targeted topic granularity. We also establish
a term ranking and use semantic word-embedding for presenting term communities
in a way that facilitates their interpretation.
We demonstrate the application of our method with a widely used corpus of
general news articles and show the results of detailed social-sciences expert
evaluations of detected topics at various resolutions. A comparison with topics
detected by Latent Dirichlet Allocation is also included. Finally, we discuss
factors that influence topic interpretation.
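The core idea in the abstract, community detection in a term co-occurrence graph with a resolution parameter that controls topic granularity, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy documents, the function names `cooccurrence_graph`, `modularity` and `detect_communities`, the edge weighting (number of documents containing both terms), and the use of a simple greedy local-moving heuristic on generalized modularity in place of whatever community detection algorithm the paper actually uses.

```python
from collections import defaultdict
from itertools import combinations

# Invented toy corpus: each document is a list of already-extracted terms.
TOY_DOCS = [
    ["election", "vote", "party"],
    ["election", "party", "minister"],
    ["vote", "minister", "election"],
    ["match", "goal", "team"],
    ["team", "goal", "coach"],
    ["match", "coach", "team"],
    ["election", "team"],  # weak bridge between the two themes
]


def cooccurrence_graph(docs):
    """Weighted term graph; edge weight = number of documents containing both terms."""
    graph = defaultdict(dict)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def modularity(graph, community, resolution=1.0):
    """Generalized modularity: sum over communities of L_c/2m - gamma * (K_c/2m)^2."""
    degree = {n: sum(nbrs.values()) for n, nbrs in graph.items()}
    two_m = sum(degree.values())
    internal = defaultdict(float)  # edge weight inside each community (ordered pairs)
    total = defaultdict(float)     # summed degree of each community
    for n, nbrs in graph.items():
        total[community[n]] += degree[n]
        for nb, w in nbrs.items():
            if community[nb] == community[n]:
                internal[community[n]] += w
    return sum(internal[c] / two_m - resolution * (total[c] / two_m) ** 2
               for c in total)


def detect_communities(graph, resolution=1.0, max_sweeps=100):
    """Greedy local moving: each term joins the neighbouring community with the best gain."""
    nodes = sorted(graph)
    degree = {n: sum(graph[n].values()) for n in nodes}
    two_m = sum(degree.values())
    community = {n: n for n in nodes}  # start with one community per term
    total = dict(degree)               # community label -> summed degree
    for _ in range(max_sweeps):
        moved = False
        for n in nodes:
            old = community[n]
            total[old] -= degree[n]    # take n out of its community
            links = defaultdict(float)  # weight from n to each adjacent community
            for nb, w in graph[n].items():
                links[community[nb]] += w
            best = old
            best_gain = links.get(old, 0.0) - resolution * degree[n] * total[old] / two_m
            for c in sorted(links):
                gain = links[c] - resolution * degree[n] * total[c] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            community[n] = best
            total[best] += degree[n]
            moved = moved or best != old
        if not moved:
            break
    return community


if __name__ == "__main__":
    g = cooccurrence_graph(TOY_DOCS)
    for gamma in (1.0, 10.0):
        labels = detect_communities(g, resolution=gamma)
        groups = defaultdict(list)
        for term, label in labels.items():
            groups[label].append(term)
        print(gamma, sorted(sorted(terms) for terms in groups.values()))
```

In this sketch a larger `resolution` penalizes large communities more strongly, so the same graph splits into finer term groups. A production pipeline would more likely rely on an established community detection library than on this simplified heuristic, which has no aggregation step and therefore cannot merge communities once they have formed.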
Related papers
- Improving Retrieval in Theme-specific Applications using a Corpus Topical Taxonomy [52.426623750562335]
We introduce the ToTER (Topical Taxonomy Enhanced Retrieval) framework.
ToTER identifies the central topics of queries and documents with the guidance of the taxonomy, and exploits their topical relatedness to supplement missing contexts.
As a plug-and-play framework, ToTER can be flexibly employed to enhance various PLM-based retrievers.
arXiv Detail & Related papers (2024-03-07T02:34:54Z)
- Discovering Significant Topics from Legal Decisions with Selective Inference [0.0]
We propose and evaluate an automated pipeline for discovering significant topics from legal decision texts.
The method identifies case topics significantly correlated with outcomes, topic-word distributions and case-topic weights.
We show that topics derived by the pipeline are consistent with legal doctrines in both areas and can be useful in other related legal analysis tasks.
arXiv Detail & Related papers (2024-01-02T07:00:24Z)
- Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
arXiv Detail & Related papers (2023-11-06T16:40:13Z)
- Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
- A Human Word Association based model for topic detection in social networks [1.8749305679160366]
This paper introduces a topic detection framework for social networks based on the concept of imitating the mental ability of word association.
The performance of this framework is evaluated using the FA-CUP dataset, a benchmark in the field of topic detection.
arXiv Detail & Related papers (2023-01-30T17:10:34Z)
- Contextual information integration for stance detection via cross-attention [59.662413798388485]
Stance detection deals with identifying an author's stance towards a target.
Most existing stance detection models are limited because they do not consider relevant contextual information.
We propose an approach to integrate contextual information as text.
arXiv Detail & Related papers (2022-11-03T15:04:29Z)
- Enhance Topics Analysis based on Keywords Properties [0.0]
We present a specificity score based on keyword properties that is able to select the most informative topics.
In the experiments, we show that we can compress state-of-the-art topic modelling results by different factors, with an information loss that is much lower than that of the solution based on the recent coherence score presented in the literature.
arXiv Detail & Related papers (2022-03-09T15:10:12Z)
- Comprehensive Studies for Arbitrary-shape Scene Text Detection [78.50639779134944]
We propose a unified framework for bottom-up scene text detection methods.
Under the unified framework, we ensure consistent settings for non-core modules.
Comprehensive investigations and detailed analyses reveal the advantages and disadvantages of previous models.
arXiv Detail & Related papers (2021-07-25T13:18:55Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Vocabulary-based Method for Quantifying Controversy in Social Media [0.0]
We develop a method for controversy detection based primarily on the jargon used by the communities in social media.
Our method dispenses with the use of domain-specific knowledge, is language-agnostic, efficient and easy to apply.
arXiv Detail & Related papers (2020-01-14T17:43:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.