MPTopic: Improving topic modeling via Masked Permuted pre-training
- URL: http://arxiv.org/abs/2309.01015v1
- Date: Sat, 2 Sep 2023 20:38:58 GMT
- Title: MPTopic: Improving topic modeling via Masked Permuted pre-training
- Authors: Xinche Zhang, Evangelos milios
- Abstract summary: We present MPTopic, a clustering algorithm intrinsically driven by the insights of TF-RDF.
It is evident that the topic keywords identified with the synergy of MPTopic and TF-RDF outperform those extracted by both BERTopic and Top2Vec.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Topic modeling is pivotal in discerning hidden semantic structures within
texts, thereby generating meaningful descriptive keywords. While innovative
techniques like BERTopic and Top2Vec have recently emerged in the forefront,
they manifest certain limitations. Our analysis indicates that these methods
might not prioritize the refinement of their clustering mechanism, potentially
compromising the quality of derived topic clusters. To illustrate, Top2Vec
designates the centroids of clustering results to represent topics, whereas
BERTopic harnesses C-TF-IDF for its topic extraction.In response to these
challenges, we introduce "TF-RDF" (Term Frequency - Relative Document
Frequency), a distinctive approach to assess the relevance of terms within a
document. Building on the strengths of TF-RDF, we present MPTopic, a clustering
algorithm intrinsically driven by the insights of TF-RDF. Through comprehensive
evaluation, it is evident that the topic keywords identified with the synergy
of MPTopic and TF-RDF outperform those extracted by both BERTopic and Top2Vec.
Related papers
- Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z) - Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System [22.331591533400402]
Infrared and visible image fusion (IVF) plays an important role in intelligent transportation system (ITS)
We propose a novel prior semantic guided image fusion method based on the dual-modality strategy.
arXiv Detail & Related papers (2024-03-24T16:41:50Z) - Controllable Topic-Focused Abstractive Summarization [57.8015120583044]
Controlled abstractive summarization focuses on producing condensed versions of a source article to cover specific aspects.
This paper presents a new Transformer-based architecture capable of producing topic-focused summaries.
arXiv Detail & Related papers (2023-11-12T03:51:38Z) - Coherent Entity Disambiguation via Modeling Topic and Categorical
Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
arXiv Detail & Related papers (2023-11-06T16:40:13Z) - Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z) - A Clustering-guided Contrastive Fusion for Multi-view Representation
Learning [7.630965478083513]
We propose a deep fusion network to fuse view-specific representations into the view-common representation.
We also design an asymmetrical contrastive strategy that aligns the view-common representation and each view-specific representation.
In the incomplete view scenario, our proposed method resists noise interference better than those of our competitors.
arXiv Detail & Related papers (2022-12-28T07:21:05Z) - Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z) - SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition.
We argue that the performance of recent proposed skeleton-based action recognition methods is limited by the following factors.
Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action associated relation information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z) - BERTopic: Neural topic modeling with a class-based TF-IDF procedure [0.0]
We present BERTopic, a topic model that extends the feasibility of approach topic modeling as a clustering task.
BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.
arXiv Detail & Related papers (2022-03-11T08:35:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.