Related papers: MPTopic: Improving topic modeling via Masked Permuted pre-training

URL: http://arxiv.org/abs/2309.01015v1
Date: Sat, 2 Sep 2023 20:38:58 GMT
Title: MPTopic: Improving topic modeling via Masked Permuted pre-training
Authors: Xinche Zhang, Evangelos Milios
Abstract summary: We present MPTopic, a clustering algorithm intrinsically driven by the insights of TF-RDF. It is evident that the topic keywords identified with the synergy of MPTopic and TF-RDF outperform those extracted by both BERTopic and Top2Vec.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Topic modeling is pivotal in discerning hidden semantic structures within texts, thereby generating meaningful descriptive keywords. While innovative techniques like BERTopic and Top2Vec have recently emerged in the forefront, they manifest certain limitations. Our analysis indicates that these methods might not prioritize the refinement of their clustering mechanism, potentially compromising the quality of derived topic clusters. To illustrate, Top2Vec designates the centroids of clustering results to represent topics, whereas BERTopic harnesses C-TF-IDF for its topic extraction. In response to these challenges, we introduce "TF-RDF" (Term Frequency - Relative Document Frequency), a distinctive approach to assess the relevance of terms within a document. Building on the strengths of TF-RDF, we present MPTopic, a clustering algorithm intrinsically driven by the insights of TF-RDF. Through comprehensive evaluation, it is evident that the topic keywords identified with the synergy of MPTopic and TF-RDF outperform those extracted by both BERTopic and Top2Vec.
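
For context on the C-TF-IDF baseline the abstract mentions, the following is a minimal sketch of class-based TF-IDF scoring as described in the BERTopic paper: each cluster is treated as one merged document and a term t in cluster c is weighted by tf(t, c) * log(1 + A / tf(t)), where A is the average number of words per cluster. It does not implement TF-RDF or MPTopic, whose definitions are not given on this page. The function names, the toy clusters, and the use of raw (unnormalized) counts are assumptions made for illustration.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """Score terms per cluster with class-based TF-IDF.

    clusters: dict mapping a cluster id to a list of tokenized documents.
    Returns a dict mapping cluster id -> {term: score}.
    """
    # Merge each cluster's documents into a single bag of words.
    tf = {
        c: Counter(tok for doc in docs for tok in doc)
        for c, docs in clusters.items()
    }
    # Frequency of each term across all clusters.
    total = Counter()
    for counts in tf.values():
        total.update(counts)
    # A: average number of words per cluster.
    avg_words = sum(total.values()) / len(tf)
    return {
        c: {t: f * math.log(1 + avg_words / total[t]) for t, f in counts.items()}
        for c, counts in tf.items()
    }


def top_keywords(scores, k=3):
    """Return the k highest-scoring terms per cluster (ties broken alphabetically)."""
    return {
        c: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for c, s in scores.items()
    }


# Hypothetical toy clusters, already tokenized.
clusters = {
    "sports": [["goal", "team", "match"], ["team", "goal"]],
    "cooking": [["recipe", "oven", "team"], ["recipe", "oven"]],
}
print(top_keywords(c_tf_idf(clusters), k=2))
```

In this toy input "team" appears in both clusters, so its weight is discounted relative to terms that occur in only one cluster.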

Related papers

Conceptual Topic Aggregation [0.0]
We propose FAT-CAT, an approach based on Formal Concept Analysis (FCA) to enhance meaningful topic aggregation and visualization. Our approach can handle diverse topics and file types -- grouped by directories -- to construct a concept lattice that offers a structured, hierarchical representation of their topic distribution.
arXiv Detail & Related papers (2025-06-27T15:19:38Z)
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction [24.46493675079128]
OCR-dependent methods rely on offline OCR engines, while OCR-free methods might produce outputs that lack interpretability or contain hallucinated content. We propose HIP, which models entities as HIerarchical Points to better conform to the hierarchical nature of the end-to-end VIE task. Specifically, such hierarchical points can be flexibly encoded and subsequently decoded into desired text transcripts, centers of various regions, and categories of entities.
arXiv Detail & Related papers (2024-11-02T05:00:13Z)
Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image. UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z)
Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System [22.331591533400402]
Infrared and visible image fusion (IVF) plays an important role in intelligent transportation system (ITS). We propose a novel prior semantic guided image fusion method based on the dual-modality strategy.
arXiv Detail & Related papers (2024-03-24T16:41:50Z)
Controllable Topic-Focused Abstractive Summarization [57.8015120583044]
Controlled abstractive summarization focuses on producing condensed versions of a source article to cover specific aspects. This paper presents a new Transformer-based architecture capable of producing topic-focused summaries.
arXiv Detail & Related papers (2023-11-12T03:51:38Z)
Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities. We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions. We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
arXiv Detail & Related papers (2023-11-06T16:40:13Z)
Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust. Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model. We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
A Clustering-guided Contrastive Fusion for Multi-view Representation Learning [7.630965478083513]
We propose a deep fusion network to fuse view-specific representations into the view-common representation. We also design an asymmetrical contrastive strategy that aligns the view-common representation and each view-specific representation. In the incomplete view scenario, our proposed method resists noise interference better than those of our competitors.
arXiv Detail & Related papers (2022-12-28T07:21:05Z)
Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling. Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
SpatioTemporal Focus for Skeleton-based Action Recognition [66.8571926307011]
Graph convolutional networks (GCNs) are widely adopted in skeleton-based action recognition. We argue that the performance of recent proposed skeleton-based action recognition methods is limited by the following factors. Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action associated relation information.
arXiv Detail & Related papers (2022-03-31T02:45:24Z)
BERTopic: Neural topic modeling with a class-based TF-IDF procedure [0.0]
We present BERTopic, a topic model that extends the feasibility of approaching topic modeling as a clustering task. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.
arXiv Detail & Related papers (2022-03-11T08:35:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.