FASTopic: Pretrained Transformer is a Fast, Adaptive, Stable, and Transferable Topic Model
- URL: http://arxiv.org/abs/2405.17978v2
- Date: Sat, 26 Oct 2024 12:36:11 GMT
- Title: FASTopic: Pretrained Transformer is a Fast, Adaptive, Stable, and Transferable Topic Model
- Authors: Xiaobao Wu, Thong Nguyen, Delvin Ce Zhang, William Yang Wang, Anh Tuan Luu
- Abstract summary: We propose FASTopic, a fast, adaptive, stable, and transferable topic model.
We use Dual Semantic-relation Reconstruction (DSR) to model latent topics.
We also propose Embedding Transport Plan (ETP) to regularize semantic relations as optimal transport plans.
- Score: 76.509837704596
- Abstract: Topic models have been evolving rapidly over the years, from conventional to recent neural models. However, existing topic models generally struggle with either effectiveness, efficiency, or stability, highly impeding their practical applications. In this paper, we propose FASTopic, a fast, adaptive, stable, and transferable topic model. FASTopic follows a new paradigm: Dual Semantic-relation Reconstruction (DSR). Instead of previous conventional, VAE-based, or clustering-based methods, DSR directly models the semantic relations among document embeddings from a pretrained Transformer and learnable topic and word embeddings. By reconstructing through these semantic relations, DSR discovers latent topics. This brings about a neat and efficient topic modeling framework. We further propose a novel Embedding Transport Plan (ETP) method. Rather than early straightforward approaches, ETP explicitly regularizes the semantic relations as optimal transport plans. This addresses the relation bias issue and thus leads to effective topic modeling. Extensive experiments on benchmark datasets demonstrate that our FASTopic shows superior effectiveness, efficiency, adaptivity, stability, and transferability, compared to state-of-the-art baselines across various scenarios.
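As a hedged illustration of the paradigm the abstract describes, here is a minimal PyTorch sketch, not the authors' released implementation: ETP casts the doc-topic and topic-word semantic relations as entropic optimal transport plans (computed with Sinkhorn iterations), and DSR composes the row-normalized plans to reconstruct each document's word distribution. All names, shapes, and hyperparameters are assumptions.
```python
import torch
import torch.nn.functional as F

def sinkhorn_plan(cost, eps=0.05, n_iters=50):
    """Entropic OT plan between uniform marginals via Sinkhorn iterations."""
    cost = cost / cost.max().clamp_min(1e-8)   # scale cost for numerical stability
    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=cost.device)
    nu = torch.full((m,), 1.0 / m, device=cost.device)
    K = torch.exp(-cost / eps)                 # Gibbs kernel
    u = torch.ones_like(mu)
    for _ in range(n_iters):
        v = nu / (K.t() @ u + 1e-12)
        u = mu / (K @ v + 1e-12)
    return u.unsqueeze(1) * K * v.unsqueeze(0)

class DSRSketch(torch.nn.Module):
    """Dual Semantic-relation Reconstruction, roughly as the abstract reads."""
    def __init__(self, n_topics, vocab_size, emb_dim):
        super().__init__()
        self.topic_emb = torch.nn.Parameter(torch.randn(n_topics, emb_dim))
        self.word_emb = torch.nn.Parameter(torch.randn(vocab_size, emb_dim))

    def forward(self, doc_emb, bow):
        # ETP: regularize both semantic relations as optimal transport plans.
        dt = sinkhorn_plan(torch.cdist(doc_emb, self.topic_emb))
        tw = sinkhorn_plan(torch.cdist(self.topic_emb, self.word_emb))
        # DSR: compose row-normalized plans into a doc-word reconstruction.
        doc_topic = dt / dt.sum(dim=1, keepdim=True)
        topic_word = tw / tw.sum(dim=1, keepdim=True)
        recon = doc_topic @ topic_word
        target = F.normalize(bow.float(), p=1, dim=1)
        return -(target * torch.log(recon + 1e-10)).sum(dim=1).mean()
```
In this reading, doc_emb would come from a pretrained sentence encoder, and each topic's top words would be read off the rows of topic_word after training.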
Related papers
- Meta-Learning Adaptable Foundation Models [37.458141335750696]
We introduce a meta-learning framework infused with PEFT in an intermediate retraining stage to learn a model that can be easily adapted to unseen tasks.
In this setting, we demonstrate the suboptimality of standard retraining for finding an adaptable set of parameters.
We then apply these theoretical insights to retraining the RoBERTa model to predict the continuation of conversations within the ConvAI2 dataset.
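To make the adaptability claim concrete, here is a toy first-order meta-learning loop in the Reptile style; it is a stand-in for the paper's PEFT-infused retraining, and every name and hyperparameter below is illustrative.
```python
import torch

# Toy: meta-learn an initialization that adapts to new tasks in a few steps.
# Tasks are 1-D linear regressions with task-specific slopes.
torch.manual_seed(0)
init = torch.zeros(1, requires_grad=True)       # stand-in for PEFT parameters
outer_opt = torch.optim.SGD([init], lr=0.1)

def task_loss(theta, slope):
    x = torch.randn(64, 1)
    return ((x * theta - x * slope) ** 2).mean()

for step in range(300):
    slope = 2.0 + torch.randn(())               # sample a task
    theta = init.detach().clone().requires_grad_(True)
    for _ in range(3):                          # inner adaptation on the task
        (g,) = torch.autograd.grad(task_loss(theta, slope), theta)
        theta = (theta - 0.1 * g).detach().requires_grad_(True)
    outer_opt.zero_grad()
    init.grad = init.detach() - theta.detach()  # Reptile: pull init toward theta
    outer_opt.step()
print(init.item())  # drifts toward the task mean (~2.0), a fast-adaptable start
```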
arXiv Detail & Related papers (2024-10-29T17:24:18Z) - MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
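A loose sketch of the meet-in-the-middle idea, with prediction entropy standing in for the paper's energy function: both the input and the model descend the same objective from opposite sides. Everything here is an assumption, not MITA's actual procedure.
```python
import torch

def mutual_adapt(model, x, steps=5, lr_x=1e-2, lr_m=1e-4):
    """Adapt both the test batch and the model toward low 'energy' (illustrative)."""
    x = x.clone().requires_grad_(True)
    opt_m = torch.optim.SGD(model.parameters(), lr=lr_m)
    opt_x = torch.optim.SGD([x], lr=lr_x)
    for _ in range(steps):
        probs = model(x).softmax(dim=-1)
        # Entropy as a stand-in energy: low entropy means confident predictions.
        energy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
        opt_m.zero_grad()
        opt_x.zero_grad()
        energy.backward()
        opt_m.step()  # the model moves toward the data...
        opt_x.step()  # ...and the data moves toward the model
    return model(x.detach())
```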
arXiv Detail & Related papers (2024-10-12T07:02:33Z) - Robust Traffic Forecasting against Spatial Shift over Years [11.208740750755025]
We investigate state-of-the-art spatiotemporal models using newly proposed traffic OOD benchmarks.
We find that these models experience a significant decline in performance.
We propose a novel Mixture of Experts framework, which learns a set of graph generators during training and combines them to generate new graphs.
Our method is both parsimonious and efficacious, and can be seamlessly integrated into any spatiotemporal model.
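One plausible reading of that framework, sketched with hypothetical shapes and names: a gating network conditioned on a summary of recent traffic blends several learned graph generators into a fresh adjacency matrix for the downstream forecaster.
```python
import torch

class GraphGeneratorMixture(torch.nn.Module):
    """Mixture of learned graph generators with a context-conditioned gate."""
    def __init__(self, n_nodes, n_experts, ctx_dim):
        super().__init__()
        self.experts = torch.nn.Parameter(torch.randn(n_experts, n_nodes, n_nodes))
        self.gate = torch.nn.Linear(ctx_dim, n_experts)

    def forward(self, context):
        # context: (ctx_dim,) summary of recent traffic observations.
        weights = self.gate(context).softmax(dim=-1)             # expert weights
        adj = torch.einsum('e,eij->ij', weights, self.experts)   # blended graph
        return torch.sigmoid(adj)  # adjacency for any spatiotemporal forecaster
```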
arXiv Detail & Related papers (2024-10-01T03:49:29Z) - SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
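A minimal sketch of the stated idea, leaving out the paper's progressive sparse low-rank machinery: treat the lowest-magnitude weights as "ineffective" and fine-tune only those, by masking everyone else's gradients. The 5% fraction and all names are assumptions.
```python
import torch

def build_ineffective_masks(model, fraction=0.05):
    """Mark the smallest-magnitude weights of each tensor as trainable (mask=1)."""
    masks = {}
    for name, p in model.named_parameters():
        k = max(1, int(p.numel() * fraction))
        thresh = p.detach().abs().flatten().kthvalue(k).values
        masks[name] = (p.detach().abs() <= thresh).float()
    return masks

def mask_gradients(model, masks):
    """Call after loss.backward(): zero the gradients of the frozen weights."""
    for name, p in model.named_parameters():
        if p.grad is not None:
            p.grad.mul_(masks[name])
```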
arXiv Detail & Related papers (2024-09-10T16:44:47Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Probabilistic Topic Modelling with Transformer Representations [0.9999629695552195]
We propose the Transformer-Representation Neural Topic Model (TNTM).
This approach unifies the powerful and versatile notion of topics based on transformer embeddings with fully probabilistic modelling.
Experimental results show that our proposed model achieves results on par with various state-of-the-art approaches in terms of embedding coherence.
arXiv Detail & Related papers (2024-03-06T14:27:29Z) - Improving Transferability of Adversarial Examples via Bayesian Attacks [84.90830931076901]
We introduce a novel extension by incorporating the Bayesian formulation into the model input as well, enabling the joint diversification of both the model input and model parameters.
Our method achieves a new state-of-the-art on transfer-based attacks, improving the average success rate on ImageNet and CIFAR-10 by 19.14% and 2.08%, respectively.
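The gist, sketched loosely: at every attack step, sample Gaussian perturbations of both the model parameters and the input, so the adversarial example is optimized against a distribution of models and input views rather than a single point. Step sizes and noise scales are arbitrary choices, not the paper's.
```python
import torch
import torch.nn.functional as F

def bayesian_style_attack(model, x, y, eps=8/255, alpha=2/255, steps=10,
                          sigma_w=1e-3, sigma_x=1e-2):
    """PGD-like attack with sampled parameter/input perturbations (illustrative)."""
    x_adv = x.clone()
    params = list(model.parameters())
    for _ in range(steps):
        noises = [sigma_w * torch.randn_like(p) for p in params]
        for p, n in zip(params, noises):
            p.data.add_(n)                         # perturb the model...
        x_in = (x_adv + sigma_x * torch.randn_like(x_adv)).requires_grad_(True)
        loss = F.cross_entropy(model(x_in), y)     # ...and the input view
        (grad,) = torch.autograd.grad(loss, x_in)
        for p, n in zip(params, noises):
            p.data.sub_(n)                         # restore the model
        x_adv = (x_adv + alpha * grad.sign()).clamp(x - eps, x + eps).clamp(0, 1)
    return x_adv
```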
arXiv Detail & Related papers (2023-07-21T03:43:07Z) - Test-time Adaptation in the Dynamic World with Compound Domain Knowledge Management [75.86903206636741]
Test-time adaptation (TTA) allows the model to adapt itself to novel environments and improve its performance during test time.
Several works on TTA have shown promising adaptation performance in continuously changing environments.
This paper first presents a robust TTA framework with compound domain knowledge management.
We then devise a novel regularization that modulates the adaptation rates using the domain similarity between the source and the current target domain.
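One way to picture that modulation, with the similarity measure and base rate assumed rather than taken from the paper: compare the stored source feature statistics against the current batch's statistics and shrink the test-time learning rate as the two domains drift apart.
```python
import torch
import torch.nn.functional as F

def modulated_lr(source_mean, target_feats, base_lr=1e-3):
    """Scale the adaptation rate by source/target feature similarity (illustrative)."""
    target_mean = target_feats.mean(dim=0)
    sim = F.cosine_similarity(source_mean.unsqueeze(0),
                              target_mean.unsqueeze(0)).item()
    return base_lr * max(sim, 0.0)  # adapt cautiously when the domains diverge
```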
arXiv Detail & Related papers (2022-12-16T09:02:01Z) - Non-autoregressive Transformer-based End-to-end ASR using BERT [13.07939371864781]
This paper presents a transformer-based end-to-end automatic speech recognition (ASR) model based on BERT.
A series of experiments conducted on the AISHELL-1 dataset demonstrates competitive or superior results.
arXiv Detail & Related papers (2021-04-10T16:22:17Z) - Neural Topic Model via Optimal Transport [24.15046280736009]
We present a new neural topic model via the theory of optimal transport (OT).
Specifically, we propose to learn the topic distribution of a document by directly minimising its OT distance to the document's word distributions.
Our proposed model can be trained efficiently with a differentiable loss.
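A compact sketch of that objective as stated, with the document encoder omitted and all names assumed: the loss is an entropic OT distance between a document's topic distribution and its empirical word distribution, under a ground cost from topic and word embeddings.
```python
import torch

def sinkhorn_distance(a, b, cost, eps=0.1, n_iters=50):
    """Entropic OT distance <plan, cost> between distributions a and b."""
    cost = cost / cost.max().clamp_min(1e-8)   # scale cost for numerical stability
    K = torch.exp(-cost / eps)
    u = torch.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.t() @ u + 1e-12)
        u = a / (K @ v + 1e-12)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)
    return (plan * cost).sum()

# Toy usage: fit one document's topic distribution to its word distribution.
n_topics, vocab = 10, 500
topic_emb, word_emb = torch.randn(n_topics, 64), torch.randn(vocab, 64)
cost = torch.cdist(topic_emb, word_emb)        # topic-word ground cost
logits = torch.zeros(n_topics, requires_grad=True)
word_dist = torch.rand(vocab)
word_dist = word_dist / word_dist.sum()        # the document's word distribution
loss = sinkhorn_distance(logits.softmax(dim=0), word_dist, cost)
loss.backward()                                # differentiable, as the summary notes
```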
arXiv Detail & Related papers (2020-08-12T06:37:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.