A Framework for Neural Topic Modeling of Text Corpora
- URL: http://arxiv.org/abs/2108.08946v1
- Date: Thu, 19 Aug 2021 23:32:38 GMT
- Title: A Framework for Neural Topic Modeling of Text Corpora
- Authors: Shayan Fazeli, Majid Sarrafzadeh
- Abstract summary: We introduce FAME, an open-source framework providing an efficient mechanism for extracting and incorporating textual features.
To demonstrate the effectiveness of this library, we conducted experiments on the well-known News-Group dataset.
- Score: 6.340447411058068
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Topic Modeling refers to the problem of discovering the main topics present in corpora of textual data, with solutions finding crucial applications in numerous fields. In this work, inspired by recent advancements in Natural Language Processing, we introduce FAME, an open-source framework that provides an efficient mechanism for extracting textual features and using them to discover topics and to cluster semantically similar documents in a corpus. These features range from traditional frequency-based representations to recent auto-encoding embeddings from transformer-based language models such as the BERT model family. To demonstrate the effectiveness of this library, we conducted experiments on the well-known News-Group dataset. The library is available online.
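The workflow described in the abstract (extracting document features, from frequency-based representations to transformer embeddings, and clustering semantically similar documents) can be illustrated with a short sketch. This is not FAME's actual API; it is a minimal example built on scikit-learn and sentence-transformers, it assumes the "News-Group" data refers to the standard 20 Newsgroups corpus, and the embedding model name and cluster count are arbitrary choices.

```python
# Minimal sketch of a feature-extraction + clustering pipeline for topic
# discovery, in the spirit of the abstract above. This is NOT the FAME API.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer  # assumed dependency

# 20 Newsgroups is assumed to be the "News-Group" dataset from the abstract.
docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data[:2000]

# Traditional frequency-based features (TF-IDF), kept here for topic descriptors.
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
tfidf = vectorizer.fit_transform(docs)

# Transformer-based document embeddings (model choice is an assumption).
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(docs, show_progress_bar=False)

# Cluster documents in the embedding space; each cluster approximates a topic.
labels = KMeans(n_clusters=20, random_state=0).fit_predict(embeddings)

# Describe each cluster by its highest mean-TF-IDF terms.
terms = np.array(vectorizer.get_feature_names_out())
for k in range(20):
    members = labels == k
    if members.any():
        top = np.asarray(tfidf[members].mean(axis=0)).ravel().argsort()[-8:][::-1]
        print(f"topic {k:2d}:", ", ".join(terms[top]))
```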
Related papers
- Semantic-Driven Topic Modeling Using Transformer-Based Embeddings and Clustering Algorithms [6.349503549199403]
This study introduces an innovative end-to-end semantic-driven topic modeling technique for the topic extraction process.
Our model generates document embeddings using pre-trained transformer-based language models.
Compared to ChatGPT and traditional topic modeling algorithms, our model provides more coherent and meaningful topics.
arXiv Detail & Related papers (2024-09-30T18:15:31Z)
- Unified Text-to-Image Generation and Retrieval [96.72318842152148]
We propose a unified framework in the context of Multimodal Large Language Models (MLLMs)
We first explore the intrinsic discriminative abilities of MLLMs and introduce a generative retrieval method to perform retrieval in a training-free manner.
We then unify generation and retrieval in an autoregressive manner and propose an autonomous decision module to choose the better match between the generated and retrieved images.
arXiv Detail & Related papers (2024-06-09T15:00:28Z)
- GPT Struct Me: Probing GPT Models on Narrative Entity Extraction [2.049592435988883]
We evaluate the capabilities of two state-of-the-art language models -- GPT-3 and GPT-3.5 -- in the extraction of narrative entities.
This study is conducted on the Text2Story Lusa dataset, a collection of 119 Portuguese news articles.
arXiv Detail & Related papers (2023-11-24T16:19:04Z)
- Topics as Entity Clusters: Entity-based Topics from Large Language Models and Graph Neural Networks [0.6486052012623045]
We propose a novel topic clustering approach using bimodal vector representations of entities.
Our approach is better suited to working with entities than state-of-the-art models.
arXiv Detail & Related papers (2023-01-06T10:54:54Z)
- An Overview on Controllable Text Generation via Variational Auto-Encoders [15.97186478109836]
Recent advances in neural-based generative modeling have reignited the hopes of having computer systems capable of conversing with humans.
Latent variable models (LVM) such as variational auto-encoders (VAEs) are designed to characterize the distributional pattern of textual data.
This overview gives an introduction to existing generation schemes, problems associated with text variational auto-encoders, and a review of several applications of controllable generation.
arXiv Detail & Related papers (2022-11-15T07:36:11Z)
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations [35.74225306947918]
We propose a joint latent space learning and clustering framework built upon PLM embeddings.
Our model effectively leverages the strong representation power and superb linguistic features brought by PLMs for topic discovery.
arXiv Detail & Related papers (2022-02-09T17:26:08Z)
- Pre-training Language Model Incorporating Domain-specific Heterogeneous Knowledge into A Unified Representation [49.89831914386982]
We propose a unified pre-trained language model (PLM) for all forms of text, including unstructured text, semi-structured text, and well-structured text.
Our approach outperforms pre-training on plain text while using only 1/4 of the data.
arXiv Detail & Related papers (2021-09-02T16:05:24Z)
- Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey [54.34370423151014]
This paper surveys the components of modeling approaches and how their impacts carry over across various generation tasks such as storytelling, summarization, and translation.
We present an abstraction of the imperative techniques with respect to learning paradigms, pretraining, modeling approaches, and decoding, together with the key challenges outstanding in each of these areas.
arXiv Detail & Related papers (2020-10-14T17:54:42Z)
- Neural Entity Linking: A Survey of Models Based on Deep Learning [82.43751915717225]
This survey presents a comprehensive description of recent neural entity linking (EL) systems developed since 2015.
Its goal is to systematize the design features of neural entity linking systems and compare their performance with classic methods on common benchmarks.
The survey touches on applications of entity linking, focusing on the recently emerged use-case of enhancing deep pre-trained masked language models.
arXiv Detail & Related papers (2020-05-31T18:02:26Z)
- A Multi-Perspective Architecture for Semantic Code Search [58.73778219645548]
We propose a novel multi-perspective cross-lingual neural framework for code-text matching.
Our experiments on the CoNaLa dataset show that our proposed model yields better performance than previous approaches.
arXiv Detail & Related papers (2020-05-06T04:46:11Z)
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer [64.22926988297685]
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format.
arXiv Detail & Related papers (2019-10-23T17:37:36Z)