Multi-source Neural Topic Modeling in Multi-view Embedding Spaces
- URL: http://arxiv.org/abs/2104.08551v1
- Date: Sat, 17 Apr 2021 14:08:00 GMT
- Title: Multi-source Neural Topic Modeling in Multi-view Embedding Spaces
- Authors: Pankaj Gupta, Yatin Chaudhary, Hinrich Schütze
- Abstract summary: This work presents a novel neural topic modeling framework using multi-view embedding spaces.
We first build respective pools of pretrained topic embeddings (TopicPool) and word embeddings (WordPool).
We then identify one or more relevant source domain(s) and transfer knowledge to guide meaningful learning in the sparse target domain.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Though word embeddings and topics are complementary representations, several past works have used only pretrained word embeddings in (neural) topic modeling to address data sparsity in short texts or small collections of documents. This work presents a novel neural topic modeling framework using multi-view embedding spaces: (1) pretrained topic embeddings, and (2) pretrained word embeddings (context-insensitive from GloVe and context-sensitive from BERT models), jointly from one or many sources, to improve topic quality and better deal with polysemy. In doing so, we first build respective pools of pretrained topic embeddings (TopicPool) and word embeddings (WordPool). We then identify one or more relevant source domain(s) and transfer knowledge to guide meaningful learning in the sparse target domain. Within neural topic modeling, we quantify the quality of topics and document representations via generalization (perplexity), interpretability (topic coherence) and information retrieval (IR) using short-text, long-text, small and large document collections from the news and medical domains. Introducing the multi-source multi-view embedding spaces, we show state-of-the-art neural topic modeling using 6 source (high-resource) and 5 target (low-resource) corpora.
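For intuition, below is a minimal, hypothetical PyTorch sketch (not the authors' released code) of how a VAE-style neural topic model might fuse two views in its decoder: a freely learned topic-word matrix, and topic-word logits derived from frozen pretrained word and topic vectors standing in for WordPool and TopicPool. The MultiViewNTM class, the gating scheme, and the layer sizes are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewNTM(nn.Module):
    """VAE-style neural topic model with a two-view decoder: a learned
    topic-word matrix mixed with topic-word logits derived from frozen
    pretrained topic/word embeddings (stand-ins for TopicPool/WordPool)."""

    def __init__(self, vocab_size, n_topics, word_emb, topic_emb, hidden=256):
        super().__init__()
        # Inference network: bag-of-words -> Gaussian over the topic space.
        self.encoder = nn.Sequential(nn.Linear(vocab_size, hidden), nn.Softplus())
        self.mu = nn.Linear(hidden, n_topics)
        self.logvar = nn.Linear(hidden, n_topics)
        # View 1: freely learned topic-word logits.
        self.beta = nn.Parameter(0.02 * torch.randn(n_topics, vocab_size))
        # View 2: frozen pretrained vectors; a projection aligns the topic and
        # word embedding spaces so their dot products give topic-word logits.
        self.register_buffer("word_emb", word_emb)    # (V, d_word)
        self.register_buffer("topic_emb", topic_emb)  # (K, d_topic)
        self.proj = nn.Linear(topic_emb.size(1), word_emb.size(1), bias=False)
        # Per-topic gate mixing the two views.
        self.gate = nn.Parameter(torch.zeros(n_topics, 1))

    def forward(self, bow):
        h = self.encoder(bow)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        theta = F.softmax(z, dim=-1)                          # doc-topic mixture
        emb_logits = self.proj(self.topic_emb) @ self.word_emb.t()  # (K, V)
        g = torch.sigmoid(self.gate)
        beta = g * self.beta + (1 - g) * emb_logits           # mix the two views
        word_dist = theta @ F.softmax(beta, dim=-1)           # (B, V)
        nll = -(bow * (word_dist + 1e-10).log()).sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return (nll + kl).mean()

# Toy usage with random stand-ins for GloVe/BERT word vectors and a
# pretrained source-domain topic matrix.
V, K, d = 2000, 50, 100
model = MultiViewNTM(V, K, word_emb=torch.randn(V, d), topic_emb=torch.randn(K, d))
loss = model(torch.rand(8, V))
loss.backward()
```

A per-topic sigmoid gate lets training decide, topic by topic, how much to trust transferred embedding knowledge versus in-domain evidence, loosely mirroring the paper's goal of guiding learning on sparse target corpora with knowledge from high-resource sources.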
Related papers
- Embedded Topic Models Enhanced by Wikification (2024-10-03)
We incorporate Wikipedia knowledge into a neural topic model to make it aware of named entities.
Our experiments show that the method improves the generalizability of neural topic models.
- Medical Vision-Language Pre-Training for Brain Abnormalities (2024-04-27)
We show how to automatically collect aligned medical image-text data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
- MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding (2022-11-27)
Spatial hierarchical relationships between content at different levels of granularity are crucial for document image understanding tasks.
Existing methods learn features from either word-level or region-level but fail to consider both simultaneously.
We propose MGDoc, a new multi-modal multi-granular pre-training framework that encodes page-level, region-level, and word-level information at the same time.
- Knowledge-Aware Bayesian Deep Topic Model (2022-09-20)
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
- TopNet: Learning from Neural Topic Model to Generate Long Stories (2021-12-14)
Long story generation (LSG) is one of the coveted goals in natural language processing.
We propose TopNet to obtain high-quality skeleton words to complement the short input.
Our proposed framework is highly effective in skeleton word selection and significantly outperforms state-of-the-art models in both automatic evaluation and human evaluation.
- Neural Attention-Aware Hierarchical Topic Model (2021-10-14)
We propose a variational autoencoder (VAE) based neural topic model that jointly reconstructs sentence and document word counts.
Our model also features a hierarchical KL divergence that leverages the embeddings of each document to regularize those of its sentences.
Both quantitative and qualitative experiments have shown the efficacy of our model in 1) lowering the reconstruction errors at both the sentence and document levels, and 2) discovering more coherent topics from real-world datasets.
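One plausible way to realize such a hierarchical KL term, shown purely as an assumption-laden sketch rather than the paper's exact formulation, is to penalize the divergence of each sentence-level Gaussian posterior from the posterior of its parent document:

```python
import torch

def gauss_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ), summed over latent dims."""
    return 0.5 * (logvar_p - logvar_q
                  + (logvar_q.exp() + (mu_q - mu_p) ** 2) / logvar_p.exp()
                  - 1.0).sum(-1)

# Hypothetical shapes: one document posterior and its 5 sentence posteriors.
mu_doc, logvar_doc = torch.zeros(1, 32), torch.zeros(1, 32)
mu_sent, logvar_sent = torch.randn(5, 32), torch.zeros(5, 32)

# Hierarchical KL: pull each sentence posterior toward its document posterior,
# so sentence embeddings are regularized by the document they belong to.
reg = gauss_kl(mu_sent, logvar_sent, mu_doc, logvar_doc).mean()
```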
- Author Clustering and Topic Estimation for Short Texts (2021-06-15)
We propose a novel model that expands on Latent Dirichlet Allocation by modeling strong dependence among words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
- Deep Learning for Text Style Transfer: A Survey (2020-11-01)
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
- Context Reinforced Neural Topic Modeling over Short Texts (2020-08-11)
We propose a Context Reinforced Neural Topic Model (CRNTM) that infers the topic of each word within a narrow range by assuming that each short text covers only a few salient topics.
Experiments on two benchmark datasets validate the effectiveness of the proposed model on both topic discovery and text classification.
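The "few salient topics" assumption can be approximated outside the full model with a simple top-k sparsification of a document-topic distribution; the helper below is an illustrative sketch, not CRNTM's actual per-word inference:

```python
import torch
import torch.nn.functional as F

def sparse_theta(logits: torch.Tensor, k: int = 3) -> torch.Tensor:
    """Keep only each document's top-k topic logits and renormalize,
    approximating the assumption that a short text covers few topics."""
    topk = logits.topk(k, dim=-1)
    mask = torch.full_like(logits, float("-inf"))
    mask.scatter_(-1, topk.indices, topk.values)
    return F.softmax(mask, dim=-1)  # -inf entries get exactly zero probability

# Example: 4 short texts over 20 topics; each keeps 3 salient topics.
theta = sparse_theta(torch.randn(4, 20), k=3)
print((theta > 0).sum(dim=-1))  # tensor([3, 3, 3, 3])
```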
- Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling (2020-08-11)
We propose a topic-adaptive storyteller to model inter-topic generalization.
We also propose a prototype encoding structure to model intra-topic derivation.
Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
- Neural Topic Modeling with Continual Lifelong Learning (2020-06-19)
We propose a lifelong learning framework for neural topic modeling.
It can process streams of document collections, accumulate topics and guide future topic modeling tasks.
We demonstrate improved performance quantified by perplexity, topic coherence, and an information retrieval task.
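As a rough illustration of topic accumulation, one could keep a pool of topic-word vectors from earlier collections and regularize new topics toward their nearest pooled neighbors; this sketch is a guess at the flavor of such a retention penalty, not the paper's actual objective:

```python
import torch
import torch.nn.functional as F

def lifelong_topic_regularizer(beta_new, topic_pool, strength=0.1):
    """Penalty keeping each new topic close to its nearest accumulated topic."""
    # Pairwise cosine similarity: (K_new, V) vs. (K_old, V) -> (K_new, K_old)
    sim = F.cosine_similarity(beta_new.unsqueeze(1), topic_pool.unsqueeze(0), dim=-1)
    # Distance to the best-matching pooled topic, averaged over new topics.
    return strength * (1.0 - sim.max(dim=1).values).mean()

beta_new = torch.randn(20, 2000, requires_grad=True)  # topics for a new collection
topic_pool = torch.randn(50, 2000)                    # topics accumulated so far
penalty = lifelong_topic_regularizer(beta_new, topic_pool)
penalty.backward()
```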