Effective Seed-Guided Topic Discovery by Integrating Multiple Types of
Contexts
- URL: http://arxiv.org/abs/2212.06002v1
- Date: Mon, 12 Dec 2022 16:03:38 GMT
- Title: Effective Seed-Guided Topic Discovery by Integrating Multiple Types of
Contexts
- Authors: Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng,
Jiawei Han
- Abstract summary: We propose an iterative framework, SeedTopicMine, which jointly learns from three types of contexts and fuses their context signals via an ensemble ranking process.
Under various sets of seeds and on multiple datasets, SeedTopicMine consistently yields more coherent and accurate topics than existing seed-guided topic discovery approaches.
- Score: 28.291684568220827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instead of mining coherent topics from a given text corpus in a completely
unsupervised manner, seed-guided topic discovery methods leverage user-provided
seed words to extract distinctive and coherent topics so that the mined topics
can better cater to the user's interest. To model the semantic correlation
between words and seeds for discovering topic-indicative terms, existing
seed-guided approaches utilize different types of context signals, such as
document-level word co-occurrences, sliding window-based local contexts, and
generic linguistic knowledge brought by pre-trained language models. In this
work, we analyze and show empirically that each type of context information has
its value and limitation in modeling word semantics under seed guidance, but
combining three types of contexts (i.e., word embeddings learned from local
contexts, pre-trained language model representations obtained from
general-domain training, and topic-indicative sentences retrieved based on seed
information) allows them to complement each other for discovering quality
topics. We propose an iterative framework, SeedTopicMine, which jointly learns
from the three types of contexts and gradually fuses their context signals via
an ensemble ranking process. Under various sets of seeds and on multiple
datasets, SeedTopicMine consistently yields more coherent and accurate topics
than existing seed-guided topic discovery approaches.
Related papers
- Topics in the Haystack: Extracting and Evaluating Topics beyond
Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes.
This allows our model to detect latent topics that may include uncommon words or neologisms.
We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z) - Variational Cross-Graph Reasoning and Adaptive Structured Semantics
Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence.
We introduce a new Compositional Temporal Grounding task and construct two new dataset splits.
We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv Detail & Related papers (2023-01-22T08:02:23Z) - Improve Discourse Dependency Parsing with Contextualized Representations [28.916249926065273]
We propose to take advantage of transformers to encode contextualized representations of units of different levels.
Motivated by the observation of writing patterns commonly shared across articles, we propose a novel method that treats discourse relation identification as a sequence labelling task.
arXiv Detail & Related papers (2022-05-04T14:35:38Z) - Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds [33.744478898032376]
Seed-guided topic discovery approaches leverage user-provided seeds to discover topic-representative terms.
In this paper, we generalize the task of seed-guided topic discovery to allow out-of-vocabulary seeds.
We propose a novel framework, named SeeTopic, wherein the general knowledge of PLMs and the local semantics learned from the input corpus can mutually benefit each other.
arXiv Detail & Related papers (2022-05-04T01:49:36Z) - Relation Clustering in Narrative Knowledge Graphs [71.98234178455398]
relational sentences in the original text are embedded (with SBERT) and clustered in order to merge together semantically similar relations.
Preliminary tests show that such clustering might successfully detect similar relations, and provide a valuable preprocessing for semi-supervised approaches.
arXiv Detail & Related papers (2020-11-27T10:43:04Z) - Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z) - A Neural Generative Model for Joint Learning Topics and Topic-Specific
Word Embeddings [42.87769996249732]
We propose a novel generative model to explore both local and global context for joint learning topics and topic-specific word embeddings.
The trained model maps words to topic-dependent embeddings, which naturally addresses the issue of word polysemy.
arXiv Detail & Related papers (2020-08-11T13:54:11Z) - A Survey on Contextual Embeddings [48.04732268018772]
Contextual embeddings assign each word a representation based on its context, capturing uses of words across varied contexts and encoding knowledge that transfers across languages.
We review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.
arXiv Detail & Related papers (2020-03-16T15:22:22Z) - How Far are We from Effective Context Modeling? An Exploratory Study on
Semantic Parsing in Context [59.13515950353125]
We present a grammar-based decoding semantic parsing and adapt typical context modeling methods on top of it.
We evaluate 13 context modeling methods on two large cross-domain datasets, and our best model achieves state-of-the-art performances.
arXiv Detail & Related papers (2020-02-03T11:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.