Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds
- URL: http://arxiv.org/abs/2205.01845v1
- Date: Wed, 4 May 2022 01:49:36 GMT
- Title: Seed-Guided Topic Discovery with Out-of-Vocabulary Seeds
- Authors: Yu Zhang, Yu Meng, Xuan Wang, Sheng Wang, Jiawei Han
- Abstract summary: Seed-guided topic discovery approaches leverage user-provided seeds to discover topic-representative terms.
In this paper, we generalize the task of seed-guided topic discovery to allow out-of-vocabulary seeds.
We propose a novel framework, named SeeTopic, wherein the general knowledge of PLMs and the local semantics learned from the input corpus can mutually benefit each other.
- Score: 33.744478898032376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discovering latent topics from text corpora has been studied for decades.
Many existing topic models adopt a fully unsupervised setting, and their
discovered topics may not cater to users' particular interests due to their
inability to leverage user guidance. Although there exist seed-guided topic
discovery approaches that leverage user-provided seeds to discover
topic-representative terms, they are less concerned with two factors: (1) the
existence of out-of-vocabulary seeds and (2) the power of pre-trained language
models (PLMs). In this paper, we generalize the task of seed-guided topic
discovery to allow out-of-vocabulary seeds. We propose a novel framework, named
SeeTopic, wherein the general knowledge of PLMs and the local semantics learned
from the input corpus can mutually benefit each other. Experiments on three
real datasets from different domains demonstrate the effectiveness of SeeTopic
in terms of topic coherence, accuracy, and diversity.
Related papers
- Personalized Topic Selection Model for Topic-Grounded Dialogue [24.74527189182273]
Current models tend to predict user-uninteresting and contextually irrelevant topics.
We propose a Personalized topic sElection model for Topic-grounded Dialogue, named PETD.
Our proposed method can generate engaging and diverse responses, outperforming state-of-the-art baselines.
arXiv Detail & Related papers (2024-06-04T06:09:49Z)
- Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains [51.02035914828596]
We study the task of seed-guided fine-grained entity typing in science and engineering domains.
We propose SEType, which first enriches the weak supervision by finding more entities for each seen type from an unlabeled corpus.
It then matches the enriched entities to unlabeled text to get pseudo-labeled samples and trains a textual entailment model that can make inferences for both seen and unseen types.
arXiv Detail & Related papers (2024-01-23T22:36:03Z)
- Discovering Significant Topics from Legal Decisions with Selective Inference [0.0]
We propose and evaluate an automated pipeline for discovering significant topics from legal decision texts.
The method identifies case topics significantly correlated with outcomes, topic-word distributions and case-topic weights.
We show that topics derived by the pipeline are consistent with legal doctrines in both areas and can be useful in other related legal analysis tasks.
arXiv Detail & Related papers (2024-01-02T07:00:24Z)
- Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts [28.291684568220827]
We propose an iterative framework, SeedTopicMine, which jointly learns from three types of contexts and fuses their context signals via an ensemble ranking process.
Under various sets of seeds and on multiple datasets, SeedTopicMine consistently yields more coherent and accurate topics than existing seed-guided topic discovery approaches.
arXiv Detail & Related papers (2022-12-12T16:03:38Z)
- Topic Taxonomy Expansion via Hierarchy-Aware Topic Phrase Generation [58.3921103230647]
We propose a novel framework for topic taxonomy expansion, named TopicExpan.
TopicExpan directly generates topic-related terms belonging to new topics.
Experimental results on two real-world text corpora show that TopicExpan significantly outperforms other baseline methods in terms of the quality of output.
arXiv Detail & Related papers (2022-10-18T22:38:49Z)
- Knowledge-Aware Bayesian Deep Topic Model [50.58975785318575]
We propose a Bayesian generative model for incorporating prior domain knowledge into hierarchical topic modeling.
Our proposed model efficiently integrates the prior knowledge and improves both hierarchical topic discovery and document representation.
arXiv Detail & Related papers (2022-09-20T09:16:05Z)
- TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters [57.59286394188025]
We propose a novel framework for topic taxonomy completion, named TaxoCom.
TaxoCom discovers novel sub-topic clusters of terms and documents.
Our comprehensive experiments on two real-world datasets demonstrate that TaxoCom generates a high-quality topic taxonomy in terms of term coherency and topic coverage.
arXiv Detail & Related papers (2022-01-18T07:07:38Z)
- OTTers: One-turn Topic Transitions for Open-Domain Dialogue [11.305029351461306]
Mixed initiative in open-domain dialogue requires a system to pro-actively introduce new topics.
The one-turn topic transition task explores how a system connects two topics in a cooperative and coherent manner.
arXiv Detail & Related papers (2021-05-28T10:16:59Z)
- Topic-Aware Multi-turn Dialogue Modeling [91.52820664879432]
This paper presents a novel solution for multi-turn dialogue modeling, which segments and extracts topic-aware utterances in an unsupervised way.
Our topic-aware modeling is implemented by a newly proposed unsupervised topic-aware segmentation algorithm and a Topic-Aware Dual-attention Matching (TADAM) Network.
arXiv Detail & Related papers (2020-09-26T08:43:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.