Unsupervised Keyphrase Extraction via Interpretable Neural Networks
- URL: http://arxiv.org/abs/2203.07640v1
- Date: Tue, 15 Mar 2022 04:30:47 GMT
- Title: Unsupervised Keyphrase Extraction via Interpretable Neural Networks
- Authors: Rishabh Joshi and Vidhisha Balachandran and Emily Saldanha and Maria
Glenski and Svitlana Volkova and Yulia Tsvetkov
- Abstract summary: Keyphrases that are most useful for predicting the topic of a text are important keyphrases.
INSPECT is a self-explaining neural framework for identifying influential keyphrases.
We show that INSPECT achieves state-of-the-art results in unsupervised keyphrase extraction across four diverse datasets.
- Score: 27.774524511005172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Keyphrase extraction aims at automatically extracting a list of "important"
phrases which represent the key concepts in a document. Prior approaches for
unsupervised keyphrase extraction resort to heuristic notions of phrase
importance via embedding similarities or graph centrality, requiring extensive
domain expertise to develop them. Our work proposes an alternative operational
definition: phrases that are most useful for predicting the topic of a text are
important keyphrases. To this end, we propose INSPECT -- a self-explaining
neural framework for identifying influential keyphrases by measuring the
predictive impact of input phrases on the downstream task of topic
classification. We show that this novel approach not only alleviates the need
for ad-hoc heuristics but also achieves state-of-the-art results in
unsupervised keyphrase extraction across four diverse datasets in two domains:
scientific publications and news articles. Ultimately, our study suggests a new
usage of interpretable neural networks as an intrinsic component in NLP
systems, and not only as a tool for explaining model predictions to humans.
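The operational definition in the abstract (a phrase is important if it helps predict the document's topic) can be illustrated with a small sketch. This is not the INSPECT architecture: the abstract does not give its details, so the sketch swaps in a toy naive Bayes topic classifier and a simple occlusion test (remove a phrase, measure the drop in the predicted topic's probability). The training data, the candidate phrases, and every function name below are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus of (text, topic) pairs; not from the paper.
TRAIN = [
    ("neural network training gradient descent", "ml"),
    ("deep neural network image classification", "ml"),
    ("stock market prices fell sharply", "finance"),
    ("central bank raised interest rates", "finance"),
]


def train_naive_bayes(examples):
    """Collect per-topic word counts and the shared vocabulary."""
    counts, vocab = {}, set()
    for text, topic in examples:
        words = text.split()
        counts.setdefault(topic, Counter()).update(words)
        vocab.update(words)
    return counts, vocab


def topic_probs(words, counts, vocab):
    """Topic posterior with add-one smoothing and a uniform prior."""
    log_scores = {}
    for topic, c in counts.items():
        total = sum(c.values()) + len(vocab)
        log_scores[topic] = sum(
            math.log((c[w] + 1) / total) for w in words if w in vocab
        )
    top = max(log_scores.values())
    exp = {t: math.exp(s - top) for t, s in log_scores.items()}
    z = sum(exp.values())
    return {t: v / z for t, v in exp.items()}


def remove_phrase(words, phrase_words):
    """Drop every contiguous occurrence of the phrase from the token list."""
    out, i, n = [], 0, len(phrase_words)
    while i < len(words):
        if words[i:i + n] == phrase_words:
            i += n
        else:
            out.append(words[i])
            i += 1
    return out


def rank_phrases(text, phrases, counts, vocab):
    """Rank candidate phrases by how much removing each one lowers the
    probability of the topic predicted for the full text (occlusion)."""
    words = text.split()
    base = topic_probs(words, counts, vocab)
    topic = max(base, key=base.get)
    scored = []
    for phrase in phrases:
        remaining = remove_phrase(words, phrase.split())
        drop = base[topic] - topic_probs(remaining, counts, vocab)[topic]
        scored.append((phrase, drop))
    return topic, sorted(scored, key=lambda item: item[1], reverse=True)


counts, vocab = train_naive_bayes(TRAIN)
doc = "the neural network improved after gradient descent on market data"
candidates = ["neural network", "gradient descent", "market data"]
topic, ranked = rank_phrases(doc, candidates, counts, vocab)
for phrase, drop in ranked:
    print(f"{phrase}: {drop:+.3f}")
```

On this toy data the predicted topic is "ml", "neural network" ranks first, and "market data" receives a negative score because removing it makes the predicted topic more likely. A real system would replace the toy classifier with a trained neural topic model and, as the abstract describes, obtain phrase influence from the model's own explanations rather than from post-hoc occlusion.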
Related papers
- MetaKP: On-Demand Keyphrase Generation [52.48698290354449]
We introduce on-demand keyphrase generation, a novel paradigm that requires keyphrases that conform to specific high-level goals or intents.
We present MetaKP, a large-scale benchmark comprising four datasets, 7500 documents, and 3760 goals across news and biomedical domains with human-annotated keyphrases.
We demonstrate the potential of our method to serve as a general NLP infrastructure, exemplified by its application in epidemic event detection from social media.
arXiv Detail & Related papers (2024-06-28T19:02:59Z)
- SimCKP: Simple Contrastive Learning of Keyphrase Representations [36.88517357720033]
We propose SimCKP, a simple contrastive learning framework that consists of two stages: 1) An extractor-generator that extracts keyphrases by learning context-aware phrase-level representations in a contrastive manner while also generating keyphrases that do not appear in the document; and 2) A reranker that adapts scores for each generated phrase by likewise aligning their representations with the corresponding document.
arXiv Detail & Related papers (2023-10-12T11:11:54Z)
- Improving Keyphrase Extraction with Data Augmentation and Information Filtering [67.43025048639333]
Keyphrase extraction is one of the essential tasks for document understanding in NLP.
We present a novel corpus and method for keyphrase extraction from the videos streamed on the Behance platform.
arXiv Detail & Related papers (2022-09-11T22:38:02Z)
- Representation Learning for Resource-Constrained Keyphrase Generation [78.02577815973764]
We introduce salient span recovery and salient span prediction as guided denoising language modeling objectives.
We show the effectiveness of the proposed approach for low-resource and zero-shot keyphrase generation.
arXiv Detail & Related papers (2022-03-15T17:48:04Z)
- Deep Keyphrase Completion [59.0413813332449]
Keyphrases provide accurate, highly compact information about document content and are widely used for discourse comprehension, organization, and text retrieval.
We propose keyphrase completion (KPC) to generate more keyphrases for a document (e.g. a scientific publication), taking advantage of the document content along with a very limited number of known keyphrases.
We name it deep keyphrase completion (DKPC) since it attempts to capture the deep semantic meaning of the document content together with known keyphrases via a deep learning framework.
arXiv Detail & Related papers (2021-10-29T07:15:35Z)
- UniKeyphrase: A Unified Extraction and Generation Framework for Keyphrase Prediction [20.26899340581431]
The Keyphrase Prediction (KP) task aims at predicting several keyphrases that can summarize the main idea of the given document.
Mainstream KP methods can be categorized into purely generative approaches and integrated models with extraction and generation.
We propose UniKeyphrase, a novel end-to-end learning framework that jointly learns to extract and generate keyphrases.
arXiv Detail & Related papers (2021-06-09T07:09:51Z)
- Redefining Absent Keyphrases and their Effect on Retrieval Effectiveness [9.13755431537592]
We discuss the usefulness of absent keyphrases from an Information Retrieval perspective.
We introduce a finer-grained categorization scheme that sheds more light on the impact of absent keyphrases on scientific document retrieval.
arXiv Detail & Related papers (2021-03-23T10:42:18Z)
- Be More with Less: Hypergraph Attention Networks for Inductive Text Classification [56.98218530073927]
Graph neural networks (GNNs) have received increasing attention in the research community and demonstrated their promising results on this canonical task.
Despite the success, their performance could be largely jeopardized in practice since they are unable to capture high-order interaction between words.
We propose a principled model -- hypergraph attention networks (HyperGAT) which can obtain more expressive power with less computational consumption for text representation learning.
arXiv Detail & Related papers (2020-11-01T00:21:59Z)
- Select, Extract and Generate: Neural Keyphrase Generation with Layer-wise Coverage Attention [75.44523978180317]
We propose SEG-Net, a neural keyphrase generation model that is composed of two major components.
The experimental results on seven keyphrase generation benchmarks from scientific and web documents demonstrate that SEG-Net outperforms the state-of-the-art neural generative methods by a large margin.
arXiv Detail & Related papers (2020-08-04T18:00:07Z)
- Corpus-level and Concept-based Explanations for Interpretable Document Classification [23.194220621342254]
We propose a corpus-level explanation approach to capture causal relationships between keywords and model predictions.
We also propose a concept-based explanation method that can automatically learn higher-level concepts and their importance to model prediction tasks.
arXiv Detail & Related papers (2020-04-24T20:54:17Z)
- Keyphrase Extraction with Span-based Feature Representations [13.790461555410747]
Keyphrases are capable of providing semantic metadata characterizing documents.
There are three approaches to keyphrase extraction: (i) the traditional two-step ranking method, (ii) sequence labeling, and (iii) generation using neural networks.
In this paper, we propose a novel Span Keyphrase Extraction model that extracts span-based feature representations of keyphrases directly from all the content tokens.
arXiv Detail & Related papers (2020-02-13T09:48:31Z)