An efficient domain-independent approach for supervised keyphrase extraction and ranking
- URL: http://arxiv.org/abs/2404.07954v1
- Date: Sun, 24 Mar 2024 08:33:27 GMT
- Title: An efficient domain-independent approach for supervised keyphrase extraction and ranking
- Authors: Sriraghavendra Ramaswamy,
- Abstract summary: We present a supervised learning approach for automatic extraction of keyphrases from single documents.
Our solution uses simple to compute statistical and positional features of candidate phrases.
Evaluation on benchmark datasets shows that our approach achieves significantly higher accuracy than several state-of-the-art baseline models.
- Score: 0.03626013617212666
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We present a supervised learning approach for automatic extraction of keyphrases from single documents. Our solution uses simple to compute statistical and positional features of candidate phrases and does not rely on any external knowledge base or on pre-trained language models or word embeddings. The ranking component of our proposed solution is a fairly lightweight ensemble model. Evaluation on benchmark datasets shows that our approach achieves significantly higher accuracy than several state-of-the-art baseline models, including all deep learning-based unsupervised models compared with, and is competitive with some supervised deep learning-based models too. Despite the supervised nature of our solution, the fact that does not rely on any corpus of "golden" keywords or any external knowledge corpus means that our solution bears the advantages of unsupervised solutions to a fair extent.
Related papers
- Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a time series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.
We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.
Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - The Battleship Approach to the Low Resource Entity Matching Problem [0.0]
We propose a new active learning approach for entity matching problems.
We focus on a selection mechanism that exploits unique properties of entity matching.
An experimental analysis shows that the proposed algorithm outperforms state-of-the-art active learning solutions to low resource entity matching.
arXiv Detail & Related papers (2023-11-27T10:18:17Z) - Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z) - Preserving Knowledge Invariance: Rethinking Robustness Evaluation of
Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
By further elaborating the robustness metric, a model is judged to be robust if its performance is consistently accurate on the overall cliques.
arXiv Detail & Related papers (2023-05-23T12:05:09Z) - Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z) - An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
arXiv Detail & Related papers (2022-07-07T06:50:27Z) - Language Models in the Loop: Incorporating Prompting into Weak
Supervision [11.10422546502386]
We propose a new strategy for applying large pre-trained language models to novel tasks when labeled training data is limited.
Instead of applying the model in a typical zero-shot or few-shot fashion, we treat the model as the basis for labeling functions in a weak supervision framework.
arXiv Detail & Related papers (2022-05-04T20:42:40Z) - Leveraging Ensembles and Self-Supervised Learning for Fully-Unsupervised
Person Re-Identification and Text Authorship Attribution [77.85461690214551]
Learning from fully-unlabeled data is challenging in Multimedia Forensics problems, such as Person Re-Identification and Text Authorship Attribution.
Recent self-supervised learning methods have shown to be effective when dealing with fully-unlabeled data in cases where the underlying classes have significant semantic differences.
We propose a strategy to tackle Person Re-Identification and Text Authorship Attribution by enabling learning from unlabeled data even when samples from different classes are not prominently diverse.
arXiv Detail & Related papers (2022-02-07T13:08:11Z) - A comprehensive solution to retrieval-based chatbot construction [4.807955518532493]
We present an end-to-end set of solutions to take the reader from an unlabelled chatlogs to a deployed chatbots.
This set of solutions includes creating a self-supervised dataset and a weakly labelled dataset from chatlogs, as well as a systematic approach to selecting a fixed list of canned responses.
We find that using a self-supervised contrastive learning model outperforms training the binary and multi-class classification models on a weakly labelled dataset.
arXiv Detail & Related papers (2021-06-11T02:54:33Z) - Unsupervised Learning for Robust Fitting:A Reinforcement Learning
Approach [25.851792661168698]
We introduce a novel framework that learns to solve robust model fitting.
Unlike other methods, our work is agnostic to the underlying input features.
We empirically show that our method outperforms existing learning approaches.
arXiv Detail & Related papers (2021-03-05T07:14:00Z) - Syntactic and Semantic-driven Learning for Open Information Extraction [42.65591370263333]
One of the biggest bottlenecks in building accurate, high coverage neural open IE systems is the need for large labelled corpora.
We propose a syntactic and semantic-driven learning approach, which can learn neural open IE models without any human-labelled data.
arXiv Detail & Related papers (2021-03-05T02:59:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.