Improving Performance of Automatic Keyword Extraction (AKE) Methods
Using PoS-Tagging and Enhanced Semantic-Awareness
- URL: http://arxiv.org/abs/2211.05031v1
- Date: Wed, 9 Nov 2022 17:04:13 GMT
- Title: Improving Performance of Automatic Keyword Extraction (AKE) Methods
Using PoS-Tagging and Enhanced Semantic-Awareness
- Authors: Enes Altuncu, Jason R.C. Nurse, Yang Xu, Jie Guo, Shujun Li
- Abstract summary: This paper proposes a simple but effective post-processing-based universal approach to improve the performance of any AKE method.
We consider word types retrieved from a PoS-tagging step and two representative sources of semantic information.
For five state-of-the-art (SOTA) AKE methods, our experimental results with 17 selected datasets showed that the proposed approach improved their performance both consistently (up to 100% in terms of improved cases) and significantly (between 10.2% and 53.8%, with an average of 25.8%, in terms of F1-score and across all five methods).
- Score: 8.823779489420772
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic keyword extraction (AKE) has gained more importance with the
increasing amount of digital textual data that modern computing systems
process. It has various applications in information retrieval (IR) and natural
language processing (NLP), including text summarisation, topic analysis and
document indexing. This paper proposes a simple but effective
post-processing-based universal approach to improve the performance of any AKE
method, via an enhanced level of semantic-awareness supported by PoS-tagging.
To demonstrate the performance of the proposed approach, we considered word
types retrieved from a PoS-tagging step and two representative sources of
semantic information -- specialised terms defined in one or more
context-dependent thesauri, and named entities in Wikipedia. The above three
steps can simply be added to the end of any AKE method as part of a
post-processor, which re-evaluates all candidate keywords following some
context-specific and semantic-aware criteria. For five state-of-the-art (SOTA)
AKE methods, our experimental results with 17 selected datasets showed that the
proposed approach improved their performance both consistently (up to 100% in
terms of improved cases) and significantly (between 10.2% and 53.8%, with an
average of 25.8%, in terms of F1-score and across all five methods),
especially when all three enhancement steps are used. Our results have
profound implications considering how easily our proposed approach can be
applied to any AKE method and further extended.
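The post-processing idea described in the abstract can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the PoS tags, thesaurus, and entity set below are toy stand-ins for a real tagger, the context-dependent thesauri, and the Wikipedia named entities the paper actually uses.

```python
# Sketch of re-evaluating candidate keywords after any AKE method:
# (1) filter by PoS word type, (2) boost thesaurus terms, (3) boost entities.
ALLOWED_TAGS = {"NOUN", "ADJ"}  # keep noun/adjective phrases only

TOY_TAGS = {  # stand-in for a real PoS tagger
    "keyword": "NOUN", "extraction": "NOUN",
    "semantic": "ADJ", "analysis": "NOUN", "quickly": "ADV",
}

def rerank(candidates, thesaurus=frozenset(), entities=frozenset(), boost=1.5):
    """Re-evaluate (phrase, score) pairs: drop phrases containing word types
    outside ALLOWED_TAGS; boost thesaurus terms and named entities."""
    kept = []
    for phrase, score in candidates:
        tags = [TOY_TAGS.get(word, "X") for word in phrase.split()]
        if not all(tag in ALLOWED_TAGS for tag in tags):
            continue  # filter out candidates with non-content word types
        if phrase in thesaurus or phrase in entities:
            score *= boost  # reward semantically grounded candidates
        kept.append((phrase, score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

ranked = rerank(
    [("keyword extraction", 0.6), ("quickly", 0.9), ("semantic analysis", 0.5)],
    thesaurus={"semantic analysis"},
)
print(ranked)  # "quickly" is filtered out; "semantic analysis" is boosted
```

Because the re-scoring only consumes a ranked candidate list, it can be appended to any AKE pipeline without modifying the underlying method, which is what makes the approach universal.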
Related papers
- MetaKP: On-Demand Keyphrase Generation [52.48698290354449]
We introduce on-demand keyphrase generation, a novel paradigm requiring keyphrases that conform to specific high-level goals or intents.
We present MetaKP, a large-scale benchmark comprising four datasets, 7500 documents, and 3760 goals across news and biomedical domains with human-annotated keyphrases.
We demonstrate the potential of our method to serve as a general NLP infrastructure, exemplified by its application in epidemic event detection from social media.
arXiv Detail & Related papers (2024-06-28T19:02:59Z)
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks.
Specifically, we use pre-trained language models (PLMs) to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
- Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation [13.802788788420175]
Correspondence matching plays a crucial role in numerous robotics applications.
This paper addresses the limitations of deep feature matching (DFM), a state-of-the-art (SoTA) plug-and-play correspondence matching approach.
Our proposed method achieves an overall performance in terms of mean matching accuracy of 0.68, 0.92, and 0.95 with respect to the tolerances of 1, 3, and 5 pixels, respectively.
arXiv Detail & Related papers (2024-03-08T15:32:18Z)
- ACID: Abstractive, Content-Based IDs for Document Retrieval with Language Models [69.86170930261841]
We introduce ACID, in which each document's ID is composed of abstractive keyphrases generated by a large language model.
We show that using ACID improves top-10 and top-20 accuracy by 15.6% and 14.4% (relative), respectively.
Our results demonstrate the effectiveness of human-readable, natural-language IDs in generative retrieval with LMs.
arXiv Detail & Related papers (2023-11-14T23:28:36Z)
- End-to-End Autoregressive Retrieval via Bootstrapping for Smart Reply Systems [7.2949782290577945]
We consider a novel approach that learns the smart reply task end-to-end from a dataset of (message, reply set) pairs obtained via bootstrapping.
Empirical results show this method consistently outperforms a range of state-of-the-art baselines across three datasets.
arXiv Detail & Related papers (2023-10-29T09:56:17Z)
- Information Extraction in Domain and Generic Documents: Findings from Heuristic-based and Data-driven Approaches [0.0]
Information extraction plays an important role in natural language processing.
Document genre and length influence IE tasks.
No single method demonstrated overwhelming performance in both tasks.
arXiv Detail & Related papers (2023-06-30T20:43:27Z)
- Selective In-Context Data Augmentation for Intent Detection using Pointwise V-Information [100.03188187735624]
We introduce a novel approach based on PLMs and pointwise V-information (PVI), a metric that can measure the usefulness of a datapoint for training a model.
Our method first fine-tunes a PLM on a small seed of training data and then synthesizes new datapoints - utterances that correspond to given intents.
Our method is thus able to leverage the expressive power of large language models to produce diverse training data.
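The PVI metric mentioned above measures, in bits, how much easier a label becomes to predict once the model sees the input, compared with predicting it from an empty input. A minimal sketch, assuming log-probabilities from two fine-tuned models (the values below are made up for illustration):

```python
import math

def pvi(logp_with_input, logp_null_input):
    """Pointwise V-information of a datapoint for its label: the extra bits
    of information about the label the model extracts once it sees the input,
    versus a model conditioned on the empty input alone."""
    return (logp_with_input - logp_null_input) / math.log(2)  # convert to bits

# Hypothetical label probabilities from the two models (assumed values):
bits = pvi(math.log(0.9), math.log(0.3))
print(bits)  # positive => the utterance is informative for its intent
```

Datapoints with low or negative PVI contribute little to training, which is why the method uses it to filter synthesized utterances.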
arXiv Detail & Related papers (2023-02-10T07:37:49Z)
- Supporting Vision-Language Model Inference with Confounder-pruning Knowledge Prompt [71.77504700496004]
Vision-language models are pre-trained by aligning image-text pairs in a common space to deal with open-set visual concepts.
To boost the transferability of the pre-trained models, recent works adopt fixed or learnable prompts.
However, how and what prompts can improve inference performance remains unclear.
arXiv Detail & Related papers (2022-05-23T07:51:15Z)
- Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings [81.09026586111811]
We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting.
This is achieved by replacing each class label with a vector-valued embedding of a short paragraph that describes the class.
The resulting merged semantic segmentation dataset of over 2 Million images enables training a model that achieves performance equal to that of state-of-the-art supervised methods on 7 benchmark datasets.
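The label-replacement idea in this summary can be sketched as nearest-embedding classification: each pixel (or region) embedding is assigned the class whose description embedding is most similar. The vectors below are toy stand-ins for sentence embeddings of class-describing paragraphs, not outputs of the actual model:

```python
# Sketch: classify against vector-valued class descriptions instead of
# one-hot labels, which is what enables the zero-shot setting above.
def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

CLASS_EMBEDDINGS = {  # toy stand-ins for embedded class descriptions
    "road": [0.9, 0.1, 0.0],
    "vegetation": [0.1, 0.8, 0.3],
}

def classify(pixel_embedding):
    """Assign the class whose description embedding is nearest in cosine
    similarity; unseen classes only need a new description vector."""
    return max(CLASS_EMBEDDINGS,
               key=lambda c: cosine(pixel_embedding, CLASS_EMBEDDINGS[c]))

print(classify([0.85, 0.2, 0.05]))
```

Because labels live in a shared embedding space, datasets with different label sets can be merged for training, which is how the paper assembles its 2-million-image corpus.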
arXiv Detail & Related papers (2022-02-04T07:19:09Z)
- Systematic Investigation of Strategies Tailored for Low-Resource Settings for Sanskrit Dependency Parsing [14.416855042499945]
Existing state-of-the-art approaches for Sanskrit Dependency Parsing (SDP) are hybrid in nature.
Purely data-driven approaches do not match the performance of hybrid approaches due to labelled data sparsity.
We experiment with five strategies, namely, data augmentation, sequential transfer learning, cross-lingual/mono-lingual pretraining, multi-task learning and self-training.
Our proposed ensembled system outperforms the purely data-driven state-of-the-art system by an absolute gain of 2.8/3.9 points (Unlabelled Attachment Score (UAS) / Labelled Attachment Score (LAS)).
arXiv Detail & Related papers (2022-01-27T08:24:53Z)
- Benchmark Performance of Machine And Deep Learning Based Methodologies for Urdu Text Document Classification [4.1353427192071015]
This paper provides benchmark performance for Urdu text document classification.
It investigates the performance impact of traditional machine learning based Urdu text document classification methodologies.
For the first time, it assesses the performance of various deep learning based methodologies for Urdu text document classification.
arXiv Detail & Related papers (2020-03-03T05:49:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.