From Statistical Methods to Deep Learning, Automatic Keyphrase
Prediction: A Survey
- URL: http://arxiv.org/abs/2305.02579v1
- Date: Thu, 4 May 2023 06:22:50 GMT
- Title: From Statistical Methods to Deep Learning, Automatic Keyphrase
Prediction: A Survey
- Authors: Binbin Xie, Jia Song, Liangying Shao, Suhang Wu, Xiangpeng Wei,
Baosong Yang, Huan Lin, Jun Xie and Jinsong Su
- Abstract summary: Keyphrase prediction aims to generate phrases (keyphrases) that concisely summarize a given document.
Recently, researchers have conducted in-depth studies on this task from various perspectives.
Our work analyzes up to 167 previous works, achieving greater coverage of this task than previous surveys.
- Score: 44.83902003341381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keyphrase prediction aims to generate phrases (keyphrases) that concisely
summarize a given document. Recently, researchers have conducted in-depth
studies on this task from various perspectives. In this paper, we
comprehensively summarize representative studies from the perspectives of
dominant models, datasets and evaluation metrics. Our work analyzes up to 167
previous works, achieving greater coverage of this task than previous surveys.
In particular, we focus on deep learning-based keyphrase prediction,
which has attracted increasing attention in recent years. Afterwards,
we conduct several groups of experiments to carefully compare representative
models. To the best of our knowledge, our work is the first attempt to compare
these models using the same commonly used datasets and evaluation metric,
facilitating in-depth analyses of their disadvantages and advantages. Finally,
we discuss the possible research directions of this task in the future.
Related papers
- Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs).
In this work, we examine whether this is the result of the aggregation used in corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt.
Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z)
- Toward Unified Practices in Trajectory Prediction Research on Drone Datasets [3.1406146587437904]
The availability of high-quality datasets is crucial for the development of behavior prediction algorithms in autonomous vehicles.
This paper highlights the need to standardize the use of certain datasets for motion forecasting research.
We propose a set of tools and practices to achieve this.
arXiv Detail & Related papers (2024-05-01T16:17:39Z)
- Capture the Flag: Uncovering Data Insights with Large Language Models [90.47038584812925]
This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data.
We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset.
arXiv Detail & Related papers (2023-12-21T14:20:06Z)
- Evaluation of Faithfulness Using the Longest Supported Subsequence [52.27522262537075]
We introduce a novel approach to evaluate the faithfulness of machine-generated text by computing the longest noncontinuous substring of the claim that is supported by the context.
Using a new human-annotated dataset, we finetune a model to generate the Longest Supported Subsequence (LSS).
Our proposed metric demonstrates an 18% enhancement over the prevailing state-of-the-art metric for faithfulness on our dataset.
arXiv Detail & Related papers (2023-08-23T14:18:44Z)
- Robust Visual Question Answering: Datasets, Methods, and Future Challenges [23.59923999144776]
Visual question answering requires a system to provide an accurate natural language answer given an image and a natural language question.
Previous generic VQA methods often exhibit a tendency to memorize biases present in the training data rather than learning proper behaviors, such as grounding images before predicting answers.
Various datasets and debiasing methods have been proposed to evaluate and enhance the VQA robustness, respectively.
arXiv Detail & Related papers (2023-07-21T10:12:09Z)
- Topics in the Haystack: Extracting and Evaluating Topics beyond Coherence [0.0]
We propose a method that incorporates a deeper understanding of both sentence and document themes.
This allows our model to detect latent topics that may include uncommon words or neologisms.
We present correlation coefficients with human identification of intruder words and achieve near-human level results at the word-intrusion task.
arXiv Detail & Related papers (2023-03-30T12:24:25Z)
- Deep Learning Schema-based Event Extraction: Literature Review and Current Trends [60.29289298349322]
Event extraction technology based on deep learning has become a research hotspot.
This paper fills the gap by reviewing the state-of-the-art approaches, focusing on deep learning-based models.
arXiv Detail & Related papers (2021-07-05T16:32:45Z)
- Quantitative Argument Summarization and Beyond: Cross-Domain Key Point Analysis [17.875273745811775]
We develop a method for automatic extraction of key points, which enables fully automatic analysis.
We demonstrate that the applicability of key point analysis goes well beyond argumentation data.
An additional contribution is an in-depth evaluation of argument-to-key point matching models.
arXiv Detail & Related papers (2020-10-11T23:01:51Z)
- A Survey on Text Classification: From Shallow to Deep Learning [83.47804123133719]
The last decade has seen a surge of research in this area due to the unprecedented success of deep learning.
This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021.
We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification.
arXiv Detail & Related papers (2020-08-02T00:09:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.