Hierarchical Catalogue Generation for Literature Review: A Benchmark
- URL: http://arxiv.org/abs/2304.03512v3
- Date: Fri, 17 Nov 2023 02:08:14 GMT
- Title: Hierarchical Catalogue Generation for Literature Review: A Benchmark
- Authors: Kun Zhu, Xiaocheng Feng, Xiachong Feng, Yingsheng Wu and Bing Qin
- Abstract summary: We construct a novel English Hierarchical Catalogues of Literature Reviews dataset with 7.6k literature review catalogues and 389k reference papers.
To accurately assess model performance, we design two evaluation metrics that measure informativeness and similarity to the ground truth in terms of semantics and structure.
- Score: 36.22298354302282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific literature review generation aims to extract and organize
important information from an abundant collection of reference papers and
produce corresponding reviews, yet such reviews often lack a clear and logical hierarchy. We
observe that a high-quality catalogue-guided generation process can effectively
alleviate this problem. Therefore, we present an atomic and challenging task
named Hierarchical Catalogue Generation for Literature Review as the first step
for review generation, which aims to produce a hierarchical catalogue of a
review paper given various references. We construct a novel English
Hierarchical Catalogues of Literature Reviews Dataset with 7.6k literature
review catalogues and 389k reference papers. To accurately assess the model
performance, we design two evaluation metrics for informativeness and
similarity to the ground truth in terms of semantics and structure. Our extensive analyses
verify the high quality of our dataset and the effectiveness of our evaluation
metrics. We further benchmark state-of-the-art summarization models such as BART
and large language models such as ChatGPT to evaluate their capabilities. Finally,
we discuss potential directions for this task to motivate future research.
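The listing does not spell out the two metrics, so the following is only a rough, hypothetical illustration of the underlying idea: flatten a catalogue into (depth, heading) pairs and compare a generated catalogue to the ground truth on structure and semantics. The sequence ratio and token overlap used here are generic stand-ins, not the paper's actual measures.
```python
# Minimal sketch (NOT the paper's official metrics): score a generated review
# catalogue against a reference catalogue represented as (depth, heading) pairs.
from difflib import SequenceMatcher

# Hypothetical example catalogues; headings and depths are illustrative only.
reference = [(1, "Introduction"), (2, "Background"), (1, "Methods"),
             (2, "Supervised Approaches"), (1, "Conclusion")]
generated = [(1, "Introduction"), (1, "Approaches"),
             (2, "Supervised Methods"), (1, "Conclusion")]

def structure_similarity(ref, gen):
    """Compare only the sequences of heading depths (a crude structural proxy)."""
    ref_depths = [d for d, _ in ref]
    gen_depths = [d for d, _ in gen]
    return SequenceMatcher(None, ref_depths, gen_depths).ratio()

def semantic_similarity(ref, gen):
    """Jaccard overlap of heading tokens (a crude lexical stand-in for the
    learned semantic similarity a real metric would use)."""
    ref_tokens = {w.lower() for _, h in ref for w in h.split()}
    gen_tokens = {w.lower() for _, h in gen for w in h.split()}
    return len(ref_tokens & gen_tokens) / len(ref_tokens | gen_tokens)

print(f"structure similarity: {structure_similarity(reference, generated):.2f}")
print(f"semantic similarity:  {semantic_similarity(reference, generated):.2f}")
```
Treat these overlap scores purely as placeholders; the benchmark's own metrics are designed to reward informativeness as well as closeness to the reference catalogue.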
Related papers
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- HiReview: Hierarchical Taxonomy-Driven Automatic Literature Review Generation [15.188580557890942]
HiReview is a novel framework for hierarchical taxonomy-driven automatic literature review generation.
Extensive experiments demonstrate that HiReview significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-10-02T13:02:03Z)
- What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation [57.550045763103334]
Evaluating a story can be more challenging than other generation evaluation tasks.
We first summarize existing storytelling tasks, including text-to-text, visual-to-text, and text-to-visual.
We propose a taxonomy to organize evaluation metrics that have been developed or can be adopted for story evaluation.
arXiv Detail & Related papers (2024-08-26T20:35:42Z)
- A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z)
- Knowledge-Centric Templatic Views of Documents [2.654058995940072]
Authors often share their ideas in various document formats, such as slide decks, newsletters, reports, and posters.
We introduce a novel unified evaluation framework that can be adapted to measuring the quality of document generators.
We conduct a human evaluation, which shows that people prefer 82% of the documents generated with our method.
arXiv Detail & Related papers (2024-01-13T01:22:15Z)
- Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z)
- Enhancing Identification of Structure Function of Academic Articles Using Contextual Information [6.28532577139029]
This paper takes articles from the ACL conference as the corpus to identify the structure function of academic articles.
We employ traditional machine learning models and deep learning models to construct classifiers based on various feature inputs.
Inspired by (2), this paper introduces contextual information into the deep learning models and achieves significant results.
arXiv Detail & Related papers (2021-11-28T11:21:21Z)
- SPECTER: Document-level Representation Learning using Citation-informed Transformers [51.048515757909215]
SPECTER generates document-level embeddings of scientific documents by pretraining a Transformer language model; a usage sketch follows this entry.
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
arXiv Detail & Related papers (2020-04-15T16:05:51Z)
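As a hedged usage sketch only: the Hugging Face model name "allenai/specter" and the title-plus-abstract input format below follow the commonly documented recipe rather than anything stated in this listing, so check the official SPECTER release for the exact protocol.
```python
# Sketch: obtain SPECTER document embeddings with Hugging Face Transformers.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
model = AutoModel.from_pretrained("allenai/specter")

papers = [
    {"title": "Hierarchical Catalogue Generation for Literature Review: A Benchmark",
     "abstract": "We construct a dataset of literature review catalogues ..."},
]
# SPECTER is typically fed "title [SEP] abstract" as a single sequence.
texts = [p["title"] + tokenizer.sep_token + p.get("abstract", "") for p in papers]
inputs = tokenizer(texts, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# The [CLS] token embedding serves as the document-level representation.
embeddings = outputs.last_hidden_state[:, 0, :]
print(embeddings.shape)  # (num_papers, hidden_size)
```
Embeddings like these can then be evaluated on document-level tasks such as the citation prediction, classification, and recommendation tasks in SciDocs.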