Hierarchical Catalogue Generation for Literature Review: A Benchmark
- URL: http://arxiv.org/abs/2304.03512v3
- Date: Fri, 17 Nov 2023 02:08:14 GMT
- Title: Hierarchical Catalogue Generation for Literature Review: A Benchmark
- Authors: Kun Zhu, Xiaocheng Feng, Xiachong Feng, Yingsheng Wu and Bing Qin
- Abstract summary: We construct a novel English Hierarchical Catalogues of Literature Reviews dataset with 7.6k literature review catalogues and 389k reference papers.
To accurately assess the model performance, we design two evaluation metrics for informativeness and similarity to ground truth from semantics and structure.
- Score: 36.22298354302282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific literature review generation aims to extract and organize
important information from an abundant collection of reference papers and
produce corresponding reviews, yet generated reviews often lack a clear and
logical hierarchy. We
observe that a high-quality catalogue-guided generation process can effectively
alleviate this problem. Therefore, we present an atomic and challenging task
named Hierarchical Catalogue Generation for Literature Review as the first step
for review generation, which aims to produce a hierarchical catalogue of a
review paper given various references. We construct a novel English
Hierarchical Catalogues of Literature Reviews Dataset with 7.6k literature
review catalogues and 389k reference papers. To accurately assess the model
performance, we design two evaluation metrics for informativeness and
similarity to ground truth from semantics and structure. Our extensive analyses
verify the high quality of our dataset and the effectiveness of our evaluation
metrics. We further benchmark diverse experiments on state-of-the-art
summarization models like BART and large language models like ChatGPT to
evaluate their capabilities. We further discuss potential directions for this
task to motivate future research.
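As a rough illustration of what catalogue-level evaluation might look like, the toy sketch below scores a predicted catalogue against a gold one along two axes: structural similarity (normalized edit distance over heading depths) and semantic overlap (token Jaccard over heading words). These are simplified stand-ins invented here for illustration, not the paper's actual metrics.

```python
# Toy comparison of two hierarchical catalogues, each a list of
# (depth, heading) pairs. NOT the paper's metrics; illustration only.

def structure_similarity(cat_a, cat_b):
    """Normalized edit distance over the sequences of heading depths."""
    a = [d for d, _ in cat_a]
    b = [d for d, _ in cat_b]
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,
                           dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + cost)
    return 1.0 - dp[m][n] / max(m, n, 1)

def semantic_overlap(cat_a, cat_b):
    """Token-level Jaccard overlap between the sets of heading words."""
    ta = {w.lower() for _, h in cat_a for w in h.split()}
    tb = {w.lower() for _, h in cat_b for w in h.split()}
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

gold = [(1, "Introduction"), (1, "Methods"),
        (2, "Neural Models"), (1, "Conclusion")]
pred = [(1, "Introduction"), (1, "Approaches"), (2, "Neural Models")]
print(structure_similarity(gold, pred))  # 0.75
print(semantic_overlap(gold, pred))      # 0.5
```

A real metric would need soft semantic matching (e.g., embedding similarity between headings) rather than exact token overlap, but the two-axis decomposition mirrors the semantics/structure split described above.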
Related papers
- CASIMIR: A Corpus of Scientific Articles enhanced with Multiple Author-Integrated Revisions [7.503795054002406]
We propose an original textual resource on the revision step of the writing process of scientific articles.
This new dataset, called CASIMIR, contains the multiple revised versions of 15,646 scientific articles from OpenReview, along with their peer reviews.
arXiv Detail & Related papers (2024-03-01T03:07:32Z) - A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [58.6354685593418]
This paper proposes several article-level, field-normalized, and large language model-empowered bibliometric indicators to evaluate reviews.
The newly emerging AI-generated literature reviews are also appraised.
This work offers insights into the current challenges of literature reviews and envisions future directions for their development.
arXiv Detail & Related papers (2024-02-20T11:28:50Z) - Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy [27.454549324141087]
We propose a novel VQA benchmark based on well-known visual classification datasets.
We also suggest using the semantic hierarchy of the label space to ask automatically generated follow-up questions about the ground-truth category.
Our contributions aim to lay the foundation for more precise and meaningful assessments.
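The follow-up-question idea above can be sketched mechanically: given an answer label and a label hierarchy, ask which finer-grained category applies. The hierarchy and question template below are invented for illustration and are not the benchmark's actual data or prompts.

```python
# Hypothetical sketch: derive a follow-up question from a label hierarchy.
# The hierarchy and wording here are made up for illustration.

HIERARCHY = {
    "animal": ["dog", "cat", "bird"],
    "dog": ["labrador", "poodle"],
}

def follow_up_question(answer):
    """If the answer has finer-grained labels, ask which one applies."""
    children = HIERARCHY.get(answer)
    if not children:
        return None  # leaf label: no follow-up needed
    options = ", ".join(children)
    return f"Which kind of {answer} is it: {options}?"

print(follow_up_question("dog"))
# Which kind of dog is it: labrador, poodle?
```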
arXiv Detail & Related papers (2024-02-11T18:26:18Z) - Leveraging Large Language Models for NLG Evaluation: Advances and Challenges [57.88520765782177]
Large Language Models (LLMs) have opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance.
We propose a coherent taxonomy for organizing existing LLM-based evaluation metrics, offering a structured framework to understand and compare these methods.
By discussing unresolved challenges, including bias, robustness, domain-specificity, and unified evaluation, this paper seeks to offer insights to researchers and advocate for fairer and more advanced NLG evaluation techniques.
arXiv Detail & Related papers (2024-01-13T15:59:09Z) - Disco-Bench: A Discourse-Aware Evaluation Benchmark for Language Modelling [70.23876429382969]
We propose a benchmark that can evaluate intra-sentence discourse properties across a diverse set of NLP tasks.
Disco-Bench consists of 9 document-level testsets in the literature domain, which contain rich discourse phenomena.
For linguistic analysis, we also design a diagnostic test suite that can examine whether the target models learn discourse knowledge.
arXiv Detail & Related papers (2023-07-16T15:18:25Z) - Large Language Models are Diverse Role-Players for Summarization Evaluation [82.31575622685902]
A document summary's quality can be assessed by human annotators on various criteria, both objective ones like grammar and correctness, and subjective ones like informativeness, succinctness, and appeal.
Most automatic evaluation methods like BLEU/ROUGE may not be able to adequately capture the above dimensions.
We propose a new LLM-based evaluation framework that compares generated text and reference text from both objective and subjective aspects.
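The role-player idea can be sketched as prompting the same judge under different personas and averaging their scores. The roles, prompts, and stubbed `judge` function below are invented for illustration; in practice `judge` would be an actual LLM call.

```python
# Illustrative sketch of role-based summary evaluation: each "role" scores
# the summary against a different criterion, and scores are averaged.
# The roles and prompt wording are made up; `judge` stands in for an LLM.

ROLES = {
    "grammarian": "Rate the grammar and correctness of this summary (1-5).",
    "general reader": "Rate how informative and succinct this summary is (1-5).",
}

def build_prompt(role, instruction, summary, reference):
    return (f"You are a {role}. {instruction}\n"
            f"Reference: {reference}\nSummary: {summary}\nScore:")

def evaluate(summary, reference, judge):
    scores = [judge(build_prompt(role, inst, summary, reference))
              for role, inst in ROLES.items()]
    return sum(scores) / len(scores)

# Stub judge that returns a fixed score for every role.
print(evaluate("A short summary.", "The reference text.", lambda p: 4))  # 4.0
```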
arXiv Detail & Related papers (2023-03-27T10:40:59Z) - Enhancing Identification of Structure Function of Academic Articles Using Contextual Information [6.28532577139029]
This paper takes articles of the ACL conference as the corpus to identify the structure function of academic articles.
We employ traditional machine learning models and deep learning models to construct classifiers based on various feature inputs.
Inspired by (2), this paper introduces contextual information into the deep learning models and achieves significant results.
arXiv Detail & Related papers (2021-11-28T11:21:21Z) - SPECTER: Document-level Representation Learning using Citation-informed Transformers [51.048515757909215]
SPECTER generates document-level embedding of scientific documents based on pretraining a Transformer language model.
We introduce SciDocs, a new evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation.
arXiv Detail & Related papers (2020-04-15T16:05:51Z)
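Document-level embeddings like SPECTER's are typically consumed downstream by ranking candidates against a query paper via cosine similarity. The sketch below assumes embeddings have already been produced; the tiny 3-dimensional vectors are dummies (real SPECTER embeddings are 768-dimensional).

```python
# Ranking candidate papers by cosine similarity to a query paper, given
# precomputed document embeddings. Vectors here are dummy placeholders.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

query = [1.0, 0.0, 1.0]
candidates = {
    "paper_a": [0.9, 0.1, 1.1],  # close to the query direction
    "paper_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}
ranked = sorted(candidates,
                key=lambda k: cosine(query, candidates[k]),
                reverse=True)
print(ranked[0])  # paper_a
```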
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.