BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text
- URL: http://arxiv.org/abs/2505.18207v1
- Date: Thu, 22 May 2025 06:04:02 GMT
- Title: BAGELS: Benchmarking the Automated Generation and Extraction of Limitations from Scholarly Text
- Authors: Ibrahim Al Azher, Miftahul Jannat Mokarrama, Zhishuai Guo, Sagnik Ray Choudhury, Hamed Alhoori
- Abstract summary: In scientific research, limitations refer to the shortcomings, constraints, or weaknesses within a study. Authors often a) underreport them in the paper text and b) use hedging strategies to satisfy editorial requirements. This underreporting behavior, along with an explosion in the number of publications, has created a pressing need to automatically extract or generate such limitations.
- Score: 6.682911432177815
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In scientific research, limitations refer to the shortcomings, constraints, or weaknesses within a study. Transparent reporting of such limitations can enhance the quality and reproducibility of research and improve public trust in science. However, authors often a) underreport them in the paper text and b) use hedging strategies to satisfy editorial requirements at the cost of readers' clarity and confidence. This underreporting behavior, along with an explosion in the number of publications, has created a pressing need to automatically extract or generate such limitations from scholarly papers. In this direction, we present a complete architecture for the computational analysis of research limitations. Specifically, we create a dataset of limitations in ACL, NeurIPS, and PeerJ papers by extracting them from papers' text and integrating them with external reviews; we propose methods to automatically generate them using a novel Retrieval Augmented Generation (RAG) technique; we create a fine-grained evaluation framework for generated limitations; and we provide a meta-evaluation for the proposed evaluation techniques.
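The abstract does not spell out the retrieval step, but the described RAG technique presumably grounds generation in limitation-relevant passages retrieved from the paper itself. A minimal sketch under that assumption, using an off-the-shelf sentence-transformers encoder and a hypothetical generate() stand-in for whatever LLM the authors actually used:

```python
# Minimal sketch of a RAG pipeline for limitation generation.
# The embedding model and generate() are illustrative stand-ins,
# not the paper's actual implementation.
from sentence_transformers import SentenceTransformer, util

def generate(prompt: str) -> str:
    """Hypothetical LLM call; swap in any chat-completion API."""
    raise NotImplementedError

def generate_limitations(sections: list[str], top_k: int = 3) -> str:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    # Embed every section of the paper once.
    section_emb = model.encode(sections, convert_to_tensor=True)
    # Retrieve the sections most relevant to a "limitations" query.
    query_emb = model.encode(
        "shortcomings, constraints, or weaknesses of this study",
        convert_to_tensor=True,
    )
    hits = util.semantic_search(query_emb, section_emb, top_k=top_k)[0]
    context = "\n\n".join(sections[h["corpus_id"]] for h in hits)
    # Condition the generator on the retrieved context only.
    return generate(
        "Based on these excerpts from a research paper, list its "
        f"likely limitations:\n\n{context}"
    )
```

Grounding the generator in retrieved passages rather than the full paper keeps the prompt short and ties each generated limitation to concrete text.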
Related papers
- Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers [31.51311612333459]
LimitGen is the first benchmark for evaluating LLMs' capability to support early-stage feedback and complement human peer review. Our approach enhances the capabilities of LLM systems to generate limitations in research papers, enabling them to provide more concrete and constructive feedback.
arXiv Detail & Related papers (2025-07-03T15:04:38Z)
- The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting area chairs (ACs) in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
- Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs [58.24692529185971]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. We evaluate the effectiveness and robustness of different unlearning strategies.
arXiv Detail & Related papers (2025-05-29T09:19:07Z)
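The five prompt-based auditing methods are not described in this snippet; one common pattern in this line of work is to probe the unlearned model with questions about the supposedly forgotten material and measure how often the answer still surfaces. A rough sketch of that idea, with query_model() as a hypothetical stand-in for any LLM inference call:

```python
# Illustrative prompt-based audit: probe an "unlearned" model with
# queries about forgotten facts and count how often they leak.
def query_model(prompt: str) -> str:
    raise NotImplementedError  # hypothetical: any LLM inference call

def leakage_rate(probes: list[tuple[str, str]]) -> float:
    """probes pairs a question about forgotten data with its answer."""
    leaks = sum(
        answer.lower() in query_model(question).lower()
        for question, answer in probes
    )
    return leaks / len(probes)  # 0.0 means no detectable retention
```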
- Reconstructing Context: Evaluating Advanced Chunking Strategies for Retrieval-Augmented Generation [0.0]
Retrieval-augmented generation (RAG) has become a transformative approach for enhancing large language models (LLMs). This study presents a rigorous analysis of late chunking and contextual retrieval, evaluating their effectiveness and efficiency in optimizing RAG systems. Our results indicate that contextual retrieval preserves semantic coherence more effectively but requires greater computational resources.
arXiv Detail & Related papers (2025-04-28T12:52:05Z)
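As a rough illustration of the contextual-retrieval side of this comparison: each chunk is embedded together with a short document-level summary, which preserves semantic coherence at the cost of extra tokens, matching the resource trade-off reported above. The summarize() call and model choice are illustrative, not the study's setup:

```python
# Sketch of contextual retrieval: prepend a document summary to each
# chunk before embedding so every vector keeps global context.
from sentence_transformers import SentenceTransformer

def summarize(document: str) -> str:
    raise NotImplementedError  # hypothetical LLM summarization call

def embed_with_context(document: str, chunks: list[str]):
    # One document-level summary, computed once per document.
    context = summarize(document)
    model = SentenceTransformer("all-MiniLM-L6-v2")
    # The extra tokens per chunk are the source of the higher compute
    # cost reported for contextual retrieval.
    return model.encode([f"{context}\n\n{chunk}" for chunk in chunks])
```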
- WritingBench: A Comprehensive Benchmark for Generative Writing [87.48445972563631]
We present WritingBench, a benchmark designed to evaluate large language models (LLMs) across 6 core writing domains and 100 subdomains, encompassing creative, persuasive, informative, and technical writing. We propose a query-dependent evaluation framework that empowers LLMs to dynamically generate instance-specific assessment criteria. This framework is complemented by a fine-tuned critic model for criteria-aware scoring, enabling evaluations of style, format, and length.
arXiv Detail & Related papers (2025-03-07T08:56:20Z)
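A query-dependent evaluation loop of the kind described above might look as follows; both llm() calls are hypothetical placeholders rather than WritingBench's released critic model:

```python
# Sketch of query-dependent evaluation: derive criteria from the
# query, then score the response against each criterion.
import json

def llm(prompt: str) -> str:
    raise NotImplementedError  # hypothetical chat-completion call

def evaluate_response(query: str, response: str) -> float:
    # Step 1: instance-specific criteria, generated from the query.
    criteria = json.loads(llm(
        "List 5 assessment criteria for this writing task as a JSON "
        f"array of strings:\n{query}"
    ))
    # Step 2: a critic scores the response per criterion.
    scores = [float(llm(
        f"On a 1-10 scale, how well does the text meet '{c}'? "
        f"Reply with a number only.\n\nText:\n{response}"
    )) for c in criteria]
    return sum(scores) / len(scores)
```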
- Is Your Paper Being Reviewed by an LLM? Benchmarking AI Text Detection in Peer Review [6.20631177269082]
A new risk to the peer review process is that negligent reviewers will rely on large language models (LLMs) to review a paper. We introduce a comprehensive dataset containing a total of 788,984 AI-written peer reviews paired with corresponding human reviews. We use this new resource to evaluate the ability of 18 existing AI text detection algorithms to distinguish between peer reviews fully written by humans and different state-of-the-art LLMs.
arXiv Detail & Related papers (2025-02-26T23:04:05Z)
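Evaluating a detector on such a paired corpus typically reduces to scoring each review and computing a ranking metric. A minimal sketch, with detect() standing in for any of the 18 algorithms:

```python
# Score a human/AI paired review corpus with one detector and
# summarize its discrimination ability as AUROC.
from sklearn.metrics import roc_auc_score

def detect(text: str) -> float:
    raise NotImplementedError  # probability the review is AI-written

def benchmark_detector(human_reviews: list[str],
                       ai_reviews: list[str]) -> float:
    texts = human_reviews + ai_reviews
    labels = [0] * len(human_reviews) + [1] * len(ai_reviews)
    scores = [detect(t) for t in texts]
    return roc_auc_score(labels, scores)  # 0.5 = chance, 1.0 = perfect
```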
- Are We There Yet? Revealing the Risks of Utilizing Large Language Models in Scholarly Peer Review [66.73247554182376]
Advances in large language models (LLMs) have led to their integration into peer review. The unchecked adoption of LLMs poses significant risks to the integrity of the peer review system. We show that manipulating 5% of the reviews could potentially cause 12% of the papers to lose their position in the top 30% rankings.
arXiv Detail & Related papers (2024-12-02T16:55:03Z)
- Automating Bibliometric Analysis with Sentence Transformers and Retrieval-Augmented Generation (RAG): A Pilot Study in Semantic and Contextual Search for Customized Literature Characterization for High-Impact Urban Research [2.1728621449144763]
Bibliometric analysis is essential for understanding research trends, scope, and impact in urban science.
Traditional methods, relying on keyword searches, often fail to uncover valuable insights not explicitly stated in article titles or keywords.
We leverage Generative AI models, specifically transformers and Retrieval-Augmented Generation (RAG), to automate and enhance bibliometric analysis.
arXiv Detail & Related papers (2024-10-08T05:13:27Z)
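The semantic-search step behind this kind of pipeline can be sketched with an off-the-shelf sentence-transformers model: abstracts are embedded once, and a natural-language research question is ranked against them by cosine similarity rather than keyword overlap. The model choice here is illustrative:

```python
# Rank abstracts by semantic similarity to a research question,
# surfacing papers whose titles/keywords never mention the query terms.
from sentence_transformers import SentenceTransformer, util

def rank_abstracts(abstracts: list[str], question: str, top_k: int = 10):
    model = SentenceTransformer("all-MiniLM-L6-v2")
    corpus = model.encode(abstracts, convert_to_tensor=True)
    query = model.encode(question, convert_to_tensor=True)
    # Cosine similarity to every abstract, highest first.
    sims = util.cos_sim(query, corpus)[0]
    values, indices = sims.topk(min(top_k, len(abstracts)))
    return [(abstracts[int(i)], float(v)) for v, i in zip(values, indices)]
```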
- Automated Review Generation Method Based on Large Language Models [8.86304208754684]
We present an automated review generation method based on large language models (LLMs). Our method swiftly analyzed 343 articles, averaging only seconds per article per LLM account, producing comprehensive reviews spanning 35 topics, with extended analysis of 1041 articles.
arXiv Detail & Related papers (2024-07-30T15:26:36Z)
- LimGen: Probing the LLMs for Generating Suggestive Limitations of Research Papers [8.076841611508488]
We present a novel and challenging task of Suggestive Limitation Generation (SLG) for research papers.
We compile a dataset called LimGen, encompassing 4068 research papers and their associated limitations from the ACL Anthology.
arXiv Detail & Related papers (2024-03-22T17:31:43Z)
- Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models [49.74036826946397]
This study investigates constrained text generation for large language models (LLMs).
Our research mainly focuses on mainstream open-source LLMs, categorizing constraints into lexical, structural, and relation-based types.
Results illuminate LLMs' capacity and deficiency to incorporate constraints and provide insights for future developments in constrained text generation.
arXiv Detail & Related papers (2023-10-25T03:58:49Z)
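A checker for the three constraint types named above might look like the following sketch; the exact constraint definitions used in the study may differ:

```python
# Check whether generated text satisfies lexical, structural, and
# relation-based constraints, per the paper's categorization.
def satisfies(text: str,
              lexical: tuple[str, ...] = (),    # words that must appear
              max_words: int | None = None,     # structural: length cap
              ordered: tuple[str, ...] = ()) -> bool:  # relation: order
    low = text.lower()
    # Lexical constraint: every required term must appear.
    if any(term.lower() not in low for term in lexical):
        return False
    # Structural constraint: respect the length cap, if given.
    if max_words is not None and len(text.split()) > max_words:
        return False
    # Relation-based constraint: terms must appear in the given order.
    pos = [low.find(term.lower()) for term in ordered]
    return all(p >= 0 for p in pos) and pos == sorted(pos)

# e.g. satisfies("RAG improves grounding.", lexical=("RAG",), max_words=10) -> True
```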
- GLUECons: A Generic Benchmark for Learning Under Constraints [102.78051169725455]
In this work, we create a benchmark comprising nine tasks in the domains of natural language processing and computer vision.
We model external knowledge as constraints, specify the sources of the constraints for each task, and implement various models that use these constraints.
arXiv Detail & Related papers (2023-02-16T16:45:36Z)
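One common way to implement "models that use these constraints" is to relax a logical rule into a differentiable penalty added to the task loss; the implication rule below is a generic illustration, not GLUECons's actual formulation:

```python
# Soft logical constraint as a differentiable training penalty.
import torch

def implication_penalty(p_premise: torch.Tensor,
                        p_conclusion: torch.Tensor) -> torch.Tensor:
    # Soft form of the rule "premise -> conclusion": penalize any
    # probability mass the premise has beyond the conclusion.
    return torch.relu(p_premise - p_conclusion).mean()

# Added to an ordinary task loss during training, e.g.:
#   loss = task_loss + 0.1 * implication_penalty(p_a, p_b)
```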
This list is automatically generated from the titles and abstracts of the papers on this site.