FMMD: A multimodal open peer review dataset based on F1000Research
- URL: http://arxiv.org/abs/2602.14285v1
- Date: Sun, 15 Feb 2026 19:36:05 GMT
- Title: FMMD: A multimodal open peer review dataset based on F1000Research
- Authors: Zhenzhen Zhuang, Yuqing Fu, Jing Zhu, Zhangping Zhou, Jialiang Lin,
- Abstract summary: FMMD is a multimodal and multidisciplinary open peer review dataset curated from F1000Research.<n>It bridges the current gap by integrating manuscript-level visual and structural data with version-specific reviewer reports and editorial decisions.<n>It provides a comprehensive empirical resource for the development of peer review research.
- Score: 2.375184015411392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated scholarly paper review (ASPR) has entered the coexistence phase with traditional peer review, where artificial intelligence (AI) systems are increasingly incorporated into real-world manuscript evaluation. In parallel, research on automated and AI-assisted peer review has proliferated. Despite this momentum, empirical progress remains constrained by several critical limitations in existing datasets. While reviewers routinely evaluate figures, tables, and complex layouts to assess scientific claims, most existing datasets remain overwhelmingly text-centric. This bias is reinforced by a narrow focus on data from computer science venues. Furthermore, these datasets lack precise alignment between reviewer comments and specific manuscript versions, obscuring the iterative relationship between peer review and manuscript evolution. In response, we introduce FMMD, a multimodal and multidisciplinary open peer review dataset curated from F1000Research. The dataset bridges the current gap by integrating manuscript-level visual and structural data with version-specific reviewer reports and editorial decisions. By providing explicit alignment between reviewer comments and the exact article iteration under review, FMMD enables fine-grained analysis of the peer review lifecycle across diverse scientific domains. FMMD supports tasks such as multimodal issue detection and multimodal review comment generation. It provides a comprehensive empirical resource for the development of peer review research.
Related papers
- Is Peer Review Really in Decline? Analyzing Review Quality across Venues and Time [55.756345497678204]
We introduce a new framework for evidence-based comparative study of review quality.<n>We apply it to major AI and machine learning conferences: ICLR, NeurIPS and *ACL.<n>We study the relationships between measurements of review quality, and its evolution over time.
arXiv Detail & Related papers (2026-01-21T16:48:29Z) - Paper Copilot: Tracking the Evolution of Peer Review in AI Conferences [10.900405397994687]
We present Paper Copilot, a system that creates durable digital archives of peer reviews across a wide range of computer-science venues.<n>By releasing both the infrastructure and the dataset, Paper Copilot supports reproducible research on the evolution of peer review.
arXiv Detail & Related papers (2025-10-15T06:41:06Z) - LiRA: A Multi-Agent Framework for Reliable and Readable Literature Review Generation [66.09346158850308]
We present LiRA (Literature Review Agents), a multi-agent collaborative workflow which emulates the human literature review process.<n>LiRA utilizes specialized agents for content outlining, subsection writing, editing, and reviewing, producing cohesive and comprehensive review articles.<n>We evaluate LiRA in real-world scenarios using document retrieval and assess its robustness to reviewer model variation.
arXiv Detail & Related papers (2025-10-01T12:14:28Z) - What Drives Paper Acceptance? A Process-Centric Analysis of Modern Peer Review [2.9282248958475345]
We present a large-scale empirical study of ICLR 2017-2025, encompassing over 28,000 submissions.<n>Our results show that factors beyond scientific novelty significantly shape acceptance outcomes.<n>We propose data-driven guidelines for authors, reviewers, and meta-reviewers to enhance transparency and fairness in peer review.
arXiv Detail & Related papers (2025-09-30T03:00:10Z) - Re$^2$: A Consistency-ensured Dataset for Full-stage Peer Review and Multi-turn Rebuttal Discussions [2.5226834810382113]
We introduce the largest consistency-ensured peer review and rebuttal dataset named Re2.<n>This dataset comprises 19,926 initial submissions, 70,668 review comments, and 53,818 rebuttals from 24 conferences and 21 workshops on OpenReview.
arXiv Detail & Related papers (2025-05-12T16:02:52Z) - Identifying Aspects in Peer Reviews [59.02879434536289]
We develop a data-driven schema for deriving aspects from a corpus of peer reviews.<n>We introduce a dataset of peer reviews augmented with aspects and show how it can be used for community-level review analysis.
arXiv Detail & Related papers (2025-04-09T14:14:42Z) - Peer Review as A Multi-Turn and Long-Context Dialogue with Role-Based Interactions [62.0123588983514]
Large Language Models (LLMs) have demonstrated wide-ranging applications across various fields.
We reformulate the peer-review process as a multi-turn, long-context dialogue, incorporating distinct roles for authors, reviewers, and decision makers.
We construct a comprehensive dataset containing over 26,841 papers with 92,017 reviews collected from multiple sources.
arXiv Detail & Related papers (2024-06-09T08:24:17Z) - A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence [51.26815896167173]
We present a comprehensive tertiary analysis of PAMI reviews along three complementary dimensions.<n>Our analyses reveal distinctive organizational patterns as well as persistent gaps in current review practices.<n>Finally, our evaluation of state-of-the-art AI-generated reviews indicates encouraging advances in coherence and organization.
arXiv Detail & Related papers (2024-02-20T11:28:50Z) - SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval [64.03631654052445]
Current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap.
We develop a specialised scientific MMIR benchmark by leveraging open-access paper collections.
This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents.
arXiv Detail & Related papers (2024-01-24T14:23:12Z) - MOPRD: A multidisciplinary open peer review dataset [12.808751859133064]
Open peer review is a growing trend in academic publications.
Most of the existing peer review datasets do not provide data that cover the whole peer review process.
We construct MOPRD, a multidisciplinary open peer review dataset.
arXiv Detail & Related papers (2022-12-09T16:35:14Z) - Investigating Fairness Disparities in Peer Review: A Language Model
Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs)
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, author, and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.