SciFig: Towards Automating Scientific Figure Generation
- URL: http://arxiv.org/abs/2601.04390v1
- Date: Wed, 07 Jan 2026 20:56:58 GMT
- Title: SciFig: Towards Automating Scientific Figure Generation
- Authors: Siyuan Huang, Yutong Gao, Juyang Bai, Yifan Zhou, Zi Yin, Xinxin Liu, Rama Chellappa, Chun Pong Lau, Sayan Nag, Cheng Peng, Shraman Pramanick
- Abstract summary: SciFig is an end-to-end AI agent system that generates publication-ready pipeline figures directly from research paper texts. We introduce a rubric-based evaluation framework that analyzes 2,219 real scientific figures to extract evaluation rubrics. SciFig demonstrates strong performance, achieving 70.1% overall quality on dataset-level evaluation and 66.2% on paper-specific evaluation.
- Score: 41.73701976318102
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Creating high-quality figures and visualizations for scientific papers is a time-consuming task that requires both deep domain knowledge and professional design skills. Despite over 2.5 million scientific papers published annually, the figure generation process remains largely manual. We introduce $\textbf{SciFig}$, an end-to-end AI agent system that generates publication-ready pipeline figures directly from research paper texts. SciFig uses a hierarchical layout generation strategy, which parses research descriptions to identify component relationships, groups related elements into functional modules, and generates inter-module connections to establish visual organization. Furthermore, an iterative chain-of-thought (CoT) feedback mechanism progressively improves layouts through multiple rounds of visual analysis and reasoning. We introduce a rubric-based evaluation framework that analyzes 2,219 real scientific figures to extract evaluation rubrics and automatically generates comprehensive evaluation criteria. SciFig demonstrates strong performance, achieving 70.1$\%$ overall quality on dataset-level evaluation, 66.2$\%$ on paper-specific evaluation, and consistently high scores across metrics such as visual clarity, structural organization, and scientific accuracy. The SciFig figure generation pipeline and our evaluation benchmark will be open-sourced.
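The hierarchical layout strategy described in the abstract (parse component relationships, group related elements into functional modules, generate inter-module connections) can be sketched in plain Python. The data structures, names, and grouping rule below are hypothetical illustrations of the idea, not SciFig's actual implementation, which is LLM-driven:

```python
from collections import defaultdict

# Hypothetical inputs: components tagged with a functional-module label,
# plus directed edges between components (as a parser might extract them
# from a paper's method description).
components = {
    "encoder": "Backbone", "decoder": "Backbone",
    "retriever": "Memory", "index": "Memory",
}
edges = [("encoder", "retriever"), ("retriever", "decoder")]

def hierarchical_layout(components, edges):
    # 1) Group related elements into functional modules.
    modules = defaultdict(list)
    for comp, module in components.items():
        modules[module].append(comp)
    # 2) Lift component-level edges to inter-module connections,
    #    dropping edges that stay inside one module.
    inter = {
        (components[a], components[b])
        for a, b in edges
        if components[a] != components[b]
    }
    # 3) Assign each module a column index (a stand-in for the real
    #    coordinate assignment an actual layout engine would perform).
    order = {m: i for i, m in enumerate(sorted(modules))}
    return modules, sorted(inter), order

modules, inter, order = hierarchical_layout(components, edges)
```

An iterative CoT feedback loop, as the paper describes it, would then repeatedly render this layout, critique the result visually, and adjust the grouping or ordering.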
Related papers
- DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing [53.85037373860246]
We introduce DeepSynth-Eval, a benchmark designed to objectively evaluate information consolidation capabilities. We propose a fine-grained evaluation protocol using General Checklists (for factual coverage) and Constraint Checklists (for structural organization). Our results demonstrate that agentic plan-and-write approaches significantly outperform single-turn generation.
arXiv Detail & Related papers (2026-01-07T03:07:52Z)
- Paper2SysArch: Structure-Constrained System Architecture Generation from Scientific Papers [10.395280181257737]
We introduce a novel benchmark to quantitatively evaluate the automated generation of diagrams from text. It consists of 3,000 research papers paired with their corresponding high-quality ground-truth diagrams and is accompanied by a three-tiered evaluation metric. We propose Paper2SysArch, an end-to-end system that leverages multi-agent collaboration to convert papers into structured, editable diagrams.
arXiv Detail & Related papers (2025-11-22T12:24:30Z)
- Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z)
- Context-Aware Hierarchical Taxonomy Generation for Scientific Papers via LLM-Guided Multi-Aspect Clustering [59.54662810933882]
Existing taxonomy construction methods, leveraging unsupervised clustering or direct prompting of large language models, often lack coherence and granularity. We propose a novel context-aware hierarchical taxonomy generation framework that integrates LLM-guided multi-aspect encoding with dynamic clustering.
arXiv Detail & Related papers (2025-09-23T15:12:58Z)
- Meow: End-to-End Outline Writing for Automatic Academic Survey [24.749855249116802]
We propose Meow, a framework that produces organized and faithful outlines efficiently. We first formulate outline writing as an end-to-end task that generates hierarchical structured outlines from paper metadata. We then curate a high-quality dataset of surveys from arXiv, bioRxiv, and medRxiv, and establish systematic evaluation metrics for outline quality assessment.
arXiv Detail & Related papers (2025-09-19T07:20:53Z)
- SurGE: A Benchmark and Evaluation Framework for Scientific Survey Generation [37.921524136479825]
SurGE (Survey Generation Evaluation) is a new benchmark for scientific survey generation in computer science. SurGE consists of (1) a collection of test instances, each including a topic description, an expert-written survey, and its full set of cited references, and (2) a large-scale academic corpus of over one million papers. In addition, we propose an automated evaluation framework that measures the quality of generated surveys across four dimensions.
arXiv Detail & Related papers (2025-08-21T15:45:10Z)
- HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis [55.2480439325792]
HySemRAG is a framework that combines Extract, Transform, Load (ETL) pipelines with Retrieval-Augmented Generation (RAG). The system addresses limitations in existing RAG architectures through a multi-layered approach.
arXiv Detail & Related papers (2025-08-01T20:30:42Z)
- SciSage: A Multi-Agent Framework for High-Quality Scientific Survey Generation [2.985620880452744]
SciSage is a multi-agent framework employing a reflect-when-you-write paradigm. It critically evaluates drafts at outline, section, and document levels, collaborating with specialized agents for query interpretation, content retrieval, and refinement. We also release SurveyScope, a benchmark of 46 high-impact papers (2020-2025) across 11 computer science domains.
arXiv Detail & Related papers (2025-06-15T02:23:47Z)
- Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions [62.12545440385489]
Large language models (LLMs) have brought substantial advancements in text generation, but their potential for enhancing classification tasks remains underexplored.
We propose a framework for thoroughly investigating fine-tuning LLMs for classification, including both generation- and encoding-based approaches.
We instantiate this framework in edit intent classification (EIC), a challenging and underexplored classification task.
arXiv Detail & Related papers (2024-10-02T20:48:28Z)
- CORAL: COde RepresentAtion Learning with Weakly-Supervised Transformers for Analyzing Data Analysis [33.190021245507445]
Large scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process.
We propose a novel weakly supervised transformer-based architecture for computing joint representations of code from both abstract syntax trees and surrounding natural language comments.
We show that our model, leveraging only easily-available weak supervision, achieves a 38% increase in accuracy over expert-supplied heuristics and outperforms a suite of baselines.
arXiv Detail & Related papers (2020-08-28T19:57:49Z)
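CORAL's core idea, jointly representing source code (via its abstract syntax tree) and the surrounding natural-language comments, can be illustrated with a minimal standard-library sketch. The `[SEP]` framing and token choices below are assumptions for illustration; the actual paper feeds such joint inputs to a weakly supervised transformer:

```python
import ast
import io
import tokenize

def joint_token_sequence(source: str) -> list[str]:
    """Build a joint sequence of AST node types and comment words from a
    code snippet (a sketch of the kind of input a CORAL-style model
    might consume; the exact tokenization is hypothetical)."""
    # Code side: a pre-order walk of the AST, emitting node type names.
    tree = ast.parse(source)
    ast_tokens = [type(node).__name__ for node in ast.walk(tree)]
    # Natural-language side: lexical scan for COMMENT tokens, split into words.
    comment_tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comment_tokens.extend(tok.string.lstrip("# ").split())
    # Concatenate the two views with a separator token.
    return ast_tokens + ["[SEP]"] + comment_tokens

seq = joint_token_sequence("# load the dataset\nx = 1\n")
```

A weak-supervision signal (e.g. heuristic stage labels such as "load" or "train" derived from library calls) would then supervise a transformer over these joint sequences instead of expensive expert annotations.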
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.