PaperBanana: Automating Academic Illustration for AI Scientists
- URL: http://arxiv.org/abs/2601.23265v1
- Date: Fri, 30 Jan 2026 18:33:37 GMT
- Title: PaperBanana: Automating Academic Illustration for AI Scientists
- Authors: Dawei Zhu, Rui Meng, Yale Song, Xiyu Wei, Sujian Li, Tomas Pfister, Jinsung Yoon
- Abstract summary: PaperBanana is an agentic framework for automated generation of publication-ready academic illustrations. Powered by state-of-the-art VLMs and image generation models, PaperBanana orchestrates specialized agents to retrieve references, plan content and style, render images, and iteratively refine via self-critique.
- Score: 58.120067704652314
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite rapid advances in autonomous AI scientists powered by language models, generating publication-ready illustrations remains a labor-intensive bottleneck in the research workflow. To lift this burden, we introduce PaperBanana, an agentic framework for automated generation of publication-ready academic illustrations. Powered by state-of-the-art VLMs and image generation models, PaperBanana orchestrates specialized agents to retrieve references, plan content and style, render images, and iteratively refine via self-critique. To rigorously evaluate our framework, we introduce PaperBananaBench, comprising 292 test cases for methodology diagrams curated from NeurIPS 2025 publications, covering diverse research domains and illustration styles. Comprehensive experiments demonstrate that PaperBanana consistently outperforms leading baselines in faithfulness, conciseness, readability, and aesthetics. We further show that our method effectively extends to the generation of high-quality statistical plots. Collectively, PaperBanana paves the way for the automated generation of publication-ready illustrations.
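The retrieve-plan-render-refine loop described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual implementation: every class and method name below (`StubVLM`, `StubRenderer`, `generate_illustration`, and so on) is a hypothetical stand-in for the real VLM and image-generation components.

```python
from dataclasses import dataclass

# Hypothetical sketch of the agent loop from the abstract: retrieve
# references, plan content and style, render, then refine via self-critique.
# All names here are illustrative assumptions, not the paper's API.

@dataclass
class Critique:
    acceptable: bool
    feedback: str = ""

class StubVLM:
    """Placeholder for a vision-language model acting as planner and critic."""
    def retrieve_references(self, paper_text):
        return ["reference-figure-1"]          # retrieved reference illustrations
    def plan(self, paper_text, references):
        return {"layout": "left-to-right", "refs": references}
    def critique(self, image, plan):
        # Accept once the renderer has incorporated feedback at least once.
        return Critique(acceptable="refined" in image, feedback="revise layout")

class StubRenderer:
    """Placeholder for an image generation model."""
    def render(self, plan, feedback=""):
        return "refined-diagram" if feedback else "draft-diagram"

def generate_illustration(paper_text, vlm, renderer, max_rounds=3):
    refs = vlm.retrieve_references(paper_text)   # 1. reference retrieval
    plan = vlm.plan(paper_text, refs)            # 2. content and style planning
    image = renderer.render(plan)                # 3. initial rendering
    for _ in range(max_rounds):                  # 4. iterative self-critique
        critique = vlm.critique(image, plan)
        if critique.acceptable:
            break
        image = renderer.render(plan, feedback=critique.feedback)
    return image

print(generate_illustration("method section text", StubVLM(), StubRenderer()))
```

The key design point the abstract emphasizes is the closed refinement loop: the critic agent gates termination, so rendering repeats until the critique passes or the round budget is exhausted.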
Related papers
- Self-Evaluation Unlocks Any-Step Text-to-Image Generation [65.7088507945307]
We introduce the Self-Evaluating Model (Self-E), a novel, from-scratch training approach for text-to-image generation. Self-E learns from data similarly to a Flow Matching model, while simultaneously employing a novel self-evaluation mechanism. Experiments on large-scale text-to-image benchmarks show that Self-E not only excels in few-step generation, but is also competitive with state-of-the-art Flow Matching models at 50 steps.
arXiv Detail & Related papers (2025-12-26T20:42:11Z)
- NoveltyRank: Estimating Conceptual Novelty of AI Papers [8.218640708170119]
This project aims to develop a model that estimates and ranks the conceptual novelty of AI papers. Our approach evaluates novelty primarily through a paper's title, abstract, and semantic similarity to prior literature. We fine-tune Qwen3-4B-Instruct-2507 and SciBERT on both tasks, benchmarking against GPT-5.1 to analyze how task formulation and modeling choices affect performance.
arXiv Detail & Related papers (2025-12-12T03:33:32Z)
- Training Data Attribution for Image Generation using Ontology-Aligned Knowledge Graphs [3.686386213696443]
We introduce a framework for interpreting generative outputs through the automatic construction of knowledge graphs. Our method extracts structured triples from images, aligned with a domain-specific ontology. By comparing the KGs of generated and training images, we can trace potential influences, enabling copyright analysis, dataset transparency, and interpretable AI.
arXiv Detail & Related papers (2025-12-02T12:45:20Z)
- KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models [88.58758610679762]
We introduce KRIS-Bench (Knowledge-based Reasoning in Image-editing Systems Benchmark), a diagnostic benchmark designed to assess models through a cognitively informed lens. We categorize editing tasks across three foundational knowledge types: Factual, Conceptual, and Procedural. To support fine-grained evaluation, we propose a protocol that incorporates a novel Knowledge Plausibility metric, enhanced by knowledge hints and calibrated through human studies.
arXiv Detail & Related papers (2025-05-22T14:08:59Z)
- Advancing AI Research Assistants with Expert-Involved Learning [84.30323604785646]
Large language models (LLMs) and large multimodal models (LMMs) promise to accelerate biomedical discovery, yet their reliability remains unclear. We introduce ARIEL (AI Research Assistant for Expert-in-the-Loop Learning), an open-source evaluation and optimization framework. We find that state-of-the-art models generate fluent but incomplete summaries, whereas LMMs struggle with detailed visual reasoning.
arXiv Detail & Related papers (2025-05-03T14:21:48Z)
- Mixture of Knowledge Minigraph Agents for Literature Review Generation [22.80918934436901]
This paper proposes a novel framework, collaborative knowledge minigraph agents (CKMAs), to automate scholarly literature reviews. A novel prompt-based algorithm, the knowledge minigraph construction agent (KMCA), is designed to identify relations between concepts in the academic literature and automatically construct knowledge minigraphs. By leveraging the capabilities of large language models on the constructed knowledge minigraphs, the multiple path summarization agent (MPSA) efficiently organizes concepts and relations from different viewpoints to generate literature review paragraphs.
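The two-stage KMCA-then-MPSA pipeline can be sketched in miniature. This is a toy sketch under heavy assumptions: the minigraph is modeled as (concept, relation, concept) triples, extraction is simulated by keyword matching rather than an LLM prompt, and the function names `extract_minigraph` and `summarize_paths` are hypothetical, not the paper's implementation.

```python
# Toy sketch of the two-stage pipeline: a construction stage (KMCA stand-in)
# builds concept-relation triples, and a summarization stage (MPSA stand-in)
# verbalizes them into review sentences. All names are illustrative.

def extract_minigraph(abstracts):
    """KMCA stand-in: pull naive (concept, relation, concept) triples."""
    triples = set()
    for text in abstracts:
        words = text.lower().split()
        # Scan consecutive word windows for a small set of relation keywords.
        for a, rel, b in zip(words, words[1:], words[2:]):
            if rel in {"improves", "extends", "uses"}:
                triples.add((a, rel, b))
    return triples

def summarize_paths(triples):
    """MPSA stand-in: verbalize each relation as a review sentence."""
    return [f"{a} {rel} {b}." for a, rel, b in sorted(triples)]

abstracts = ["BERT uses transformers", "RoBERTa improves BERT"]
for sentence in summarize_paths(extract_minigraph(abstracts)):
    print(sentence)
```

In the real framework both stages are driven by LLM prompts; the sketch only shows the data flow, in which the graph acts as an intermediate representation between extraction and summarization.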
arXiv Detail & Related papers (2024-11-09T12:06:40Z)
- Generative AI in Evidence-Based Software Engineering: A White Paper [10.489725182789885]
In less than a year, practitioners and researchers witnessed a rapid and wide adoption of Generative Artificial Intelligence.
Textual GAI capabilities enable researchers worldwide to explore new generative scenarios, simplifying and hastening all time-consuming text generation and analysis tasks.
Based on our current investigation, we will follow up this vision with the creation and empirical validation of a comprehensive suite of models to effectively support EBSE researchers.
arXiv Detail & Related papers (2024-07-24T17:16:17Z)
- Automatic Geo-alignment of Artwork in Children's Story Books [0.0]
The project aligns with the company's vision by leveraging the generalisation and scalability of Machine Learning algorithms.
The presented approach can also be adapted to video and 3D sculpture generation for novel illustrations in digital web books.
arXiv Detail & Related papers (2023-03-16T06:23:06Z)
- Neural Language Modeling for Contextualized Temporal Graph Generation [49.21890450444187]
This paper presents the first study on using large-scale pre-trained language models for automated generation of an event-level temporal graph for a document.
arXiv Detail & Related papers (2020-10-20T07:08:00Z)
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.