Related papers: SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement

URL: http://arxiv.org/abs/2409.19242v2
Date: Tue, 15 Oct 2024 22:01:55 GMT
Title: SciDoc2Diagrammer-MAF: Towards Generation of Scientific Diagrams from Documents guided by Multi-Aspect Feedback Refinement
Authors: Ishani Mondal, Zongxia Li, Yufang Hou, Anandhavelu Natarajan, Aparna Garimella, Jordan Boyd-Graber
Abstract summary: We propose SciDoc2Diagram, a task that extracts relevant information from scientific papers and generates diagrams. We develop a pipeline SciDoc2Diagrammer that generates diagrams based on user intentions using intermediate code generation.
Score: 22.07623299712134
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Automating the creation of scientific diagrams from academic papers can significantly streamline the development of tutorials, presentations, and posters, thereby saving time and accelerating the process. Current text-to-image models struggle with generating accurate and visually appealing diagrams from long-context inputs. We propose SciDoc2Diagram, a task that extracts relevant information from scientific papers and generates diagrams, along with a benchmarking dataset, SciDoc2DiagramBench. We develop a multi-step pipeline SciDoc2Diagrammer that generates diagrams based on user intentions using intermediate code generation. We observed that initial diagram drafts were often incomplete or unfaithful to the source, leading us to develop SciDoc2Diagrammer-Multi-Aspect-Feedback (MAF), a refinement strategy that significantly enhances factual correctness and visual appeal and outperforms existing models on both automatic and human judgement.

Related papers

Graph-Anchored Knowledge Indexing for Retrieval-Augmented Generation [53.42323544075114]
We propose GraphAnchor, a novel Graph-Anchored Knowledge Indexing approach. Experiments on four multi-hop question answering benchmarks demonstrate the effectiveness of GraphAnchor.
arXiv Detail & Related papers (2026-01-23T05:41:05Z)
DiagramEval: Evaluating LLM-Generated Diagrams via Graphs [25.040934047462112]
We argue that a promising direction is to generate demonstration diagrams directly in textual form as SVGs. We propose DiagramEval, a novel evaluation metric designed to assess the quality of demonstration diagrams generated by large language models.
arXiv Detail & Related papers (2025-10-29T17:56:17Z)
SketchAgent: Generating Structured Diagrams from Hand-Drawn Sketches [54.06877048295693]
We introduce SketchAgent, a system designed to automate the transformation of hand-drawn sketches into structured diagrams. SketchAgent integrates sketch recognition, symbolic reasoning, and iterative validation to produce semantically coherent and structurally accurate diagrams. By streamlining the diagram generation process, SketchAgent holds great promise for applications in design, education, and engineering.
arXiv Detail & Related papers (2025-08-02T07:22:51Z)
Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation [7.501482942867853]
We propose Draw with Thought (DwT), a training-free framework that guides MLLMs to reconstruct diagrams into editable mxGraph XML code. DwT enables interpretable and controllable outputs without model fine-tuning. We release Plot2XML, a benchmark of 247 real-world scientific diagrams with gold-standard XML annotations.
arXiv Detail & Related papers (2025-04-13T08:22:09Z)
Graphy'our Data: Towards End-to-End Modeling, Exploring and Generating Report from Raw Data [5.752510084651565]
Graphy is an end-to-end platform that automates data modeling, exploration and high-quality report generation. We showcase a pre-scraped graph of over 50,000 papers -- complete with their references -- demonstrating how Graphy facilitates the literature-survey scenario.
arXiv Detail & Related papers (2025-02-24T06:10:49Z)
GRAG: Graph Retrieval-Augmented Generation [14.98084919101233]
Graph Retrieval-Augmented Generation (GRAG) tackles the fundamental challenges in retrieving textual subgraphs. We propose a novel divide-and-conquer strategy that retrieves the optimal subgraph structure in linear time. Our approach significantly outperforms current state-of-the-art RAG methods.
arXiv Detail & Related papers (2024-05-26T10:11:40Z)
GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation [14.511401955827875]
Object detection in documents is a key step to automate the structural elements identification process. We present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image.
arXiv Detail & Related papers (2024-02-17T23:08:32Z)
DocGraphLM: Documental Graph Language Model for Information Extraction [15.649726614383388]
We introduce DocGraphLM, a framework that combines pre-trained language models with graph semantics. To achieve this, we propose 1) a joint encoder architecture to represent documents, and 2) a novel link prediction approach to reconstruct document graphs. Our experiments on three SotA datasets show consistent improvement on IE and QA tasks with the adoption of graph features.
arXiv Detail & Related papers (2024-01-05T14:15:36Z)
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model [73.38800189095173]
This work focuses on strengthening the multi-modal diagram analysis ability of Multimodal LLMs. By parsing LaTeX source files of high-quality papers, we carefully build a multi-modal diagram understanding dataset M-Paper. M-Paper is the first dataset to support joint comprehension of multiple scientific diagrams, including figures and tables in the format of images or LaTeX codes.
arXiv Detail & Related papers (2023-11-30T04:43:26Z)
DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning [62.51232333352754]
Text-to-image (T2I) generation has seen significant growth over the past few years. Despite this, there has been little work on generating diagrams with T2I models. We present DiagrammerGPT, a novel two-stage text-to-diagram generation framework. We show that our framework produces more accurate diagrams, outperforming existing T2I models.
arXiv Detail & Related papers (2023-10-18T17:37:10Z)
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling [91.07963806829237]
We propose GraphLM, a novel document understanding model that injects layout knowledge into the model. We evaluate our model on various benchmarks, including FUNSD, XFUND and CORD, and achieve state-of-the-art results.
arXiv Detail & Related papers (2023-08-15T13:53:52Z)
Scientific Paper Extractive Summarization Enhanced by Citation Graphs [50.19266650000948]
We focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings. Preliminary results demonstrate that a citation graph is helpful even in a simple unsupervised framework. Motivated by this, we propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available.
arXiv Detail & Related papers (2022-12-08T11:53:12Z)
Multi-Document Scientific Summarization from a Knowledge Graph-Centric View [9.579482432715261]
We present KGSum, an MDSS model centred on knowledge graphs during both the encoding and decoding process. Specifically, in the encoding process, two graph-based modules are proposed to incorporate knowledge graph information into paper encoding. In the decoding process, we propose a two-stage decoder by first generating knowledge graph information of summary in the form of descriptive sentences, followed by generating the final summary.
arXiv Detail & Related papers (2022-09-09T14:20:59Z)
Structural Information Preserving for Graph-to-Text Generation [59.00642847499138]
The task of graph-to-text generation aims at producing sentences that preserve the meaning of input graphs. We propose to tackle this problem by leveraging richer training signals that can guide our model for preserving input information. Experiments on two benchmarks for graph-to-text generation show the effectiveness of our approach over a state-of-the-art baseline.
arXiv Detail & Related papers (2021-02-12T20:09:01Z)
Neural Language Modeling for Contextualized Temporal Graph Generation [49.21890450444187]
This paper presents the first study on using large-scale pre-trained language models for automated generation of an event-level temporal graph for a document.
arXiv Detail & Related papers (2020-10-20T07:08:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.