Graphy'our Data: Towards End-to-End Modeling, Exploring and Generating Report from Raw Data
- URL: http://arxiv.org/abs/2502.16868v1
- Date: Mon, 24 Feb 2025 06:10:49 GMT
- Title: Graphy'our Data: Towards End-to-End Modeling, Exploring and Generating Report from Raw Data
- Authors: Longbin Lai, Changwei Luo, Yunkai Lou, Mingchen Ju, Zhengyi Yang,
- Abstract summary: Graphy is an end-to-end platform that automates data modeling, exploration and high-quality report generation.<n>We showcase a pre-scrapped graph of over 50,000 papers -- complete with their references -- demonstrating how Graphy facilitates the literature-survey scenario.
- Score: 5.752510084651565
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) have recently demonstrated remarkable performance in tasks such as Retrieval-Augmented Generation (RAG) and autonomous AI agent workflows. Yet, when faced with large sets of unstructured documents requiring progressive exploration, analysis, and synthesis, such as conducting literature survey, existing approaches often fall short. We address this challenge -- termed Progressive Document Investigation -- by introducing Graphy, an end-to-end platform that automates data modeling, exploration and high-quality report generation in a user-friendly manner. Graphy comprises an offline Scrapper that transforms raw documents into a structured graph of Fact and Dimension nodes, and an online Surveyor that enables iterative exploration and LLM-driven report generation. We showcase a pre-scrapped graph of over 50,000 papers -- complete with their references -- demonstrating how Graphy facilitates the literature-survey scenario. The demonstration video can be found at https://youtu.be/uM4nzkAdGlM.
Related papers
- Can Large Language Models Analyze Graphs like Professionals? A Benchmark, Datasets and Models [90.98855064914379]
We introduce ProGraph, a benchmark for large language models (LLMs) to process graphs.
Our findings reveal that the performance of current LLMs is unsatisfactory, with the best model achieving only 36% accuracy.
We propose LLM4Graph datasets, which include crawled documents and auto-generated codes based on 6 widely used graph libraries.
arXiv Detail & Related papers (2024-09-29T11:38:45Z) - Cascading Large Language Models for Salient Event Graph Generation [19.731605612333716]
CALLMSAE is a CAscading Large Language Model framework for SAlient Event graph generation.<n>We first identify salient events by prompting LLMs to generate summaries.<n>We then develop an iterative code refinement prompting strategy to generate event relation graphs.<n>Powered by CALLMSAE, we present textitNYT-SEG, a large-scale automatically annotated event graph dataset.
arXiv Detail & Related papers (2024-06-26T15:53:54Z) - GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models [58.08177466768262]
Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks.
We introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously.
Experimental results on the LV-Eval dataset reveal that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k across context lengths from 16k to 256k by a large margin.
arXiv Detail & Related papers (2024-06-20T17:57:51Z) - DocGraphLM: Documental Graph Language Model for Information Extraction [15.649726614383388]
We introduce DocGraphLM, a framework that combines pre-trained language models with graph semantics.
To achieve this, we propose 1) a joint encoder architecture to represent documents, and 2) a novel link prediction approach to reconstruct document graphs.
Our experiments on three SotA datasets show consistent improvement on IE and QA tasks with the adoption of graph features.
arXiv Detail & Related papers (2024-01-05T14:15:36Z) - Form follows Function: Text-to-Text Conditional Graph Generation based
on Functional Requirements [36.00630198983932]
This work focuses on the novel problem setting of generating graphs conditioned on a description of the graph's functional requirements in a downstream task.
We pose the problem as a text-to-text generation problem and focus on the approach of fine-tuning a pretrained large language model (LLM) to generate graphs.
arXiv Detail & Related papers (2023-11-01T11:12:02Z) - Leveraging Large Language Models for Node Generation in Few-Shot Learning on Text-Attributed Graphs [5.587264586806575]
We propose a plug-and-play approach to empower text-attributed graphs through node generation using Large Language Models (LLMs)<n>LLMs extract semantic information from labels and generate samples that belong to categories as exemplars.<n>We employ an edge predictor to capture structural information inherent in the raw dataset and integrate the newly generated samples into the original graph.
arXiv Detail & Related papers (2023-10-15T16:04:28Z) - Scientific Paper Extractive Summarization Enhanced by Citation Graphs [50.19266650000948]
We focus on leveraging citation graphs to improve scientific paper extractive summarization under different settings.
Preliminary results demonstrate that citation graph is helpful even in a simple unsupervised framework.
Motivated by this, we propose a Graph-based Supervised Summarization model (GSS) to achieve more accurate results on the task when large-scale labeled data are available.
arXiv Detail & Related papers (2022-12-08T11:53:12Z) - Fine-Grained Scene Graph Generation with Data Transfer [127.17675443137064]
Scene graph generation (SGG) aims to extract (subject, predicate, object) triplets in images.
Recent works have made a steady progress on SGG, and provide useful tools for high-level vision and language understanding.
We propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a play-and-plug fashion and expanded to large SGG with 1,807 predicate classes.
arXiv Detail & Related papers (2022-03-22T12:26:56Z) - Neural Language Modeling for Contextualized Temporal Graph Generation [49.21890450444187]
This paper presents the first study on using large-scale pre-trained language models for automated generation of an event-level temporal graph for a document.
arXiv Detail & Related papers (2020-10-20T07:08:00Z) - Leveraging Graph to Improve Abstractive Multi-Document Summarization [50.62418656177642]
We develop a neural abstractive multi-document summarization (MDS) model which can leverage well-known graph representations of documents.
Our model utilizes graphs to encode documents in order to capture cross-document relations, which is crucial to summarizing long documents.
Our model can also take advantage of graphs to guide the summary generation process, which is beneficial for generating coherent and concise summaries.
arXiv Detail & Related papers (2020-05-20T13:39:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.