Graph World Model
- URL: http://arxiv.org/abs/2507.10539v1
- Date: Mon, 14 Jul 2025 17:57:45 GMT
- Title: Graph World Model
- Authors: Tao Feng, Yexin Wu, Guanyu Lin, Jiaxuan You
- Abstract summary: Existing world models (WMs) primarily focus on unstructured data. We propose the Graph World Model (GWM), a world model that supports both unstructured and graph-structured states with multi-modal information.
- Score: 12.75565996016737
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: World models (WMs) demonstrate strong capabilities in prediction, generation, and planning tasks. Existing WMs primarily focus on unstructured data and cannot leverage the ubiquitous structured data, often represented as graphs, in the digital world. While multiple graph foundation models have been proposed, they focus on graph learning tasks and cannot extend to diverse multi-modal data and interdisciplinary tasks. To address these challenges, we propose the Graph World Model (GWM), a world model that supports both unstructured and graph-structured states with multi-modal information and represents diverse tasks as actions. The core of a GWM is a generic message-passing algorithm to aggregate structured information, either over a unified multi-modal token space by converting multi-modal data into text (GWM-T) or a unified multi-modal embedding space by modality-specific encoders (GWM-E). Notably, GWM introduces action nodes to support diverse tasks, where action nodes are linked to other nodes via direct reference or similarity computation. Extensive experiments on six tasks from diverse domains, including multi-modal generation and matching, recommendation, graph prediction, multi-agent, retrieval-augmented generation, and planning and optimization, show that the same GWM outperforms or matches domain-specific baselines' performance, benefits from multi-hop structures, and demonstrates strong zero-shot/few-shot capabilities on unseen new tasks. Our code for GWM is released at https://github.com/ulab-uiuc/GWM.
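To make the core mechanism concrete, here is a minimal sketch of message passing with an action node, assuming a simple mean-pooling aggregator, cosine-similarity linking, and a two-hop default; these choices are illustrative, not the paper's exact formulation.

```python
# Minimal sketch of GWM-style message passing with an action node.
# Hypothetical: the mean-pooling update, cosine linking, and two-hop
# default are illustrative assumptions, not the paper's exact algorithm.
import numpy as np

def message_passing(emb: np.ndarray, edges: list[tuple[int, int]], hops: int = 2) -> np.ndarray:
    """Aggregate neighbor embeddings for `hops` rounds (undirected mean pooling)."""
    adj = {v: [] for v in range(len(emb))}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    for _ in range(hops):
        nxt = emb.copy()
        for v, nbrs in adj.items():
            if nbrs:
                nxt[v] = (emb[v] + emb[nbrs].mean(axis=0)) / 2.0
        emb = nxt
    return emb

def link_by_similarity(action_emb: np.ndarray, state_embs: np.ndarray, k: int = 2) -> np.ndarray:
    """Link an action node to its k most similar state nodes (cosine similarity)."""
    sims = state_embs @ action_emb / (
        np.linalg.norm(state_embs, axis=1) * np.linalg.norm(action_emb) + 1e-9)
    return np.argsort(-sims)[:k]

rng = np.random.default_rng(0)
states = rng.normal(size=(5, 16))          # multi-modal state-node embeddings (GWM-E style)
action = rng.normal(size=16)               # embedding of one action node, i.e. a task
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]   # structured edges among state nodes
action_idx = len(states)                   # action node appended at index 5
edges += [(action_idx, int(t)) for t in link_by_similarity(action, states)]
out = message_passing(np.vstack([states, action]), edges)
print(out[action_idx][:4])                 # multi-hop context aggregated at the action node
```

In the GWM-T variant the same aggregation would run over a shared text-token space rather than embeddings; the part this sketch tries to convey is how an action node, linked by reference or similarity, accumulates multi-hop structured context.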
Related papers
- MMGraphRAG: Bridging Vision and Language with Interpretable Multimodal Knowledge Graphs [6.165053219836395]
We propose MMGraphRAG, which refines visual content through scene graphs and constructs a multimodal knowledge graph. It employs spectral clustering to achieve cross-modal entity linking and retrieves context along reasoning paths to guide the generative process. Experimental results show that MMGraphRAG achieves state-of-the-art performance on the DocBench and MMLongBench datasets.
arXiv Detail & Related papers (2025-07-28T13:16:23Z)
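As a rough illustration of MMGraphRAG's cross-modal entity-linking step, the sketch below clusters text- and image-derived entity embeddings over a similarity graph; the cosine affinity, embedding sizes, and cluster count are assumptions, not the paper's actual recipe.

```python
# Hedged sketch of cross-modal entity linking via spectral clustering.
# The cosine affinity, embedding sizes, and cluster count are assumptions.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
text_entities = rng.random((6, 32))    # embeddings of text-derived entities
image_entities = rng.random((4, 32))   # embeddings of scene-graph (visual) entities
nodes = np.vstack([text_entities, image_entities])

affinity = np.clip(cosine_similarity(nodes), 0.0, None)  # non-negative similarity graph
labels = SpectralClustering(
    n_clusters=4, affinity="precomputed", random_state=0
).fit_predict(affinity)

# Entities from different modalities that share a cluster are linking candidates.
for c in range(4):
    members = np.where(labels == c)[0]
    print(f"cluster {c}: nodes {members.tolist()}")
```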
- Towards Multi-modal Graph Large Language Model [34.70458746950645]
We explore the potential of Multi-modal Graph Large Language Models (MG-LLM) to unify and generalize across diverse multi-modal graph data and tasks. We present five key desired characteristics for MG-LLM: 1) a unified space for multi-modal structures and attributes, 2) the capability to handle diverse multi-modal graph tasks, 3) multi-modal graph in-context learning, 4) multi-modal graph interaction with natural language, and 5) multi-modal graph reasoning.
arXiv Detail & Related papers (2025-06-11T13:41:29Z)
- MLaGA: Multimodal Large Language and Graph Assistant [9.985787670804823]
Large Language Models (LLMs) have demonstrated substantial efficacy in advancing graph-structured data analysis. We introduce the Multimodal Large Language and Graph Assistant (MLaGA), an innovative model that adeptly extends LLM capabilities to facilitate reasoning over complex graph structures and multimodal attributes.
arXiv Detail & Related papers (2025-06-03T07:52:00Z)
- UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs [34.48393396390799]
We propose a novel cross-domain graph foundation model that enables general representation learning on multimodal graphs. UniGraph2 employs modality-specific encoders alongside a graph neural network (GNN) to learn a unified low-dimensional embedding space. We show that UniGraph2 significantly outperforms state-of-the-art models in tasks such as representation learning, transfer learning, and multimodal generative tasks.
arXiv Detail & Related papers (2025-02-02T14:04:53Z)
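A minimal sketch of UniGraph2's encode-then-bind pattern follows, assuming linear stand-ins for the modality-specific encoders and a single mean-aggregation GNN step; all layer sizes are illustrative.

```python
# Minimal sketch: modality-specific encoders project into one shared space,
# then a GNN step mixes embeddings over the graph. The linear encoders,
# single mean-aggregation layer, and sizes are assumptions.
import torch
import torch.nn as nn

class UnifiedEmbedder(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden=128):
        super().__init__()
        self.text_enc = nn.Linear(text_dim, hidden)    # stand-in text encoder
        self.image_enc = nn.Linear(image_dim, hidden)  # stand-in image encoder
        self.gnn = nn.Linear(hidden, hidden)           # one message-passing layer

    def forward(self, text_x, image_x, adj):
        # Bind both modalities into the shared low-dimensional space.
        h = torch.cat([self.text_enc(text_x), self.image_enc(image_x)], dim=0)
        # Mean-aggregate neighbors, add a residual, then transform.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.gnn(adj @ h / deg + h))

n_text, n_img = 3, 2
adj = torch.ones(n_text + n_img, n_text + n_img)  # toy fully connected graph
model = UnifiedEmbedder()
z = model(torch.randn(n_text, 768), torch.randn(n_img, 512), adj)
print(z.shape)  # torch.Size([5, 128])
```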
- Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation [59.4356484322228]
Graph Neural Networks (GNNs) have shown promising performance in multimodal recommendation. We propose GNNs with Modality-Independent Receptive Fields, which employ a separate GNN with its own receptive field per modality. Our results indicate that the optimal $K$ for certain modalities on specific datasets can be as low as 1 or 2, which may restrict the GNNs' capacity to capture global information and motivates the global transformer component.
arXiv Detail & Related papers (2024-12-18T16:12:26Z)
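The sketch below illustrates the idea of modality-independent receptive fields, assuming LightGCN-style propagation and a simple sum fusion; the per-modality $K$ values are placeholders.

```python
# Sketch of modality-independent receptive fields: each modality is
# propagated for its own number of hops K before fusion. The K values
# and the sum fusion are illustrative assumptions.
import numpy as np

def propagate(feats: np.ndarray, adj_norm: np.ndarray, k: int) -> np.ndarray:
    """Run k rounds of row-normalized adjacency propagation (LightGCN-style)."""
    h = feats
    for _ in range(k):
        h = adj_norm @ h
    return h

rng = np.random.default_rng(0)
n = 6
adj = (rng.random((n, n)) > 0.5)
adj = np.maximum(adj, adj.T).astype(float)   # undirected toy graph
adj_norm = adj / np.maximum(adj.sum(axis=1, keepdims=True), 1.0)

visual = rng.random((n, 8))
textual = rng.random((n, 8))
# Per the paper, the optimal K can differ per modality (sometimes just 1 or 2).
fused = propagate(visual, adj_norm, k=1) + propagate(textual, adj_norm, k=3)
print(fused.shape)  # (6, 8)
```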
- Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning [36.75510196380185]
We introduce the Multimodal Graph Benchmark (MM-GRAPH), a pioneering benchmark that incorporates both visual and textual information into graph learning tasks. MM-GRAPH extends beyond existing text-attributed graph benchmarks, offering a more comprehensive evaluation framework for multimodal graph learning. This study offers valuable insights into the challenges and opportunities of integrating visual data into graph learning.
arXiv Detail & Related papers (2024-06-24T05:14:09Z)
- Tokenization, Fusion, and Augmentation: Towards Fine-grained Multi-modal Entity Representation [51.80447197290866]
Multi-modal knowledge graph completion (MMKGC) aims to discover unobserved knowledge from given knowledge graphs. Existing MMKGC methods usually extract multi-modal features with pre-trained models. We introduce a novel framework, MyGO, to tokenize, fuse, and augment the fine-grained multi-modal representations of entities.
arXiv Detail & Related papers (2024-04-15T05:40:41Z)
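As a rough sketch of a tokenize-fuse-augment pipeline in MyGO's spirit: the nearest-codebook tokenization, mean fusion, and noise augmentation below are all assumptions, not the paper's actual modules.

```python
# Rough sketch of a tokenize-fuse-augment pipeline. The nearest-codebook
# tokenization, mean fusion, and noise augmentation are all assumptions.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 16))     # hypothetical discrete token codebook

def tokenize(modal_feat: np.ndarray) -> np.ndarray:
    """Snap each modality feature to its nearest codebook token."""
    ids = np.argmin(((modal_feat[:, None] - codebook) ** 2).sum(-1), axis=1)
    return codebook[ids]

image_feat = rng.normal(size=(4, 16))    # pre-trained image features per entity
text_feat = rng.normal(size=(4, 16))     # pre-trained text features per entity
fused = (tokenize(image_feat) + tokenize(text_feat)) / 2     # fuse token views
augmented = fused + 0.01 * rng.normal(size=fused.shape)      # simple augmentation
print(augmented.shape)  # (4, 16)
```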
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework, NativE, to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for arbitrary modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Multimodal Graph Learning for Generative Tasks [89.44810441463652]
Multimodal learning combines multiple data modalities, broadening the types and complexity of data our models can utilize.
We propose Multimodal Graph Learning (MMGL), a framework for capturing information from multiple multimodal neighbors with relational structures among them.
arXiv Detail & Related papers (2023-10-11T13:25:03Z)
- A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation [28.041319351752485]
In this work, we explore a method for learning a single policy that controls agents of various morphologies to solve diverse tasks by distilling a large amount of proficient behavioral data.
We introduce the morphology-task graph, which treats observations, actions, and goals/tasks in a unified graph representation.
We also develop MxT-Bench for fast large-scale behavior generation, which supports procedural generation of diverse morphology-task combinations.
arXiv Detail & Related papers (2022-11-25T18:52:48Z)
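A toy sketch of such a unified representation follows, assuming dataclass nodes typed as observation, action, or goal; the node fields and edge semantics are invented here purely for illustration.

```python
# Toy sketch of a morphology-task graph: observations, actions, and goals
# as typed nodes in one graph. Node fields and edge semantics are invented.
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str        # "obs" | "action" | "goal"
    payload: tuple   # raw values attached to the node

@dataclass
class MorphologyTaskGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src_idx, dst_idx) pairs

    def add(self, kind: str, payload: tuple) -> int:
        self.nodes.append(Node(kind, payload))
        return len(self.nodes) - 1

g = MorphologyTaskGraph()
joint = g.add("obs", (0.1, -0.3))    # observed state of one joint
torque = g.add("action", (0.5,))     # torque command at that joint
target = g.add("goal", (1.0, 0.0))   # task goal, e.g. a reach target
g.edges += [(joint, torque), (joint, target)]
print(len(g.nodes), g.edges)
```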
- Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion [112.27103169303184]
Multimodal Knowledge Graphs (MKGs) organize visual-text factual knowledge.
MKGformer achieves SOTA performance on four datasets across multimodal link prediction, multimodal relation extraction (RE), and multimodal named entity recognition (NER).
arXiv Detail & Related papers (2022-05-04T23:40:04Z)
- GenURL: A General Framework for Unsupervised Representation Learning [58.59752389815001]
Unsupervised representation learning (URL) learns compact embeddings of high-dimensional data without supervision.
We propose a unified similarity-based URL framework, GenURL, which can smoothly adapt to various URL tasks.
Experiments demonstrate that GenURL achieves consistent state-of-the-art performance in self-supervised visual learning, unsupervised knowledge distillation (KD), graph embeddings (GE), and dimension reduction.
arXiv Detail & Related papers (2021-10-27T16:24:39Z)
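To make "similarity-based" concrete, here is a minimal sketch of one plausible reading: make pairwise similarities in embedding space match those in data space. The Gaussian kernel and MSE objective are assumptions, not GenURL's exact loss.

```python
# Minimal sketch of a similarity-matching URL objective. The Gaussian
# kernel and MSE objective are assumptions, not GenURL's actual loss.
import numpy as np

def pairwise_gaussian(x: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Gaussian-kernel similarity between all pairs of rows."""
    d2 = ((x[:, None] - x[None, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(0)
x = rng.random((10, 50))   # high-dimensional data
z = rng.random((10, 2))    # low-dimensional embeddings (would be optimized)
loss = ((pairwise_gaussian(x) - pairwise_gaussian(z)) ** 2).mean()
print(f"similarity-matching loss: {loss:.4f}")
```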
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.