Griffin: Towards a Graph-Centric Relational Database Foundation Model
- URL: http://arxiv.org/abs/2505.05568v2
- Date: Wed, 11 Jun 2025 17:37:10 GMT
- Title: Griffin: Towards a Graph-Centric Relational Database Foundation Model
- Authors: Yanbo Wang, Xiyuan Wang, Quan Gan, Minjie Wang, Qibin Yang, David Wipf, Muhan Zhang
- Abstract summary: Griffin is the first foundation model attempt designed specifically for Relational Databases (RDBs). We enhance the architecture by incorporating a cross-attention module and a novel aggregator. Griffin is evaluated on large-scale, heterogeneous, and temporal graphs extracted from RDBs across various domains.
- Score: 37.09648739513178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce Griffin, the first foundation model attempt designed specifically for Relational Databases (RDBs). Unlike previous smaller models focused on single RDB tasks, Griffin unifies the data encoder and task decoder to handle diverse tasks. Additionally, we enhance the architecture by incorporating a cross-attention module and a novel aggregator. Griffin utilizes pretraining on both single-table and RDB datasets, employing advanced encoders for categorical, numerical, and metadata features, along with innovative components such as cross-attention modules and enhanced message-passing neural networks (MPNNs) to capture the complexities of relational data. Evaluated on large-scale, heterogeneous, and temporal graphs extracted from RDBs across various domains (spanning over 150 million nodes), Griffin demonstrates superior or comparable performance to individually trained models, excels in low-data scenarios, and shows strong transferability with similarity and diversity in pretraining across new datasets and tasks, highlighting its potential as a universally applicable foundation model for RDBs. Code available at https://github.com/yanxwb/Griffin.
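The abstract combines two mechanisms: a cross-attention module over table features and message passing over the RDB-derived graph. The following is a minimal pure-Python sketch of that general pattern, not Griffin's actual implementation; the function names and the simple mean aggregation are illustrative assumptions.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attend(query, columns):
    """Weight per-column feature vectors by similarity to a task query
    vector (an illustrative stand-in for a cross-attention module)."""
    scores = softmax([sum(q * c for q, c in zip(query, col)) for col in columns])
    dim = len(columns[0])
    return [sum(w * col[d] for w, col in zip(scores, columns)) for d in range(dim)]

def mpnn_step(node_feats, edges):
    """One mean-aggregation message-passing step over a directed edge list:
    each node averages incoming neighbor features into its own state."""
    agg = {v: [0.0] * len(f) for v, f in node_feats.items()}
    deg = {v: 0 for v in node_feats}
    for src, dst in edges:
        deg[dst] += 1
        for d, x in enumerate(node_feats[src]):
            agg[dst][d] += x
    out = {}
    for v, f in node_feats.items():
        if deg[v]:
            out[v] = [(fi + ai / deg[v]) / 2 for fi, ai in zip(f, agg[v])]
        else:
            out[v] = list(f)
    return out
```

In a Griffin-like pipeline, each row's column encodings would first be pooled via attention into a node embedding, and message passing would then propagate information along foreign-key edges.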
Related papers
- Rel-HNN: Split Parallel Hypergraph Neural Network for Learning on Relational Databases [3.6423651166048874]
Flattening the database poses challenges for deep learning models. We propose a novel hypergraph-based framework that we call rel-HNN. We show that rel-HNN significantly outperforms existing methods in both classification and regression tasks.
arXiv Detail & Related papers (2025-07-16T18:20:45Z)
- Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures [50.46688111973999]
Graph machine learning has led to a significant increase in the capabilities of models that learn on arbitrary graph-structured data. We present a new blueprint that enables end-to-end representation of 'relational entity graphs' without traditional feature engineering. We discuss key challenges including large-scale multi-table integration and the complexities of modeling temporal dynamics and heterogeneous data.
arXiv Detail & Related papers (2025-06-19T23:51:38Z)
- RelGNN: Composite Message Passing for Relational Deep Learning [56.48834369525997]
We introduce RelGNN, a novel GNN framework specifically designed to leverage the unique structural characteristics of the graphs built from relational databases. RelGNN is evaluated on 30 diverse real-world tasks from RelBench (Fey et al., 2024), and achieves state-of-the-art performance on the vast majority of tasks, with improvements of up to 25%.
arXiv Detail & Related papers (2025-02-10T18:58:40Z)
- Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks [50.42343781348247]
We develop a graph Poisson factor analysis (GPFA) which provides analytic conditional posteriors to improve the inference accuracy.
We also extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels.
Our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
arXiv Detail & Related papers (2024-10-13T02:22:14Z)
- RelBench: A Benchmark for Deep Learning on Relational Databases [78.52438155603781]
We present RelBench, a public benchmark for solving tasks over databases with graph neural networks.
We use RelBench to conduct the first comprehensive study of Relational Deep Learning (RDL).
RDL learns better while reducing the human work needed by more than an order of magnitude.
arXiv Detail & Related papers (2024-07-29T14:46:13Z)
- 4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs [67.47600679176963]
RDBs store vast amounts of rich, informative data spread across interconnected tables.
The progress of predictive machine learning models falls behind advances in other domains such as computer vision or natural language processing.
We explore a class of baseline models predicated on converting multi-table datasets into graphs.
We assemble a diverse collection of large-scale RDB datasets and coincident predictive tasks.
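The baseline class 4DBInfer describes rests on a standard conversion: each table row becomes a node, and each foreign-key reference becomes an edge. A minimal sketch of that convention, with made-up table and column names, might look like:

```python
def rdb_to_graph(tables, foreign_keys):
    """Convert a dict of tables (name -> list of row dicts, each with a
    primary key 'id') into a graph: one node per row, one edge per
    foreign-key link. Illustrative of the row-as-node, FK-as-edge
    convention; the schema here is hypothetical."""
    nodes, edges = [], []
    for tname, rows in tables.items():
        for row in rows:
            nodes.append((tname, row["id"]))
    # Each foreign key is (child_table, fk_column, parent_table).
    for child, col, parent in foreign_keys:
        for row in tables[child]:
            edges.append(((child, row["id"]), (parent, row[col])))
    return nodes, edges
```

For example, an `orders` table with a `user_id` column referencing `users` yields one edge from each order node to its user node, producing the heterogeneous graph a GNN then consumes.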
arXiv Detail & Related papers (2024-04-28T15:04:54Z)
- Replica Tree-based Federated Learning using Limited Data [6.572149681197959]
In this work, we propose a novel federated learning framework, named RepTreeFL.
At the core of the solution is the concept of a replica, where we replicate each participating client by copying its model architecture and perturbing its local data distribution.
Our approach enables learning from limited data and a small number of clients by aggregating a larger number of models with diverse data distributions.
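The replica idea, as summarized above, is to copy a client's model while perturbing its local data distribution, so that aggregation sees more, and more diverse, models. A rough sketch under those assumptions (not the paper's code; the dict layout and Gaussian jitter are illustrative):

```python
import random

def make_replicas(client, n_replicas, noise=0.1, seed=0):
    """Create perturbed copies of a client: identical model weights,
    locally jittered data (a sketch of the 'replica' concept)."""
    rng = random.Random(seed)
    replicas = []
    for _ in range(n_replicas):
        replicas.append({
            "weights": list(client["weights"]),
            "data": [x + rng.gauss(0, noise) for x in client["data"]],
        })
    return replicas

def federated_average(models):
    """Plain coordinate-wise averaging of weight vectors (FedAvg-style)."""
    n = len(models)
    dim = len(models[0]["weights"])
    return [sum(m["weights"][d] for m in models) / n for d in range(dim)]
```

After local training, the server would average over the original clients plus all replicas, enlarging the effective ensemble when real clients and data are scarce.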
arXiv Detail & Related papers (2023-12-28T17:47:25Z)
- SPARE: A Single-Pass Neural Model for Relational Databases [36.55513135391452]
We propose SPARE, a new class of neural models that can be trained efficiently on RDBs while providing similar accuracies as GNNs.
For enabling efficient training, different from GNNs, SPARE makes use of the fact that data in RDBs has a predictive regular structure, which allows one to train these models in a single pass while exploiting symmetries at the same time.
arXiv Detail & Related papers (2023-10-20T15:23:17Z)
- Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present a code that successfully replicates results from six popular and recent graph recommendation models.
We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations.
By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
- InPars: Data Augmentation for Information Retrieval using Large Language Models [5.851846467503597]
In this work, we harness the few-shot capabilities of large pretrained language models as synthetic data generators for information retrieval tasks.
We show that models finetuned solely on our unsupervised dataset outperform strong baselines such as BM25.
Retrievers finetuned on both supervised and our synthetic data achieve better zero-shot transfer than models finetuned only on supervised data.
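The InPars recipe, as summarized, prompts a large language model with a few document-query examples and asks it to produce a query for a new document, yielding synthetic training pairs. A minimal prompt-builder sketch (the template and field labels are assumptions; the paper's exact template may differ):

```python
def build_inpars_prompt(few_shot_pairs, document):
    """Assemble a few-shot prompt asking a language model to write a
    relevant query for `document`. Each pair is (document, query)."""
    parts = []
    for doc, query in few_shot_pairs:
        parts.append(f"Document: {doc}\nRelevant query: {query}\n")
    parts.append(f"Document: {document}\nRelevant query:")
    return "\n".join(parts)
```

The model's completion becomes the synthetic query paired with `document`; the resulting (query, document) pairs are then used to finetune a retriever.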
arXiv Detail & Related papers (2022-02-10T16:52:45Z)
- TabGNN: Multiplex Graph Neural Network for Tabular Data Prediction [43.35301059378836]
We propose a novel framework, TabGNN, based on the recently popular graph neural networks (GNNs).
Specifically, we first construct a multiplex graph to model the multifaceted sample relations, and then design a multiplex graph neural network to learn an enhanced representation for each sample.
Experiments on eleven TDP datasets from various domains, including classification and regression ones, show that TabGNN can consistently improve the performance.
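One common way to build such a multiplex graph from tabular data is to create one edge layer per categorical column, linking samples that share a value. A small sketch of that construction (an assumption about TabGNN's graph-building step, with hypothetical column names):

```python
from collections import defaultdict
from itertools import combinations

def build_multiplex_graph(rows, categorical_cols):
    """Build one edge set per categorical column, connecting sample
    indices that share a value in that column."""
    layers = {}
    for col in categorical_cols:
        groups = defaultdict(list)
        for i, row in enumerate(rows):
            groups[row[col]].append(i)
        edges = []
        for members in groups.values():
            edges.extend(combinations(members, 2))
        layers[col] = edges
    return layers
```

Each layer then feeds a separate message-passing channel, and the per-layer representations are fused into the final per-sample embedding.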
arXiv Detail & Related papers (2021-08-20T11:51:32Z)
- Principal Neighbourhood Aggregation for Graph Nets [4.339839287869653]
Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graph-structured data.
Recent work on their expressive power has focused on isomorphism tasks and countable feature spaces.
We extend this theoretical framework to include continuous features which occur regularly in real-world input domains.
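PNA's central idea is to combine several aggregators (mean, max, min, standard deviation) with degree-based scalers (identity, amplification, attenuation) rather than rely on a single aggregator. A pure-Python sketch in the spirit of that recipe (the exact constants and normalization in the paper differ; this is illustrative):

```python
import math

def pna_aggregate(neighbor_values, avg_log_degree=1.0):
    """Combine four aggregators over a node's neighbor values with three
    degree-based scalers, returning a 12-dimensional feature vector."""
    d = len(neighbor_values)
    mean = sum(neighbor_values) / d
    std = math.sqrt(sum((x - mean) ** 2 for x in neighbor_values) / d)
    aggs = [mean, max(neighbor_values), min(neighbor_values), std]
    # Amplification grows and attenuation shrinks with node degree,
    # normalized by the average log-degree of the training graph.
    s = math.log(d + 1) / avg_log_degree
    scalers = [1.0, s, 1.0 / s]
    return [a * sc for sc in scalers for a in aggs]
```

Using multiple aggregators lets the network distinguish neighborhoods that any single aggregator would map to the same value, which is the expressiveness argument the paper extends to continuous features.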
arXiv Detail & Related papers (2020-04-12T23:30:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.