Related papers: RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases

URL: http://arxiv.org/abs/2506.01360v1
Date: Mon, 02 Jun 2025 06:34:10 GMT
Title: RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases
Authors: Dongwon Choi, Sunwoo Kim, Juyeon Kim, Kyungho Kim, Geon Lee, Shinhwan Kang, Myunghwan Kim, Kijung Shin
Abstract summary: RDB-to-graph modeling helps capture cross-table dependencies, leading to enhanced performance across diverse tasks. Applying a common heuristic rule for graph modeling leads to up to a 10% drop in performance compared to the best-performing graph model. We introduce RDB2G-Bench, the first benchmark framework for evaluating intelligent RDB-to-graph modeling methods.
Score: 23.836665904554426
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Relational databases (RDBs) are composed of interconnected tables, where relationships between them are defined through foreign keys. Recent research on applying machine learning to RDBs has explored graph-based representations of RDBs, where rows of tables are modeled as nodes, and foreign key relationships are modeled as edges. RDB-to-graph modeling helps capture cross-table dependencies, ultimately leading to enhanced performance across diverse tasks. However, there are numerous ways to model RDBs as graphs, and performance varies significantly depending on the chosen graph model. In our analysis, applying a common heuristic rule for graph modeling leads to up to a 10% drop in performance compared to the best-performing graph model, which remains non-trivial to identify. To foster research on intelligent RDB-to-graph modeling, we introduce RDB2G-Bench, the first benchmark framework for evaluating such methods. We construct extensive datasets covering 5 real-world RDBs and 12 predictive tasks, resulting in around 50k graph-performance pairs for efficient and reproducible evaluations. Thanks to our precomputed datasets, we were able to benchmark 9 automatic RDB-to-graph modeling methods on the 12 tasks over 600x faster than on-the-fly evaluation, which requires repeated model training. Our analysis of the datasets and benchmark results reveals key structural patterns affecting graph model effectiveness, along with practical implications for effective graph modeling.
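The row-as-node, foreign-key-as-edge construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not RDB2G-Bench's code or API: the tables, the column names, and the `rdb_to_graph` helper are all hypothetical. It implements just the single "common rule" mapping; the benchmark's point is that many alternative graph models exist.

```python
# Toy relational database: two tables linked by a foreign key (hypothetical data).
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Bo"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 1, "amount": 40.0},
    {"order_id": 12, "customer_id": 2, "amount": 15.0},
]


def rdb_to_graph(tables, primary_keys, foreign_keys):
    """Map every row to a node and every foreign-key reference to an edge.

    tables:       {table_name: list of row dicts}
    primary_keys: {table_name: primary-key column}
    foreign_keys: {(child_table, fk_column): parent_table}
    """
    # One node per row, identified by (table name, primary-key value).
    nodes = {}
    for table, rows in tables.items():
        pk = primary_keys[table]
        for row in rows:
            nodes[(table, row[pk])] = row

    # One edge per foreign-key reference, from the child row to the parent row.
    edges = []
    for (child, fk_col), parent in foreign_keys.items():
        child_pk = primary_keys[child]
        for row in tables[child]:
            target = (parent, row[fk_col])
            if target in nodes:  # skip dangling references
                edges.append(((child, row[child_pk]), target))
    return nodes, edges


nodes, edges = rdb_to_graph(
    tables={"customers": customers, "orders": orders},
    primary_keys={"customers": "customer_id", "orders": "order_id"},
    foreign_keys={("orders", "customer_id"): "customers"},
)
```

Here the five rows become five nodes and the three order-to-customer references become three edges. Other graph models of the same database (for example, dropping a table or treating rows of a linking table as edges) would change this structure, which is the design space the abstract refers to.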

Related papers

Beyond Model Base Selection: Weaving Knowledge to Master Fine-grained Neural Network Design [20.31388126105889]
We propose M-DESIGN, a curated model knowledge base (MKB) pipeline for mastering neural network refinement. First, we propose a knowledge weaving engine that reframes model refinement as an adaptive query problem over task metadata. Given a user's task query, M-DESIGN quickly matches and iteratively refines candidate models by leveraging a graph-relational knowledge schema.
arXiv Detail & Related papers (2025-07-21T07:49:19Z)
Rel-HNN: Split Parallel Hypergraph Neural Network for Learning on Relational Databases [3.6423651166048874]
Flattening the database poses challenges for deep learning models. We propose a novel hypergraph-based framework that we call rel-HNN. We show that rel-HNN significantly outperforms existing methods in both classification and regression tasks.
arXiv Detail & Related papers (2025-07-16T18:20:45Z)
REDELEX: A Framework for Relational Deep Learning Exploration [0.0]
Recently, Relational Deep Learning (RDL) has emerged as a novel paradigm wherein RDBs are conceptualized as graph structures. There is a lack of analysis into the relationships between various RDL models and the characteristics of the underlying RDBs. We present REDELEX, a comprehensive exploration framework for evaluating RDL models of varying complexity on the most diverse collection of over 70 RDBs.
arXiv Detail & Related papers (2025-06-27T13:05:15Z)
Relational Deep Learning: Challenges, Foundations and Next-Generation Architectures [50.46688111973999]
Graph machine learning has led to a significant increase in the capabilities of models that learn on arbitrary graph-structured data. We present a new blueprint that enables end-to-end representation of 'relational entity graphs' without traditional feature engineering. We discuss key challenges including large-scale multi-table integration and the complexities of modeling temporal dynamics and heterogeneous data.
arXiv Detail & Related papers (2025-06-19T23:51:38Z)
Joint Relational Database Generation via Graph-Conditional Diffusion Models [44.06390394789874]
Building generative models for relational databases (RDBs) is important for applications like privacy-preserving data release and augmenting real datasets. Most prior work either focuses on single-table generation or relies on autoregressive factorizations that impose a fixed table order and generate tables sequentially. We propose a fundamentally different approach: jointly modeling all tables in an RDB without imposing any order.
arXiv Detail & Related papers (2025-05-22T11:12:56Z)
RelGNN: Composite Message Passing for Relational Deep Learning [56.48834369525997]
We introduce RelGNN, a novel GNN framework specifically designed to leverage the unique structural characteristics of the graphs built from relational databases. RelGNN is evaluated on 30 diverse real-world tasks from RelBench (Fey et al., 2024), and achieves state-of-the-art performance on the vast majority of tasks, with improvements of up to 25%.
arXiv Detail & Related papers (2025-02-10T18:58:40Z)
Time-Varying Graph Learning for Data with Heavy-Tailed Distribution [15.576923158246428]
Graph models provide efficient tools to capture the underlying structure of data defined over networks. Current methodology for learning such models often lacks robustness to outliers in the data. This paper addresses the problem of learning time-varying graph models capable of efficiently representing heavy-tailed data.
arXiv Detail & Related papers (2024-12-31T19:09:57Z)
Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations. Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations. We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z)
4DBInfer: A 4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on Relational DBs [67.47600679176963]
RDBs store vast amounts of rich, informative data spread across interconnected tables. The progress of predictive machine learning models on such data falls behind advances in other domains such as computer vision or natural language processing. We explore a class of baseline models predicated on converting multi-table datasets into graphs. We assemble (i) a diverse collection of large-scale RDB datasets and (ii) coincident predictive tasks.
arXiv Detail & Related papers (2024-04-28T15:04:54Z)
Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models. We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
AutoRC: Improving BERT Based Relation Classification Models via Architecture Search [50.349407334562045]
BERT-based relation classification (RC) models have achieved significant improvements over traditional deep learning models. No consensus has been reached on the optimal architecture. We design a comprehensive search space for BERT-based RC models and employ a neural architecture search (NAS) method to automatically discover the design choices.
arXiv Detail & Related papers (2020-09-22T16:55:49Z)

This list is automatically generated from the titles and abstracts of the papers on this site.