Relational In-Context Learning via Synthetic Pre-training with Structural Prior
- URL: http://arxiv.org/abs/2603.03805v1
- Date: Wed, 04 Mar 2026 07:30:54 GMT
- Title: Relational In-Context Learning via Synthetic Pre-training with Structural Prior
- Authors: Yanbo Wang, Jiaxuan You, Chuan Shi, Muhan Zhang
- Abstract summary: RDB-PFN is the first relational foundation model trained purely via synthetic data. It is inspired by Prior-Data Fitted Networks (PFNs), where synthetic data generated from Structural Causal Models (SCMs) enables reasoning on single tables. Experiments verify that RDB-PFN achieves strong few-shot performance on 19 real-world prediction tasks.
- Score: 60.404256960057545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relational Databases (RDBs) are the backbone of modern business, yet they lack foundation models comparable to those in text or vision. A key obstacle is that high-quality RDBs are private, scarce, and structurally heterogeneous, making internet-scale pre-training infeasible. To overcome this data scarcity, we introduce $\textbf{RDB-PFN}$, the first relational foundation model trained purely via $\textbf{synthetic data}$. Inspired by Prior-Data Fitted Networks (PFNs), where synthetic data generated from Structural Causal Models (SCMs) enables reasoning on single tables, we design a $\textbf{Relational Prior Generator}$ to create an infinite stream of diverse RDBs from scratch. Pre-training on $\textbf{over 2 million}$ synthetic single-table and relational tasks, RDB-PFN learns to adapt to any new database instantly via genuine $\textbf{in-context learning}$. Experiments verify that RDB-PFN achieves strong few-shot performance on 19 real-world relational prediction tasks, outperforming graph-based and single-table foundation-model baselines (given the same DFS-linearized inputs), while using a lightweight architecture and fast inference. The code is available at https://github.com/MuLabPKU/RDBPFN.
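As a rough illustration of the pre-training recipe described in the abstract, the sketch below generates one synthetic relational task from a toy SCM-style prior and splits it into in-context examples and queries. All function names, table sizes, and the DFS-style mean aggregation are assumptions made for illustration; the paper's actual Relational Prior Generator is not specified here.

```python
# Hypothetical sketch of a PFN-style synthetic task generator for relational data.
# Names (sample_scm, sample_relational_task) and all design choices are assumptions;
# they are not taken from the RDB-PFN paper itself.
import numpy as np

rng = np.random.default_rng(0)

def sample_scm(n_rows, n_feats):
    """Sample a tiny linear SCM: each feature depends on earlier ones plus noise."""
    X = np.zeros((n_rows, n_feats))
    for j in range(n_feats):
        weights = rng.normal(size=j)
        X[:, j] = X[:, :j] @ weights + rng.normal(size=n_rows)
    return X

def sample_relational_task(n_parent=64, n_child=256, d_parent=4, d_child=3):
    """Create one synthetic two-table task: predict a parent label from its own
    features plus aggregated child-table features (a DFS-style flattening)."""
    parent = sample_scm(n_parent, d_parent)
    child = sample_scm(n_child, d_child)
    fk = rng.integers(0, n_parent, size=n_child)   # child -> parent foreign key
    # Aggregate child features per parent row (mean), mimicking DFS linearization.
    agg = np.zeros((n_parent, d_child))
    counts = np.bincount(fk, minlength=n_parent).clip(min=1)
    for j in range(d_child):
        agg[:, j] = np.bincount(fk, weights=child[:, j], minlength=n_parent) / counts
    features = np.hstack([parent, agg])
    w = rng.normal(size=features.shape[1])
    y = (features @ w + rng.normal(size=n_parent) > 0).astype(int)  # binary label
    # Split into in-context examples and queries, as a PFN would consume them.
    cut = n_parent // 2
    return (features[:cut], y[:cut]), (features[cut:], y[cut:])

(context_X, context_y), (query_X, query_y) = sample_relational_task()
print(context_X.shape, query_X.shape)
```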
Related papers
- Relatron: Automating Relational Machine Learning over Relational Databases [50.94254514286021]
We present a study that unifies RDL and DFS in a shared design space and conducts architecture-centric searches across diverse RDB tasks. Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and (3) accuracy is an unreliable guide for architecture choice.
arXiv Detail & Related papers (2026-02-26T02:45:22Z) - No Need to Train Your RDB Foundation Model [21.996337463952255]
We present a family of RDB encoders that can be seamlessly paired with already-existing single-table ICL foundation models. From a practical standpoint, we develop scalable SQL primitives to implement the encoder stage, resulting in an easy-to-use open-source RDB foundation model.
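The summary above mentions SQL primitives for the encoder stage. The snippet below is a minimal sketch, assuming a toy two-table schema, of how one join-plus-aggregate query can flatten an RDB into a single wide table that a single-table ICL model could consume; table names and columns are invented for illustration and are not the paper's actual primitives.

```python
# Illustrative only: flatten a two-table RDB into one wide table via SQL.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT, churned INTEGER);
CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'EU', 0), (2, 'US', 1);
INSERT INTO orders VALUES (1, 1, 30.0), (2, 1, 12.5), (3, 2, 99.0);
""")

# One aggregation "primitive": roll child rows up to the parent entity, producing
# a wide table whose rows become in-context examples for a single-table ICL model.
flat = con.execute("""
SELECT c.id,
       c.region,
       COUNT(o.id)                AS n_orders,
       COALESCE(SUM(o.amount), 0) AS total_spend,
       c.churned                  AS label
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.region, c.churned
""").fetchall()

for row in flat:
    print(row)
```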
arXiv Detail & Related papers (2026-02-14T09:38:57Z) - PluRel: Synthetic Data unlocks Scaling Laws for Relational Foundation Models [51.42043158297229]
We introduce PluRel, a framework to synthesize multi-tabular relational databases from scratch. In a step-by-step fashion, PluRel models (1) schemas with directed graphs, (2) inter-table primary-foreign key connectivity with bipartite graphs, and (3) feature distributions in tables via conditional causal mechanisms.
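A loose sketch of the three-step recipe summarized above, under assumed distributions: a random schema graph, bipartite PK-FK row links, and child features conditioned on the linked parent row. Nothing here reflects PluRel's actual implementation.

```python
# Rough sketch of schema -> connectivity -> conditional features; all details assumed.
import numpy as np

rng = np.random.default_rng(1)

# (1) Schema as a directed graph: edge (child, parent) means child holds a FK to parent.
n_tables = 4
schema_edges = [(c, int(rng.integers(0, c))) for c in range(1, n_tables)]  # a random tree

# (2) Bipartite PK-FK connectivity: each child row links to one parent row.
rows_per_table = {t: int(rng.integers(20, 50)) for t in range(n_tables)}
fk_links = {
    (child, parent): rng.integers(0, rows_per_table[parent], size=rows_per_table[child])
    for child, parent in schema_edges
}

# (3) Conditional causal mechanism: a child's feature depends on its parent's feature.
features = {0: rng.normal(size=rows_per_table[0])}
for child, parent in schema_edges:
    parent_feat = features[parent][fk_links[(child, parent)]]
    features[child] = 0.8 * parent_feat + rng.normal(scale=0.5, size=rows_per_table[child])

for t in range(n_tables):
    print(f"table {t}: {rows_per_table[t]} rows, mean feature {features[t].mean():.2f}")
```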
arXiv Detail & Related papers (2026-02-03T21:35:18Z) - Generalization Can Emerge in Tabular Foundation Models From a Single Table [38.07740881271672]
We show that simple self-supervised pre-training on just a single real table can produce surprisingly strong transfer across heterogeneous benchmarks. We then connect this to the pre-training procedure shared by most TFMs and show that the number and quality of tasks one can construct from a dataset is key to downstream performance.
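To make the "number and quality of tasks" point concrete, here is a small sketch assuming one way of constructing tasks from a single table (each column in turn as the target, paired with random feature subsets); the paper's exact task-construction procedure may differ.

```python
# Hypothetical illustration: one table yields many (features, target) pre-training tasks.
import numpy as np

rng = np.random.default_rng(2)
table = rng.normal(size=(200, 6))  # stand-in for a single real table

def tasks_from_table(X, subsets_per_target=3):
    """Yield (features, target) pairs: every column as target, with a few
    random feature subsets each, multiplying the number of tasks per table."""
    n_cols = X.shape[1]
    for target in range(n_cols):
        others = [c for c in range(n_cols) if c != target]
        for _ in range(subsets_per_target):
            k = int(rng.integers(2, len(others) + 1))
            cols = rng.choice(others, size=k, replace=False)
            yield X[:, cols], X[:, target]

print(sum(1 for _ in tasks_from_table(table)), "tasks from one table")
```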
arXiv Detail & Related papers (2025-11-12T19:12:40Z) - RelDiff: Relational Data Generative Modeling with Graph-Based Diffusion Models [83.6013616017646]
RelDiff is a novel diffusion generative model that synthesizes complete relational databases by explicitly modeling their foreign key graph structure. RelDiff consistently outperforms prior methods in producing realistic and coherent synthetic relational databases.
arXiv Detail & Related papers (2025-05-31T21:01:02Z) - Joint Relational Database Generation via Graph-Conditional Diffusion Models [44.06390394789874]
Building generative models for relational databases (RDBs) is important for applications like privacy-preserving data release and augmenting real datasets. Most prior work either focuses on single-table generation or relies on autoregressive factorizations that impose a fixed table order and generate tables sequentially. We propose a fundamentally different approach: jointly modeling all tables in an RDB without imposing any order.
arXiv Detail & Related papers (2025-05-22T11:12:56Z) - RelGNN: Composite Message Passing for Relational Deep Learning [56.48834369525997]
We introduce RelGNN, a novel GNN framework specifically designed to leverage the unique structural characteristics of the graphs built from relational databases. RelGNN is evaluated on 30 diverse real-world tasks from RelBench (Fey et al., 2024), and achieves state-of-the-art performance on the vast majority of tasks, with improvements of up to 25%.
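For intuition only, the sketch below shows one generic message-passing step over a foreign-key graph (mean-pooling child-row embeddings into their parent and applying a learned update); RelGNN's composite message-passing scheme is more involved and is not reproduced here.

```python
# Generic heterogeneous message-passing step on an FK graph; not RelGNN's actual design.
import numpy as np

rng = np.random.default_rng(3)

# Tiny FK graph: each order row points to one customer row.
customer_h = rng.normal(size=(3, 4))          # 3 customers, 4-dim embeddings
order_h = rng.normal(size=(7, 4))             # 7 orders
order_to_customer = np.array([0, 0, 1, 2, 2, 2, 1])

def aggregate_children(child_h, fk, n_parents):
    """Mean-pool child embeddings into their parent (one FK-edge-type message)."""
    sums = np.zeros((n_parents, child_h.shape[1]))
    np.add.at(sums, fk, child_h)
    counts = np.bincount(fk, minlength=n_parents).clip(min=1)[:, None]
    return sums / counts

messages = aggregate_children(order_h, order_to_customer, n_parents=3)
W_self, W_msg = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
customer_h = np.tanh(customer_h @ W_self + messages @ W_msg)  # one update step
print(customer_h.shape)
```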
arXiv Detail & Related papers (2025-02-10T18:58:40Z) - RelBench: A Benchmark for Deep Learning on Relational Databases [78.52438155603781]
We present RelBench, a public benchmark for solving predictive tasks over relational databases with graph neural networks.
We use RelBench to conduct the first comprehensive study of Relational Deep Learning (RDL) infrastructure.
RDL learns better whilst reducing human work needed by more than an order of magnitude.
arXiv Detail & Related papers (2024-07-29T14:46:13Z) - GFS: Graph-based Feature Synthesis for Prediction over Relational Databases [39.975491511390985]
We propose a novel framework called Graph-based Feature Synthesis (GFS).
GFS formulates a relational database as a heterogeneous graph database.
In an experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
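A minimal sketch of the heterogeneous-graph formulation referenced above, assuming a toy two-table schema: one node per row, one node type per table, one edge per foreign-key reference. GFS's feature-synthesis machinery itself is not shown.

```python
# Build a heterogeneous graph view of an RDB: rows become typed nodes, FKs become edges.
node_types = {}   # node id -> table name (node type)
edges = []        # (src node id, dst node id, edge type)

tables = {
    "customers": [{"id": 1}, {"id": 2}],
    "orders": [{"id": 1, "customer_id": 1}, {"id": 2, "customer_id": 2}],
}
foreign_keys = {("orders", "customer_id"): "customers"}  # (child table, FK column) -> parent

node_id = {}
for table, rows in tables.items():
    for row in rows:
        nid = len(node_types)
        node_id[(table, row["id"])] = nid
        node_types[nid] = table

for (child, fk_col), parent in foreign_keys.items():
    for row in tables[child]:
        edges.append((node_id[(child, row["id"])],
                      node_id[(parent, row[fk_col])],
                      f"{child}.{fk_col}"))

print(len(node_types), "nodes,", len(edges), "edges:", edges)
```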
arXiv Detail & Related papers (2023-12-04T16:54:40Z) - SPARE: A Single-Pass Neural Model for Relational Databases [36.55513135391452]
We propose SPARE, a new class of neural models that can be trained efficiently on RDBs while providing similar accuracies as GNNs.
For enabling efficient training, and unlike GNNs, SPARE makes use of the fact that data in RDBs has a predictive regular structure, which allows these models to be trained in a single pass while exploiting symmetries.
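One loose reading of the single-pass idea, with all details assumed: materialize fixed-size, canonically ordered child-feature blocks once, then sweep over them in a single training pass instead of re-sampling graph neighborhoods each epoch. This only illustrates the data-access pattern, not SPARE's architecture.

```python
# Assumed illustration: precompute regular-structure inputs once, then train in one pass.
import numpy as np

rng = np.random.default_rng(4)
n_targets, max_children, d = 100, 5, 3

# Materialize once: fixed-size (padded) child-feature blocks per target row.
child_counts = rng.integers(0, max_children + 1, size=n_targets)
blocks = np.zeros((n_targets, max_children, d))
for i, c in enumerate(child_counts):
    blocks[i, :c] = np.sort(rng.normal(size=(c, d)), axis=0)  # canonical order for symmetry

X = blocks.reshape(n_targets, -1)               # flat inputs, ready for any model
y = (blocks.sum(axis=(1, 2)) > 0).astype(int)   # toy label

# Single pass: one sweep of SGD over the materialized rows (logistic regression here).
w = np.zeros(X.shape[1])
for xi, yi in zip(X, y):
    p = 1 / (1 + np.exp(-xi @ w))
    w += 0.1 * (yi - p) * xi
print("train accuracy:", ((1 / (1 + np.exp(-X @ w)) > 0.5) == y).mean())
```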
arXiv Detail & Related papers (2023-10-20T15:23:17Z)