ReFuGe: Feature Generation for Prediction Tasks on Relational Databases with LLM Agents
- URL: http://arxiv.org/abs/2601.17735v1
- Date: Sun, 25 Jan 2026 08:02:29 GMT
- Title: ReFuGe: Feature Generation for Prediction Tasks on Relational Databases with LLM Agents
- Authors: Kyungho Kim, Geon Lee, Juyeon Kim, Dongwon Choi, Shinhwan Kang, Kijung Shin,
- Abstract summary: ReFuGe is an agentic framework for generating predictive features on RDBs.<n>It operates within an iterative feedback loop until performance converges.<n> Experiments on RDB benchmarks demonstrate that ReFuGe substantially improves performance on various RDB prediction tasks.
- Score: 33.930224200799366
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Relational databases (RDBs) play a crucial role in many real-world web applications, supporting data management across multiple interconnected tables. Beyond typical retrieval-oriented tasks, prediction tasks on RDBs have recently gained attention. In this work, we address this problem by generating informative relational features that enhance predictive performance. However, generating such features is challenging: it requires reasoning over complex schemas and exploring a combinatorially large feature space, all without explicit supervision. To address these challenges, we propose ReFuGe, an agentic framework that leverages specialized large language model agents: (1) a schema selection agent identifies the tables and columns relevant to the task, (2) a feature generation agent produces diverse candidate features from the selected schema, and (3) a feature filtering agent evaluates and retains promising features through reasoning-based and validation-based filtering. It operates within an iterative feedback loop until performance converges. Experiments on RDB benchmarks demonstrate that ReFuGe substantially improves performance on various RDB prediction tasks. Our code and datasets are available at https://github.com/K-Kyungho/REFUGE.
Related papers
- Relatron: Automating Relational Machine Learning over Relational Databases [50.94254514286021]
We present a study that unifies RDL and DFS in a shared design space and conducts architecture-centric searches across diverse RDB tasks.<n>Our analysis yields three key findings: (1) RDL does not consistently outperform DFS, with performance being highly task-dependent; (2) no single architecture dominates across tasks, underscoring the need for task-aware model selection; and accuracy is an unreliable guide for choice architecture.
arXiv Detail & Related papers (2026-02-26T02:45:22Z) - MoRAgent: Parameter Efficient Agent Tuning with Mixture-of-Roles [25.198556596878362]
We introduce three key strategies for parameter-efficient fine-tuning (PEFT) in agent tasks.<n>Inspired by the increasingly dominant Reason+Action paradigm, we first decompose the capabilities necessary for the agent tasks into three distinct roles.<n>We propose the Mixture-of-Roles (MoR) framework, which comprises three specialized Low-Rank Adaptation (LoRA) groups, each designated to fulfill a distinct role.
arXiv Detail & Related papers (2025-12-25T15:02:07Z) - MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL [22.59453421744114]
We introduce MARS-, a novel multi-agent framework that combines principled task decomposition and interactive reinforcement learning (RL)<n>Experiments show that MARS- achieves state-of-the-art Execution Accuracy of 77.84% on the BIRD set and 89.84% on the Spider test set.
arXiv Detail & Related papers (2025-11-02T16:55:30Z) - Relational Database Distillation: From Structured Tables to Condensed Graph Data [48.347717300340435]
We aim to distill large-scale RDBs into compact heterogeneous graphs while retaining the power required for graph-based models.<n>We further design a kernel ridge regression-guided objective with pseudo-labels, which produces quality features for the distilled graph.
arXiv Detail & Related papers (2025-10-08T13:05:31Z) - Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling [83.78874399606379]
We propose MACT, a Multi-Agent Collaboration framework with Test-Time scaling.<n>It comprises four distinct small-scale agents, with clearly defined roles and effective collaboration.<n>It shows superior performance with a smaller parameter scale without sacrificing the ability of general and mathematical tasks.
arXiv Detail & Related papers (2025-08-05T12:52:09Z) - ImpRAG: Retrieval-Augmented Generation with Implicit Queries [34.72864597562907]
ImpRAG is a query-free RAG system that integrates retrieval and generation into a unified model.<n>We show that ImpRAG achieves 3.6-11.5 improvements in exact match scores on unseen tasks with diverse formats.
arXiv Detail & Related papers (2025-06-02T21:38:21Z) - TARGET: Benchmarking Table Retrieval for Generative Tasks [7.379012456053551]
TARGET is a benchmark for evaluating TAble Retrieval for GEnerative Tasks.<n>We analyze the retrieval performance of different retrievers in isolation, as well as their impact on downstream tasks.<n>We find that dense embedding-based retrievers far outperform a BM25 baseline which is less effective than it is for retrieval over unstructured text.
arXiv Detail & Related papers (2025-05-14T19:39:46Z) - MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration [63.31211701741323]
We extend multi-agent multi-model reasoning to generation, specifically to improving faithfulness through refinement.<n>We design intrinsic evaluations for each subtask, with our findings indicating that both multi-agent (multiple instances) and multi-model (diverse LLM types) approaches benefit error detection and critiquing.<n>We consolidate these insights into a final "recipe" called Multi-Agent Multi-Model Refinement (MAMM-Refine), where multi-agent and multi-model collaboration significantly boosts performance.
arXiv Detail & Related papers (2025-03-19T14:46:53Z) - RelGNN: Composite Message Passing for Relational Deep Learning [56.48834369525997]
We introduce RelGNN, a novel GNN framework specifically designed to leverage the unique structural characteristics of the graphs built from relational databases.<n>RelGNN is evaluated on 30 diverse real-world tasks from Relbench (Fey et al., 2024), and achieves state-of-the-art performance on the vast majority tasks, with improvements of up to 25%.
arXiv Detail & Related papers (2025-02-10T18:58:40Z) - A Collaborative Multi-Agent Approach to Retrieval-Augmented Generation Across Diverse Data [0.0]
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs)<n>Traditional RAG systems typically use a single-agent architecture to handle query generation, data retrieval, and response synthesis.<n>This paper proposes a multi-agent RAG system to address these limitations.
arXiv Detail & Related papers (2024-12-08T07:18:19Z) - BabelBench: An Omni Benchmark for Code-Driven Analysis of Multimodal and Multistructured Data [61.936320820180875]
Large language models (LLMs) have become increasingly pivotal across various domains.
BabelBench is an innovative benchmark framework that evaluates the proficiency of LLMs in managing multimodal multistructured data with code execution.
Our experimental findings on BabelBench indicate that even cutting-edge models like ChatGPT 4 exhibit substantial room for improvement.
arXiv Detail & Related papers (2024-10-01T15:11:24Z) - AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction [10.65417796726349]
relation extraction (RE) in complex scenarios faces challenges such as diverse relation types and ambiguous relations between entities within a single sentence.
We propose an agent-based RE framework, namely AgentRE, which fully leverages the potential of large language models to achieve RE in complex scenarios.
arXiv Detail & Related papers (2024-09-03T12:53:05Z) - CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
Cross-modal retrieval aims to search for instances, which are semantically related to the query through the interaction of different modal data.<n>Traditional solutions utilize a single-tower or dual-tower framework to explicitly compute the score between queries and candidates.<n>We propose a generative cross-modal retrieval framework (CART) based on coarse-to-fine semantic modeling.
arXiv Detail & Related papers (2024-06-25T12:47:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.