LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources
- URL: http://arxiv.org/abs/2510.18477v2
- Date: Thu, 30 Oct 2025 04:49:08 GMT
- Title: LAFA: Agentic LLM-Driven Federated Analytics over Decentralized Data Sources
- Authors: Haichao Ji, Zibo Wang, Cheng Pan, Meng Han, Yifei Zhu, Dan Wang, Zhu Han
- Abstract summary: Large Language Models (LLMs) have shown great promise in automating data analytics tasks by interpreting natural language queries. Existing LLM-agent-based analytics frameworks operate under the assumption of centralized data access, offering little to no privacy protection. We present LAFA, the first system that integrates LLM-agent-based data analytics with federated analytics.
- Score: 35.235993431071286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have shown great promise in automating data analytics tasks by interpreting natural language queries and generating multi-operation execution plans. However, existing LLM-agent-based analytics frameworks operate under the assumption of centralized data access, offering little to no privacy protection. In contrast, federated analytics (FA) enables privacy-preserving computation across distributed data sources, but lacks support for natural language input and requires structured, machine-readable queries. In this work, we present LAFA, the first system that integrates LLM-agent-based data analytics with FA. LAFA introduces a hierarchical multi-agent architecture that accepts natural language queries and transforms them into optimized, executable FA workflows. A coarse-grained planner first decomposes complex queries into sub-queries, while a fine-grained planner maps each sub-query into a Directed Acyclic Graph (DAG) of FA operations using prior structural knowledge. To improve execution efficiency, an optimizer agent rewrites and merges multiple DAGs, eliminating redundant operations and minimizing computation and communication overhead. Our experiments demonstrate that LAFA consistently outperforms baseline prompting strategies by achieving higher execution plan success rates and reducing resource-intensive FA operations by a substantial margin. This work establishes a practical foundation for privacy-preserving, LLM-driven analytics that supports natural language input in the FA setting.
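The DAG-merging step described in the abstract can be illustrated with a minimal sketch. This is not LAFA's optimizer: it assumes each plan is a dictionary mapping a node name to an operation and its inputs, and it merges plans by sharing any node that applies the same operation to the same inputs (common-subexpression elimination). The operation names (`secure_scan:age`, `secure_sum`, `secure_count`, `divide`) are hypothetical placeholders.

```python
# A plan maps a node name to (operation, names of input nodes).
Plan = dict[str, tuple[str, tuple[str, ...]]]


def merge_plans(plans: list[Plan]) -> Plan:
    """Merge several DAGs, sharing nodes that apply the same op to the same inputs."""
    merged: Plan = {}
    canonical: dict[tuple, str] = {}  # (op, merged input names) -> merged node name

    for plan in plans:
        renamed: dict[str, str] = {}  # node name in this plan -> merged node name

        def visit(node: str) -> str:
            if node in renamed:
                return renamed[node]
            op, inputs = plan[node]
            # Resolve inputs first so the signature uses merged names.
            new_inputs = tuple(visit(i) for i in inputs)
            signature = (op, new_inputs)
            if signature not in canonical:
                name = f"n{len(merged)}"
                canonical[signature] = name
                merged[name] = (op, new_inputs)
            renamed[node] = canonical[signature]
            return renamed[node]

        for node in plan:
            visit(node)
    return merged


# Two hypothetical sub-query plans that both scan and count the same column.
plan_a: Plan = {
    "scan": ("secure_scan:age", ()),
    "sum": ("secure_sum", ("scan",)),
    "cnt": ("secure_count", ("scan",)),
    "mean": ("divide", ("sum", "cnt")),
}
plan_b: Plan = {
    "scan": ("secure_scan:age", ()),
    "cnt": ("secure_count", ("scan",)),
}

merged = merge_plans([plan_a, plan_b])
print(len(plan_a) + len(plan_b), "operations before merging,", len(merged), "after")
```

In this toy case the scan and count operations from the second plan are reused rather than executed again. A real optimizer would also have to decide when two FA operations are genuinely equivalent, which this sketch reduces to an exact match on operation name and inputs.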
Related papers
- Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs [66.63911043019294]
Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them. This paper focuses on the use of LLM techniques to prepare data for diverse downstream tasks. We introduce a task-centric taxonomy that organizes the field into three major tasks: data cleaning (standardization, error processing, imputation), data integration, and data enrichment.
arXiv Detail & Related papers (2026-01-22T12:02:45Z)
- Enhancing Foundation Models in Transaction Understanding with LLM-based Sentence Embeddings [26.118375969968437]
Large Language Models (LLMs) can address this limitation through superior semantic understanding. We introduce a hybrid framework that uses LLM-generated embeddings as semantic initializations for lightweight transaction models. Our approach employs multi-source data fusion to enrich merchant categorical fields and a one-word constraint principle for consistent embedding generation.
arXiv Detail & Related papers (2025-12-01T23:30:17Z)
- Bayesian Network Structure Discovery Using Large Language Models [27.478536621589345]
We propose a unified framework for Bayesian network structure discovery. Our framework places large language models (LLMs) at the center, supporting both data-free and data-aware settings. Experiments demonstrate that our method significantly outperforms both existing LLM-based approaches and traditional data-driven algorithms.
arXiv Detail & Related papers (2025-11-01T14:32:52Z)
- LLM/Agent-as-Data-Analyst: A Survey [54.01326293336748]
Large language model (LLM) and agent techniques for data analysis have demonstrated substantial impact in both academia and industry. The technical evolution further distills five key design goals for intelligent data analysis agents, namely semantic-aware design, hybrid integration, autonomous pipelines, tool-augmented modality, and support for open-world tasks.
arXiv Detail & Related papers (2025-09-28T17:31:38Z)
- FuDoBa: Fusing Document and Knowledge Graph-based Representations with Bayesian Optimisation [43.56253799373878]
We introduce FuDoBa, a Bayesian optimisation-based method that integrates LLM-based embeddings with domain-specific structured knowledge. This fusion produces low-dimensional, task-relevant representations while reducing training complexity and yielding interpretable early-fusion weights. We demonstrate the effectiveness of our approach on six datasets in two domains, showing that our proposed representation learning approach performs on par with, or surpasses, those produced solely by the proprietary LLM-based embedding baselines.
arXiv Detail & Related papers (2025-07-09T07:49:55Z)
- Intelligent Assistants for the Semiconductor Failure Analysis with LLM-Based Planning Agents [1.2693545159861859]
Failure Analysis (FA) is a highly intricate and knowledge-intensive process. The integration of AI components within the computational infrastructure of FA labs has the potential to automate a variety of tasks. This paper investigates the design and implementation of an agentic AI system for semiconductor FA using a Large Language Model (LLM)-based Planning Agent (LPA).
arXiv Detail & Related papers (2025-06-18T15:43:10Z)
- Large Language Models are Good Relational Learners [55.40941576497973]
We introduce Rel-LLM, a novel architecture that utilizes a graph neural network (GNN)-based encoder to generate structured relational prompts for large language models (LLMs). Unlike traditional text-based serialization approaches, our method preserves the inherent relational structure of databases while enabling LLMs to process and reason over complex entity relationships.
arXiv Detail & Related papers (2025-06-06T04:07:55Z)
- Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z)
- IDA-Bench: Evaluating LLMs on Interactive Guided Data Analysis [60.32962597618861]
IDA-Bench is a novel benchmark evaluating large language models in multi-round interactive scenarios. Agent performance is judged by comparing its final numerical output to the human-derived baseline. Even state-of-the-art coding agents (like Claude-3.7-thinking) succeed on 50% of the tasks, highlighting limitations not evident in single-turn tests.
arXiv Detail & Related papers (2025-05-23T09:37:52Z)
- System Log Parsing with Large Language Models: A Review [2.2779174914142346]
Large language models (LLMs) have introduced the new research field of LLM-based log parsing. Despite promising results, there is no structured overview of the approaches in this relatively new research field. This work systematically reviews 29 LLM-based log parsing methods.
arXiv Detail & Related papers (2025-04-07T09:41:04Z)
- Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB [44.057784044659726]
Large language models (LLMs) have made it easier to prototype such retrieval and reasoning data pipelines. This often involves orchestrating data systems, managing data movement, and handling low-level details. We introduce FlockMTL: an extension for abstractions that deeply integrates LLM capabilities and retrieval-augmented generation.
arXiv Detail & Related papers (2025-04-01T19:48:17Z)
- DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing [10.712756715779822]
Large Language Models (LLMs) have shown promise in data processing. These frameworks focus on reducing cost when executing user-specified operations. This is problematic for complex tasks and data. We present DocETL, a system that optimizes complex document processing pipelines.
arXiv Detail & Related papers (2024-10-16T03:22:35Z)
- FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large Language Models in Federated Learning [70.38817963253034]
This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z)