Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA
- URL: http://arxiv.org/abs/2602.22721v1
- Date: Thu, 26 Feb 2026 07:49:50 GMT
- Title: Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA
- Authors: Fengyu Li, Junhao Zhu, Kaishi Song, Lu Chen, Zhongming Yao, Tianyi Li, Christian S. Jensen
- Abstract summary: Table Question Answering (TQA) aims to answer natural language questions over structured tables. Large Language Models (LLMs) enable promising solutions to this problem, with operator-centric solutions that generate table manipulation pipelines in a multi-step manner offering state-of-the-art performance. We propose Operation-R1, the first framework that trains lightweight LLMs via a novel variant of reinforcement learning with verifiable rewards to produce high-quality data-preparation pipelines for TQA in a single inference step.
- Score: 16.758340727602793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Table Question Answering (TQA) aims to answer natural language questions over structured tables. Large Language Models (LLMs) enable promising solutions to this problem, with operator-centric solutions that generate table manipulation pipelines in a multi-step manner offering state-of-the-art performance. However, these solutions rely on multiple LLM calls, resulting in prohibitive latencies and computational costs. We propose Operation-R1, the first framework that trains lightweight LLMs (e.g., Qwen-4B/1.7B) via a novel variant of reinforcement learning with verifiable rewards to produce high-quality data-preparation pipelines for TQA in a single inference step. To train such an LLM, we first introduce a self-supervised rewarding mechanism to automatically obtain fine-grained pipeline-wise supervision signals for LLM training. We also propose variance-aware group resampling to mitigate training instability. To further enhance the robustness of pipeline generation, we develop two complementary mechanisms: operation merge, which filters spurious operations through multi-candidate consensus, and adaptive rollback, which offers runtime protection against information loss in data transformation. Experiments on two benchmark datasets show that, with the same LLM backbone, Operation-R1 achieves average absolute accuracy gains of 9.55 and 6.08 percentage points over multi-step preparation baselines, with 79% table compression and a 2.2× reduction in monetary cost.
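To make the single-step setup concrete, the sketch below illustrates the general idea: one LLM call emits an entire table-preparation pipeline (a list of operations), which is then executed over the table, with a crude guard against destructive steps. This is a minimal illustration under assumed names and an assumed operation schema (apply_op, run_pipeline, the JSON-like operation format), not the Operation-R1 implementation or its adaptive rollback mechanism.

```python
# Hypothetical sketch of one-step pipeline generation for Table QA.
# The operation vocabulary, function names, and rollback rule are
# illustrative assumptions, not the paper's actual method.
import pandas as pd

def apply_op(table: pd.DataFrame, op: dict) -> pd.DataFrame:
    """Apply one table-manipulation operation from a toy vocabulary."""
    if op["op"] == "select_columns":
        return table[op["columns"]]
    if op["op"] == "filter_rows":
        # e.g. {"op": "filter_rows", "column": "year", "equals": 2020}
        return table[table[op["column"]].astype(str) == str(op["equals"])]
    if op["op"] == "sort":
        return table.sort_values(op["column"], ascending=op.get("ascending", True))
    raise ValueError(f"unknown operation: {op['op']}")

def run_pipeline(table: pd.DataFrame, pipeline: list[dict]) -> pd.DataFrame:
    """Execute a pipeline produced by a single LLM call, skipping any step
    that would empty the table (a crude stand-in for rollback protection
    against information loss)."""
    current = table
    for op in pipeline:
        candidate = apply_op(current, op)
        if len(candidate) == 0:
            continue  # keep the pre-step table instead of the destructive result
        current = candidate
    return current

# Example: the whole pipeline is emitted in ONE inference step (e.g., as JSON),
# rather than one LLM call per operation as in multi-step baselines.
table = pd.DataFrame({"team": ["A", "B", "C"],
                      "year": [2019, 2020, 2020],
                      "wins": [10, 12, 7]})
pipeline = [
    {"op": "filter_rows", "column": "year", "equals": 2020},
    {"op": "select_columns", "columns": ["team", "wins"]},
    {"op": "sort", "column": "wins", "ascending": False},
]
print(run_pipeline(table, pipeline))  # compressed table the question is answered over
```

Because the full pipeline is produced at once, each question needs only one model invocation, whereas the multi-step baselines discussed in the abstract interleave an LLM call with every operation, which is what drives their higher latency and cost.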
Related papers
- RelayLLM: Efficient Reasoning via Collaborative Decoding [23.351598429979024]
RelayLLM is a novel framework for efficient reasoning via token-level collaborative decoding. We show that RelayLLM achieves an average accuracy of 49.52%, effectively bridging the performance gap between the two models.
arXiv Detail & Related papers (2026-01-08T17:56:16Z) - MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems. We build a benchmark consisting of 1,260 samples of 42 challenging synthetic tasks. We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z) - SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling [58.05959902776133]
We introduce Single-Pass Annotation with Reference-Guided Evaluation (SPARE), a novel structured framework that enables efficient per-step annotation. We demonstrate SPARE's effectiveness across four diverse datasets spanning mathematical reasoning (GSM8K, MATH), multi-hop question answering (MuSiQue-Ans), and spatial reasoning (SpaRP). On ProcessBench, SPARE demonstrates data-efficient out-of-distribution generalization, using only ~16% of training samples compared to human-labeled and other synthetically trained baselines.
arXiv Detail & Related papers (2025-06-18T14:37:59Z) - RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing [27.481573948464987]
RadialRouter is a novel framework for large language model routing. It uses a lightweight Transformer-based backbone with a radial structure named RadialFormer to articulate the query-LLM relationship. It significantly outperforms existing routing methods by 9.2% and 5.8% in the Balance and Cost First scenarios.
arXiv Detail & Related papers (2025-06-04T12:16:41Z) - How Many Parameters Does Your Task Really Need? Task Specific Pruning with LLM-Sieve [2.33361323991006]
Large Language Models (LLMs) are increasingly deployed for narrow tasks in resource-constrained settings. We present LLM-Sieve, a framework that prunes LLMs down to the minimal parameter subset needed to preserve task performance.
arXiv Detail & Related papers (2025-05-23T20:17:20Z) - Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices. One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the SLM produces low-confidence responses. We conduct a comprehensive investigation into the benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs over 1500+ settings.
arXiv Detail & Related papers (2025-02-06T18:59:11Z) - LLM-AutoDiff: Auto-Differentiate Any LLM Workflow [58.56731133392544]
We introduce LLM-AutoDiff, a novel framework for Automatic Prompt Engineering (APE). LLM-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine to generate feedback akin to textual gradients. It consistently outperforms existing textual-gradient baselines in both accuracy and training cost.
arXiv Detail & Related papers (2025-01-28T03:18:48Z) - PickLLM: Context-Aware RL-Assisted Large Language Model Routing [0.5325390073522079]
PickLLM is a lightweight framework that relies on Reinforcement Learning (RL) to route on-the-fly queries to available models. We demonstrate the speed of convergence for different learning rates and improvements in hard metrics such as cost per querying session and overall response latency.
arXiv Detail & Related papers (2024-12-12T06:27:12Z) - Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs [29.735465300269993]
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they often struggle with spatial reasoning. This paper presents a novel neural-symbolic framework that enhances LLMs' spatial reasoning abilities through iterative feedback between LLMs and Answer Set Programming (ASP). We evaluate our approach on two benchmark datasets: StepGame and SparQA.
arXiv Detail & Related papers (2024-11-27T18:04:05Z) - Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation [33.05581803204543]
Adapting pre-trained large language models (LLMs) is crucial but challenging due to their enormous size. We introduce SketchTune, a compressive adaptation strategy that compresses weights into compact fine-tunable sketches. SketchTune is supported by mathematical insights into matrix classes that are better approximated using sketching rather than low-rank methods.
arXiv Detail & Related papers (2024-10-08T20:58:24Z) - Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [68.29746557968107]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z) - BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves for the first time high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families.
arXiv Detail & Related papers (2024-02-06T09:26:34Z)