Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA
- URL: http://arxiv.org/abs/2602.22721v1
- Date: Thu, 26 Feb 2026 07:49:50 GMT
- Title: Replacing Multi-Step Assembly of Data Preparation Pipelines with One-Step LLM Pipeline Generation for Table QA
- Authors: Fengyu Li, Junhao Zhu, Kaishi Song, Lu Chen, Zhongming Yao, Tianyi Li, Christian S. Jensen
- Abstract summary: Table Question Answering (TQA) aims to answer natural language questions over structured tables. Large Language Models (LLMs) enable promising solutions to this problem, with operator-centric solutions that generate table manipulation pipelines in a multi-step manner offering state-of-the-art performance. We propose Operation-R1, the first framework that trains lightweight LLMs via a novel variant of reinforcement learning with verifiable rewards to produce high-quality data-preparation pipelines for TQA in a single inference step.
- Score: 16.758340727602793
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Table Question Answering (TQA) aims to answer natural language questions over structured tables. Large Language Models (LLMs) enable promising solutions to this problem, with operator-centric solutions that generate table manipulation pipelines in a multi-step manner offering state-of-the-art performance. However, these solutions rely on multiple LLM calls, resulting in prohibitive latencies and computational costs. We propose Operation-R1, the first framework that trains lightweight LLMs (e.g., Qwen-4B/1.7B) via a novel variant of reinforcement learning with verifiable rewards to produce high-quality data-preparation pipelines for TQA in a single inference step. To train such an LLM, we first introduce a self-supervised rewarding mechanism to automatically obtain fine-grained pipeline-wise supervision signals for LLM training. We also propose variance-aware group resampling to mitigate training instability. To further enhance the robustness of pipeline generation, we develop two complementary mechanisms: operation merge, which filters spurious operations through multi-candidate consensus, and adaptive rollback, which offers runtime protection against information loss in data transformation. Experiments on two benchmark datasets show that, with the same LLM backbone, Operation-R1 achieves average absolute accuracy gains of 9.55 and 6.08 percentage points over multi-step preparation baselines, with 79% table compression and a 2.2× reduction in monetary cost.
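To make the single-step setup concrete, the sketch below illustrates the general idea: one LLM call emits an entire table-preparation pipeline (a list of operations), which is then executed over the table, with a crude guard against destructive steps. This is a minimal illustration under assumed names and an assumed operation schema (apply_op, run_pipeline, the JSON-like operation format), not the Operation-R1 implementation or its adaptive rollback mechanism.

```python
# Hypothetical sketch of one-step pipeline generation for Table QA.
# The operation vocabulary, function names, and rollback rule are
# illustrative assumptions, not the paper's actual method.
import pandas as pd

def apply_op(table: pd.DataFrame, op: dict) -> pd.DataFrame:
    """Apply one table-manipulation operation from a toy vocabulary."""
    if op["op"] == "select_columns":
        return table[op["columns"]]
    if op["op"] == "filter_rows":
        # e.g. {"op": "filter_rows", "column": "year", "equals": 2020}
        return table[table[op["column"]].astype(str) == str(op["equals"])]
    if op["op"] == "sort":
        return table.sort_values(op["column"], ascending=op.get("ascending", True))
    raise ValueError(f"unknown operation: {op['op']}")

def run_pipeline(table: pd.DataFrame, pipeline: list[dict]) -> pd.DataFrame:
    """Execute a pipeline produced by a single LLM call, skipping any step
    that would empty the table (a crude stand-in for rollback protection
    against information loss)."""
    current = table
    for op in pipeline:
        candidate = apply_op(current, op)
        if len(candidate) == 0:
            continue  # keep the pre-step table instead of the destructive result
        current = candidate
    return current

# Example: the whole pipeline is emitted in ONE inference step (e.g., as JSON),
# rather than one LLM call per operation as in multi-step baselines.
table = pd.DataFrame({"team": ["A", "B", "C"],
                      "year": [2019, 2020, 2020],
                      "wins": [10, 12, 7]})
pipeline = [
    {"op": "filter_rows", "column": "year", "equals": 2020},
    {"op": "select_columns", "columns": ["team", "wins"]},
    {"op": "sort", "column": "wins", "ascending": False},
]
print(run_pipeline(table, pipeline))  # compressed table the question is answered over
```

Because the full pipeline is produced at once, each question needs only one model invocation, whereas the multi-step baselines discussed in the abstract interleave an LLM call with every operation, which is what drives their higher latency and cost.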
Related papers
- RelayLLM: Efficient Reasoning via Collaborative Decoding [23.351598429979024]
RelayLLM is a novel framework for efficient reasoning via token-level collaborative decoding. We show that RelayLLM achieves an average accuracy of 49.52%, effectively bridging the performance gap between the two models.
arXiv Detail & Related papers (2026-01-08T17:56:16Z) - MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems. We build a benchmark consisting of 1,260 samples of 42 challenging synthetic tasks. We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z) - SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling [58.05959902776133]
We introduce Single-Pass Annotation with Reference-Guided Evaluation (SPARE), a novel structured framework that enables efficient per-step annotation. We demonstrate SPARE's effectiveness across four diverse datasets spanning mathematical reasoning (GSM8K, MATH), multi-hop question answering (MuSiQue-Ans), and spatial reasoning (SpaRP). On ProcessBench, SPARE demonstrates data-efficient out-of-distribution generalization, using only ~16% of training samples compared to human-labeled and other synthetically trained baselines.
arXiv Detail & Related papers (2025-06-18T14:37:59Z) - RadialRouter: Structured Representation for Efficient and Robust Large Language Models Routing [27.481573948464987]
RadialRouter is a novel framework for large language model routing. It uses a lightweight Transformer-based backbone with a radial structure named RadialFormer to articulate the query-LLM relationship. It significantly outperforms existing routing methods by 9.2% and 5.8% in the Balance and Cost First scenarios.
arXiv Detail & Related papers (2025-06-04T12:16:41Z) - How Many Parameters Does Your Task Really Need? Task Specific Pruning with LLM-Sieve [2.33361323991006]
Large Language Models (LLMs) are increasingly deployed for narrow tasks in resource-constrained settings. We present LLM-Sieve, a framework that prunes LLMs down to the minimal parameter subset needed to preserve task performance.
arXiv Detail & Related papers (2025-05-23T20:17:20Z) - Confident or Seek Stronger: Exploring Uncertainty-Based On-device LLM Routing From Benchmarking to Generalization [61.02719787737867]
Large language models (LLMs) are increasingly deployed and democratized on edge devices. One promising solution is uncertainty-based SLM routing, which offloads high-stakes queries to stronger LLMs when the SLM produces low-confidence responses. We conduct a comprehensive investigation into the benchmarking and generalization of uncertainty-driven routing strategies from SLMs to LLMs over 1500+ settings.
arXiv Detail & Related papers (2025-02-06T18:59:11Z) - LLM-AutoDiff: Auto-Differentiate Any LLM Workflow [58.56731133392544]
We introduce LLM-AutoDiff, a novel framework for Automatic Prompt Engineering (APE). LLM-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine to generate feedback akin to textual gradients. It consistently outperforms existing textual-gradient baselines in both accuracy and training cost.
arXiv Detail & Related papers (2025-01-28T03:18:48Z) - PickLLM: Context-Aware RL-Assisted Large Language Model Routing [0.5325390073522079]
PickLLM is a lightweight framework that relies on Reinforcement Learning (RL) to route on-the-fly queries to available models. We demonstrate the speed of convergence for different learning rates and improvements in hard metrics such as cost per querying session and overall response latency.
arXiv Detail & Related papers (2024-12-12T06:27:12Z) - Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs [29.735465300269993]
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, yet they often struggle with spatial reasoning. This paper presents a novel neural-symbolic framework that enhances LLMs' spatial reasoning abilities through iterative feedback between LLMs and Answer Set Programming (ASP). We evaluate our approach on two benchmark datasets: StepGame and SparQA.
arXiv Detail & Related papers (2024-11-27T18:04:05Z) - Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation [33.05581803204543]
Adapting pre-trained large language models (LLMs) is crucial but challenging due to their enormous size. We introduce SketchTune, a compressive adaptation strategy that compresses weights into compact fine-tunable sketches. SketchTune is supported by mathematical insights into matrix classes that are better approximated using sketching rather than low-rank methods.
arXiv Detail & Related papers (2024-10-08T20:58:24Z) - Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [68.29746557968107]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans. Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z) - BiLLM: Pushing the Limit of Post-Training Quantization for LLMs [53.31402059062365]
BiLLM is a groundbreaking 1-bit post-training quantization scheme tailored for pretrained large language models.
It achieves for the first time high-accuracy inference (e.g. 8.41 perplexity on LLaMA2-70B) with only 1.08-bit weights across various LLM families.
arXiv Detail & Related papers (2024-02-06T09:26:34Z)