Related papers: TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios

URL: http://arxiv.org/abs/2403.19318v2
Date: Mon, 1 Apr 2024 05:10:56 GMT
Title: TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios
Authors: Xiaokang Zhang, Jing Zhang, Zeyao Ma, Yang Li, Bohan Zhang, Guanlin Li, Zijun Yao, Kangli Xu, Jinchang Zhou, Daniel Zhang-Li, Jifan Yu, Shu Zhao, Juanzi Li, Jie Tang,
Abstract summary: TableLLM is a robust large language model (LLM) with 13 billion parameters. TableLLM is purpose-built for proficiently handling data manipulation tasks. We have released the model checkpoint, source code, benchmarks, and a web application for user interaction.
Score: 52.73289223176475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce TableLLM, a robust large language model (LLM) with 13 billion parameters, purpose-built for proficiently handling tabular data manipulation tasks, whether they are embedded within documents or spreadsheets, catering to real-world office scenarios. We propose a distant supervision method for training, which comprises a reasoning process extension strategy, aiding in training LLMs to understand reasoning patterns more effectively as well as a cross-way validation strategy, ensuring the quality of the automatically generated data. To evaluate the performance of TableLLM, we have crafted a benchmark tailored to address both document and spreadsheet formats as well as constructed a well-organized evaluation pipeline capable of handling both scenarios. Thorough evaluations underscore the advantages of TableLLM when compared to various existing general-purpose and tabular data-focused LLMs. We have publicly released the model checkpoint, source code, benchmarks, and a web application for user interaction.Our codes and data are publicly available at https://github.com/TableLLM/TableLLM.

Related papers

Can LLM Annotations Replace User Clicks for Learning to Rank? [112.2254432364736]
Large-scale supervised data is essential for training modern ranking models, but obtaining high-quality human annotations is costly.<n>Click data has been widely used as a low-cost alternative, and with recent advances in large language models (LLMs), LLM-based relevance annotation has emerged as another promising annotation.<n> Experiments on both a public dataset, TianGong-ST, and an industrial dataset, Baidu-Click, show that click-supervised models perform better on high-frequency queries.<n>We explore two training strategies -- data scheduling and frequency-aware multi-objective learning -- that integrate both supervision signals.
arXiv Detail & Related papers (2025-11-10T02:26:14Z)
Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs [24.312198733476063]
We present Tab-MIA, a benchmark dataset for evaluating MIAs on structured data in large language models.<n>We analyze the memorization behavior of pretrained LLMs on structured data derived from Wikipedia tables.
arXiv Detail & Related papers (2025-07-23T06:56:34Z)
Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction [80.88654868264645]
Arranged and Organized Extraction Benchmark designed to evaluate ability of large language models to comprehend fragmented documents.<n>AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schema tailored to varied input queries.<n>Results show that even the most advanced models struggled significantly.
arXiv Detail & Related papers (2025-07-22T06:37:51Z)
DATE-LM: Benchmarking Data Attribution Evaluation for Large Language Models [17.67098120469538]
DATE-LM is a benchmark for evaluating data attribution methods in language models.<n>It measures attribution quality through three key tasks -- training data selection, toxicity/bias filtering, and factual attribution.<n>Our findings show that no single method dominates across all tasks, data attribution methods have trade-offs with simpler baselines, and method performance is sensitive to task-specific evaluation design.
arXiv Detail & Related papers (2025-07-12T23:29:56Z)
Large Language Models for Spreadsheets: Benchmarking Progress and Evaluating Performance with FLARE [0.0]
Large Language Models (LLMs) have demonstrated some significant capabilities across various domains.<n>This study introduces a benchmark framework to evaluate the performance of leading LLMs in executing spreadsheet functions.
arXiv Detail & Related papers (2025-06-19T03:47:38Z)
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing [10.712756715779822]
Large Language Models (LLMs) have shown promise in data processing. These frameworks focus on reducing cost when executing user-specified operations. This is problematic for complex tasks and data. We present DocETL, a system that optimize complex document processing pipelines.
arXiv Detail & Related papers (2024-10-16T03:22:35Z)
TART: An Open-Source Tool-Augmented Framework for Explainable Table-based Reasoning [61.14586098005874]
Current Large Language Models (LLMs) exhibit limited ability to understand table structures and to apply precise numerical reasoning. We introduce our Tool-Augmented Reasoning framework for Tables (TART), which integrates LLMs with specialized tools. TART contains three key components: a table formatter to ensure accurate data representation, a tool maker to develop specific computational tools, and an explanation generator to maintain explainability.
arXiv Detail & Related papers (2024-09-18T06:19:59Z)
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts. We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM. We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
AnnotatedTables: A Large Tabular Dataset with Language Model Annotations [8.602181445598776]
We show how machine learning can be used to automate the annotation of large volumes of diverse tabular data. We release AnnotatedTables, a collection of 32,119 databases with LLM-generated annotations. We evaluate the performance of TabPFN, a recent neural classifier trained on Bayesian priors, on 2,720 tables with input-target columns identified by LLMs.
arXiv Detail & Related papers (2024-06-24T06:44:14Z)
SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [34.8332394229927]
SpreadsheetBench is designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel forums. Our comprehensive evaluation of various LLMs under both single-round and multi-round inference settings reveals a substantial gap between the state-of-the-art (SOTA) models and human performance.
arXiv Detail & Related papers (2024-06-21T09:06:45Z)
UniDM: A Unified Framework for Data Manipulation with Large Language Models [66.61466011795798]
Large Language Models (LLMs) resolve multiple data manipulation tasks. LLMs exhibit bright benefits in terms of performance but still require customized designs to fit each specific task. We propose UniDM, a unified framework which establishes a new paradigm to process data manipulation tasks.
arXiv Detail & Related papers (2024-05-10T14:44:04Z)
OpenTab: Advancing Large Language Models as Open-domain Table Reasoners [38.29047314758911]
OpenTab is an open-domain table reasoning framework powered by Large Language Models (LLMs) OpenTab significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy.
arXiv Detail & Related papers (2024-02-22T08:01:01Z)
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively. It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z)
MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation. Within each loop, the MLLM-DataEngine first analyze the weakness of the model based on the evaluation results. For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data. For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning [52.257422715393574]
We introduce a self-guided methodology for Large Language Models (LLMs) to autonomously discern and select cherry samples from open-source datasets. Our key innovation, the Instruction-Following Difficulty (IFD) metric, emerges as a pivotal metric to identify discrepancies between a model's expected responses and its intrinsic generation capability.
arXiv Detail & Related papers (2023-08-23T09:45:29Z)
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study [44.39031420687302]
Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks. We try to understand this by designing a benchmark to evaluate the structural understanding capabilities of LLMs. We propose $textitself-augmentation$ for effective structural prompting, such as critical value / range identification.
arXiv Detail & Related papers (2023-05-22T14:23:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.