Investigating Table-to-Text Generation Capabilities of LLMs in
Real-World Information Seeking Scenarios
- URL: http://arxiv.org/abs/2305.14987v2
- Date: Mon, 30 Oct 2023 22:00:25 GMT
- Title: Investigating Table-to-Text Generation Capabilities of LLMs in
Real-World Information Seeking Scenarios
- Authors: Yilun Zhao, Haowei Zhang, Shengyun Si, Linyong Nan, Xiangru Tang,
Arman Cohan
- Abstract summary: Tabular data is prevalent across various industries, requiring significant time and effort from users to understand and manipulate it for their information-seeking purposes.
The adoption of large language models (LLMs) in real-world applications for table information seeking remains underexplored.
This paper investigates the table-to-text capabilities of different LLMs using four datasets within two real-world information seeking scenarios.
- Score: 32.84523661055774
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tabular data is prevalent across various industries, requiring
significant time and effort from users to understand and manipulate it for their
information-seeking purposes. Advances in large language models (LLMs)
have shown enormous potential to improve user efficiency. However, the adoption
of LLMs in real-world applications for table information seeking remains
underexplored. In this paper, we investigate the table-to-text capabilities of
different LLMs using four datasets within two real-world information seeking
scenarios. These include the LogicNLG and our newly-constructed LoTNLG datasets
for data insight generation, along with the FeTaQA and our newly-constructed
F2WTQ datasets for query-based generation. We structure our investigation
around three research questions, evaluating the performance of LLMs in
table-to-text generation, automated evaluation, and feedback generation,
respectively. Experimental results indicate that the current high-performing
LLM, specifically GPT-4, can effectively serve as a table-to-text generator,
evaluator, and feedback generator, facilitating users' information-seeking
purposes in real-world scenarios. However, a significant performance gap still
exists between open-source LLMs (e.g., Tulu and LLaMA-2) and GPT-4.
Our data and code are publicly available at
https://github.com/yale-nlp/LLM-T2T.
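As a rough illustration of the table-to-text setup studied in the paper, the sketch below linearizes a table into a prompt and asks an LLM for a single data insight. The example table, the prompt wording, and the `call_llm` placeholder are assumptions for illustration, not the paper's actual pipeline (see the linked repository for that).

```python
# Minimal sketch of prompt-based table-to-text generation, assuming a
# generic chat-completion backend behind `call_llm`; the prompt format
# here is illustrative, not the one used in the LLM-T2T repository.

def linearize_table(title, header, rows):
    """Flatten a table into a pipe-separated string an LLM can read."""
    lines = [f"Table title: {title}", " | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

def build_insight_prompt(table_text):
    """Ask for one statement that is logically entailed by the table."""
    return (
        "Given the table below, write one faithful, logically entailed "
        "insight about the data.\n\n" + table_text + "\n\nInsight:"
    )

def call_llm(prompt: str) -> str:
    """Placeholder: plug in any LLM client (e.g., a chat-completion API)."""
    raise NotImplementedError

table_text = linearize_table(
    "Medal count",
    ["Country", "Gold", "Silver"],
    [["A", 10, 4], ["B", 7, 9]],
)
print(build_insight_prompt(table_text))
# insight = call_llm(build_insight_prompt(table_text))
```

The same prompt-and-call pattern extends to the paper's other two roles: an evaluation prompt that scores a candidate statement against the table, and a feedback prompt that asks the model to critique and revise a faulty statement.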
Related papers
- Enhancing Discriminative Tasks by Guiding the Pre-trained Language Model with Large Language Model's Experience [4.814313782484443]
Large Language Models (LLMs) and pre-trained Language Models (LMs) have achieved impressive success on many software engineering tasks.
We use LLMs to generate domain-specific data, thereby improving the performance of pre-trained LMs on the target tasks.
arXiv Detail & Related papers (2024-08-16T06:37:59Z)
- DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
arXiv Detail & Related papers (2024-06-25T04:27:53Z)
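A toy sketch of the graph-perturbation idea described in this entry: a reasoning graph is extended by one node to produce a harder test instance. The graph schema, node names, and operations below are invented for illustration and are far simpler than DARG's actual pipeline.

```python
import copy

# Hypothetical reasoning graph for a two-step arithmetic question:
# each node holds (operation, operands); operands are literals or
# references to earlier nodes. This schema is invented for illustration.
graph = {
    "n1": ("add", [3, 4]),     # 3 + 4
    "n2": ("mul", ["n1", 2]),  # (3 + 4) * 2, the original final answer
}

def perturb(graph, node_id, op, operands):
    """Add one reasoning step, increasing the complexity of the
    test question generated from the graph."""
    extended = copy.deepcopy(graph)
    extended[node_id] = (op, operands)
    return extended

def evaluate(graph, node_id):
    """Resolve a node's value, recursing into referenced nodes."""
    op, operands = graph[node_id]
    values = [evaluate(graph, x) if isinstance(x, str) else x
              for x in operands]
    if op == "add":
        return sum(values)
    if op == "mul":
        result = 1
        for v in values:
            result *= v
        return result
    raise ValueError(f"unknown operation: {op}")

harder = perturb(graph, "n3", "add", ["n2", 5])  # one extra step
print(evaluate(graph, "n2"), evaluate(harder, "n3"))  # 14 19
```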
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context of up to millions of tokens, designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
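The in-context retrieval idea from this entry can be pictured as follows: rather than running a separate retriever, the whole corpus is placed directly in the long-context model's prompt. The documents, prompt wording, and citation format here are hypothetical; LOFT's actual tasks scale this to millions of tokens.

```python
# Hypothetical sketch of in-context retrieval: the entire corpus goes
# into the prompt and the long-context model answers directly, with no
# external retriever. Documents and prompt wording are invented.

corpus = {
    "doc1": "The Amazon is the largest rainforest on Earth.",
    "doc2": "Mount Everest is the highest mountain above sea level.",
}

def corpus_in_context_prompt(corpus, question):
    """Concatenate every document into one prompt and append the query."""
    docs = "\n".join(f"[{doc_id}] {text}" for doc_id, text in corpus.items())
    return (
        "Answer using only the documents below, citing the document id.\n\n"
        f"{docs}\n\nQuestion: {question}\nAnswer:"
    )

print(corpus_in_context_prompt(corpus, "What is the highest mountain?"))
# A long-context model would be expected to answer "Mount Everest [doc2]".
```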
- DCA-Bench: A Benchmark for Dataset Curation Agents [9.60250892491588]
We propose a dataset curation agent benchmark, DCA-Bench, to measure large language models' capability of detecting hidden dataset quality issues.
Specifically, we collect diverse real-world dataset quality issues from eight open dataset platforms as a testbed.
The proposed benchmark can also serve as a testbed for measuring the capability of LLMs in problem discovery rather than just problem-solving.
arXiv Detail & Related papers (2024-06-11T14:02:23Z)
- Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes [57.62036621319563]
We introduce CLLM, which leverages the prior knowledge of Large Language Models (LLMs) for data augmentation in the low-data regime.
We demonstrate the superior performance of CLLM in the low-data regime compared to conventional generators.
arXiv Detail & Related papers (2023-12-19T12:34:46Z)
- Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator [63.762209407570715]
Genixer is a comprehensive data generation pipeline consisting of four key steps.
Training LLaVA1.5 with a synthetic VQA-like dataset enhances performance on 10 out of 12 multimodal benchmarks.
MLLMs trained with task-specific datasets can surpass GPT-4V in generating complex instruction tuning data.
arXiv Detail & Related papers (2023-12-11T09:44:41Z)
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark for evaluating prominent Large Language Models (LLMs).
We propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score).
Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z)
- Large Language Models as Data Preprocessors [9.99065004972981]
Large Language Models (LLMs) have marked a significant advancement in artificial intelligence.
This study explores their potential in data preprocessing, a critical stage in data mining and analytics applications.
We propose an LLM-based framework for data preprocessing, which integrates cutting-edge prompt engineering techniques.
arXiv Detail & Related papers (2023-08-30T23:28:43Z)
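One way to picture the LLM-based preprocessing described in this entry is an error-detection prompt over a single record, sketched below. The record, prompt wording, and choice of task are assumptions; the cited framework's prompts are more elaborate and cover further preprocessing tasks.

```python
import json

# Hedged sketch of one data-preprocessing task, error detection,
# phrased as a prompt. The record and wording are invented for
# illustration; `call_llm` is a placeholder for any chat client.

def error_detection_prompt(record):
    """Ask an LLM whether a record contains an erroneous value."""
    return (
        "You are a data-cleaning assistant. Does the following record "
        "contain an error? Answer yes or no and name the suspect field.\n"
        f"Record: {json.dumps(record)}\nAnswer:"
    )

record = {"city": "Paris", "country": "Brazil"}  # deliberately inconsistent
print(error_detection_prompt(record))
# response = call_llm(error_detection_prompt(record))
```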
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM [62.30753425449056]
We propose a novel closed-loop system that bridges data generation, model training, and evaluation.
Within each loop, the MLLM-DataEngine first analyzes the weaknesses of the model based on the evaluation results.
For targeting, we propose an Adaptive Bad-case Sampling module, which adjusts the ratio of different types of data.
For quality, we resort to GPT-4 to generate high-quality data with each given data type.
arXiv Detail & Related papers (2023-08-25T01:41:04Z)
- LLMMaps -- A Visual Metaphor for Stratified Evaluation of Large Language Models [13.659853119356507]
Large Language Models (LLMs) have revolutionized natural language processing and demonstrated impressive capabilities in various tasks.
They are prone to hallucinations, in which the model produces incorrect or fabricated information in its responses.
We propose LLMMaps as a novel visualization technique that enables users to evaluate LLMs' performance with respect to Q&A datasets.
arXiv Detail & Related papers (2023-04-02T05:47:09Z)