Related papers: Large Language Models are few(1)-shot Table Reasoners


URL: http://arxiv.org/abs/2210.06710v1
Date: Thu, 13 Oct 2022 04:08:24 GMT
Title: Large Language Models are few(1)-shot Table Reasoners
Authors: Wenhu Chen
Abstract summary: Large language models (LLMs) are generally excellent few-shot reasoners to solve text reasoning tasks. In this paper, we aim at understanding how well LLMs can perform on table tasks with few-shot in-context learning.
Score: 31.036914270008978
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent literature has shown that large language models (LLMs) are generally excellent few-shot reasoners to solve text reasoning tasks. However, the capability of LLMs on table reasoning tasks is yet to be explored. In this paper, we aim at understanding how well LLMs can perform on these table tasks with few-shot in-context learning. Specifically, we evaluate LLMs on popular table QA and fact verification datasets like WikiTableQuestion, FetaQA, TabFact, and FEVEROUS and find that LLMs are highly competent at complex reasoning over table structures. When combined with 'chain of thoughts' prompting, GPT-3 is able to achieve very strong performance with only a 1-shot demonstration. We further manually study the reasoning chains elicited from LLMs and find that these reasoning chains are highly consistent with the 'ground truth' semantic form. We believe that our study opens new possibilities to employ LLMs on different table-based reasoning tasks under few-shot scenarios.
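The abstract's core recipe is a table given to the model as text, plus one worked demonstration whose answer is preceded by a reasoning chain. It can be sketched as a small prompt builder. This is a minimal sketch, not the paper's actual prompt: the pipe-separated row serialization, the instruction wording, the demonstration, and the helper names `linearize_table`, `build_prompt`, and `extract_answer` are all hypothetical.

```python
import re

# Hypothetical helpers: the serialization format, prompt wording, and the
# demonstration are illustrative assumptions, not the paper's exact prompt.
INSTRUCTION = "Read the table below to answer the question."


def linearize_table(header, rows):
    """Serialize a table as text: a header line, then one numbered line per row."""
    lines = [" | ".join(header)]
    for i, row in enumerate(rows, start=1):
        lines.append(f"row {i}: " + " | ".join(str(cell) for cell in row))
    return "\n".join(lines)


def build_prompt(demo, table, question):
    """Prepend a single worked demonstration (1-shot) with its reasoning chain."""
    demo_block = "\n".join([
        INSTRUCTION,
        linearize_table(*demo["table"]),
        f"Q: {demo['question']}",
        f"A: {demo['reasoning']} Therefore, the answer is {demo['answer']}.",
    ])
    query_block = "\n".join([
        INSTRUCTION,
        linearize_table(*table),
        f"Q: {question}",
        "A:",
    ])
    return demo_block + "\n\n" + query_block


def extract_answer(completion):
    """Pull the final answer from a completion that imitates the demonstration."""
    match = re.search(r"the answer is\s*(.+?)\.?\s*$", completion.strip())
    return match.group(1) if match else None


# Hypothetical 1-shot demonstration with a hand-written reasoning chain.
demo = {
    "table": (["city", "population"], [["A", 120], ["B", 340]]),
    "question": "Which city has the larger population?",
    "reasoning": "City A has 120 people and city B has 340; 340 is larger than 120.",
    "answer": "B",
}

prompt = build_prompt(
    demo,
    (["team", "wins"], [["X", 7], ["Y", 3]]),
    "Which team won more games?",
)
print(prompt)
```

The returned string would be sent to an LLM such as GPT-3. `extract_answer` assumes the model imitates the demonstration's closing phrase "Therefore, the answer is ..."; a completion that does not follow that pattern yields `None`.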

Related papers

How well do LLMs reason over tabular data, really? [2.5015086558362247]
Large Language Models (LLMs) excel in natural language tasks, but less is known about their reasoning capabilities over tabular data. We show that an LLM-as-a-judge procedure yields more reliable performance insights. We then extend the tabular inputs to reflect three common characteristics in practice: missing values, duplicate entities, and structural variations.
arXiv Detail & Related papers (2025-05-12T11:35:28Z)
CARL-GT: Evaluating Causal Reasoning Capabilities of Large Language Models [18.975064947089805]
Causal reasoning capabilities are essential for large language models (LLMs) in a wide range of applications, such as education and healthcare. We provide a benchmark, named CARL-GT, which evaluates CAusal Reasoning capabilities of large Language models using Graphs and Tabular data.
arXiv Detail & Related papers (2024-12-23T20:34:32Z)
Logical Consistency of Large Language Models in Fact-checking [6.286017217366497]
Large language models (LLMs) have demonstrated significant success in performing varied natural language tasks. Despite their impressive ability to generate human-like texts, LLMs are infamous for their inconsistent responses.
arXiv Detail & Related papers (2024-12-20T17:42:25Z)
PTD-SQL: Partitioning and Targeted Drilling with LLMs in Text-to-SQL [54.304872649870575]
Large Language Models (LLMs) have emerged as powerful tools for Text-to-SQL tasks. In this study, we propose that employing query group partitioning allows LLMs to focus on learning the thought processes specific to a single problem type.
arXiv Detail & Related papers (2024-09-21T09:33:14Z)
Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path [53.71787069694794]
We focus on the graph reasoning ability of Large Language Models (LLMs). We revisit the ability of LLMs on three fundamental graph tasks: graph description translation, graph connectivity, and the shortest-path problem. Our findings suggest that LLMs can fail to understand graph structures through text descriptions and exhibit varying performance across all these fundamental tasks.
arXiv Detail & Related papers (2024-08-18T16:26:39Z)
Benchmarking LLMs on the Semantic Overlap Summarization Task [9.656095701778975]
This paper comprehensively evaluates Large Language Models (LLMs) on the Semantic Overlap Summarization (SOS) task. We report well-established metrics like ROUGE, BERTScore, and SEM-F1 on two different datasets of alternative narratives.
arXiv Detail & Related papers (2024-02-26T20:33:50Z)
Do Large Language Models Understand Logic or Just Mimick Context? [14.081178100662163]
This paper investigates the reasoning capabilities of large language models (LLMs) on two logical reasoning datasets. It is found that LLMs do not truly understand logical rules; rather, in-context learning has simply enhanced the likelihood of these models arriving at the correct answers.
arXiv Detail & Related papers (2024-02-19T12:12:35Z)
Evaluating LLMs' Mathematical Reasoning in Financial Document Question Answering [53.56653281752486]
This study explores Large Language Models' mathematical reasoning on four financial question-answering datasets. We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps. We introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in performance.
arXiv Detail & Related papers (2024-02-17T05:10:18Z)
A Survey of Table Reasoning with Large Language Models [55.2326738851157]
Using Large Language Models (LLMs) has become the mainstream method for table reasoning. We analyze the mainstream techniques used to improve table reasoning performance in the LLM era. We provide research directions from both the improvement of existing methods and the expansion of practical applications.
arXiv Detail & Related papers (2024-02-13T07:17:52Z)
TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [55.33939289989238]
We propose TAP4LLM as a versatile pre-processor suite for leveraging large language models (LLMs) in table-based tasks effectively. It covers several distinct components: (1) table sampling to decompose large tables into manageable sub-tables based on query semantics, (2) table augmentation to enhance tables with additional knowledge from external sources or models, and (3) table packing & serialization to convert tables into various formats suitable for LLMs' understanding.
arXiv Detail & Related papers (2023-12-14T15:37:04Z)
Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study [44.39031420687302]
Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks. We try to understand this by designing a benchmark to evaluate the structural understanding capabilities of LLMs. We propose self-augmentation for effective structural prompting, such as critical value / range identification.
arXiv Detail & Related papers (2023-05-22T14:23:46Z)
SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs). We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer. We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
Large Language Models are Zero-Shot Reasoners [28.6899375595088]
Chain of thought (CoT) prompting is a technique for eliciting complex multi-step reasoning through step-by-step answer examples. We show that LLMs are decent zero-shot reasoners by simply adding "Let's think step by step" before each answer. Experimental results demonstrate that our Zero-shot-CoT, using the same single prompt template, significantly outperforms zero-shot LLM performances.
arXiv Detail & Related papers (2022-05-24T09:22:26Z)
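The Zero-shot-CoT entry above relies on a single fixed trigger phrase instead of worked demonstrations. That paper applies it in two stages: a reasoning-extraction prompt, then an answer-extraction prompt that appends the generated reasoning. The sketch below is a minimal illustration under stated assumptions. `generate(prompt) -> str` is a hypothetical stand-in for an LLM completion call, and the generic answer trigger simplifies the paper's task-specific variants.

```python
from typing import Callable

# Generic answer trigger (assumption); the Zero-shot-CoT paper uses task-specific variants.
DEFAULT_TRIGGER = "Therefore, the answer is"


def zero_shot_cot(question: str, generate: Callable[[str], str],
                  answer_trigger: str = DEFAULT_TRIGGER) -> str:
    """Two-stage Zero-shot-CoT: elicit reasoning, then extract the answer."""
    # Stage 1: the fixed trigger phrase elicits a step-by-step reasoning chain.
    stage1 = f"Q: {question}\nA: Let's think step by step."
    reasoning = generate(stage1)
    # Stage 2: feed the reasoning back with an answer-extraction trigger.
    stage2 = f"{stage1} {reasoning}\n{answer_trigger}"
    # Trim whitespace and a trailing period from the extracted answer.
    return generate(stage2).strip().rstrip(".")


# Hypothetical stand-in for an LLM call that returns canned text for illustration.
calls = []


def fake_generate(prompt: str) -> str:
    calls.append(prompt)
    if prompt.endswith("Let's think step by step."):
        return "There are 3 apples and 2 more are bought, so 3 + 2 = 5."
    return " 5."


answer = zero_shot_cot("I have 3 apples and buy 2 more. How many apples do I have?",
                       fake_generate)
print(answer)
```

Passing the model call in as a function keeps the prompting logic separate from any particular LLM client, so `fake_generate` can be replaced by a real completion function without changing `zero_shot_cot`.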

This list is automatically generated from the titles and abstracts of the papers on this site.