Large Language Models are few(1)-shot Table Reasoners
- URL: http://arxiv.org/abs/2210.06710v1
- Date: Thu, 13 Oct 2022 04:08:24 GMT
- Title: Large Language Models are few(1)-shot Table Reasoners
- Authors: Wenhu Chen
- Abstract summary: Large language models (LLMs) are generally excellent few-shot reasoners to solve text reasoning tasks.
In this paper, we aim at understanding how well LLMs can perform on table tasks with few-shot in-context learning.
- Score: 31.036914270008978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent literature has shown that large language models (LLMs) are generally
excellent few-shot reasoners to solve text reasoning tasks. However, the
capability of LLMs on table reasoning tasks is yet to be explored. In this
paper, we aim at understanding how well LLMs can perform on these table tasks
with few-shot in-context learning. Specifically, we evaluate LLMs on popular
table QA and fact verification datasets like WikiTableQuestion, FetaQA,
TabFact, and FEVEROUS and found that LLMs are really competent at complex
reasoning over table structures. When combined with `chain of thoughts'
prompting, GPT-3 is able to achieve very strong performance with only a 1-shot
demonstration. We further manually study the reasoning chains elicited from
LLMs and found that these reasoning chains are highly consistent with the
`ground truth' semantic form. We believe that our study opens new possibilities
to employ LLMs on different table-based reasoning tasks under few-shot
scenario.
Related papers
- Benchmarking LLMs on the Semantic Overlap Summarization Task [9.656095701778975]
This paper comprehensively evaluates Large Language Models (LLMs) on the Semantic Overlap Summarization (SOS) task.
We report well-established metrics like ROUGE, BERTscore, and SEM-F1$ on two different datasets of alternative narratives.
arXiv Detail & Related papers (2024-02-26T20:33:50Z) - Evaluating LLMs' Mathematical Reasoning in Financial Document Question
Answering [53.56653281752486]
This study explores Large Language Models' mathematical reasoning on four financial question-answering datasets.
We focus on sensitivity to table complexity and performance variations with an increasing number of arithmetic reasoning steps.
We introduce a novel prompting technique tailored to semi-structured documents, matching or outperforming other baselines in performance.
arXiv Detail & Related papers (2024-02-17T05:10:18Z) - A Survey of Table Reasoning with Large Language Models [55.2326738851157]
Using Large Language Models (LLMs) has become the mainstream method for table reasoning.
We analyze the mainstream techniques used to improve table reasoning performance in the LLM era.
We provide research directions from both the improvement of existing methods and the expansion of practical applications.
arXiv Detail & Related papers (2024-02-13T07:17:52Z) - A & B == B & A: Triggering Logical Reasoning Failures in Large Language
Models [65.86149763739141]
We introduce LogicAsker, an automatic approach that comprehensively evaluates and improves the logical reasoning abilities of LLMs.
We evaluate LogicAsker on six widely deployed LLMs, including GPT-3, ChatGPT, GPT-4, Bard, Vicuna, and Guanaco.
The results show that test cases from LogicAsker can find logical reasoning failures in different LLMs with a rate of 25% - 94%.
arXiv Detail & Related papers (2024-01-01T13:53:53Z) - InfiMM-Eval: Complex Open-Ended Reasoning Evaluation For Multi-Modal
Large Language Models [50.03163753638256]
Multi-modal Large Language Models (MLLMs) are increasingly prominent in the field of artificial intelligence.
Our benchmark comprises three key reasoning categories: deductive, abductive, and analogical reasoning.
We evaluate a selection of representative MLLMs using this rigorously developed open-ended multi-step elaborate reasoning benchmark.
arXiv Detail & Related papers (2023-11-20T07:06:31Z) - Learning To Teach Large Language Models Logical Reasoning [33.88499005859982]
Large language models (LLMs) have gained enormous attention from both academia and industry.
However, current LLMs still output unreliable content in practical reasoning tasks due to their inherent issues.
arXiv Detail & Related papers (2023-10-13T14:53:06Z) - Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A
Preliminary Study on Writing Assistance [60.40541387785977]
Small foundational models can display remarkable proficiency in tackling diverse tasks when fine-tuned using instruction-driven data.
In this work, we investigate a practical problem setting where the primary focus is on one or a few particular tasks rather than general-purpose instruction following.
Experimental results show that fine-tuning LLaMA on writing instruction data significantly improves its ability on writing tasks.
arXiv Detail & Related papers (2023-05-22T16:56:44Z) - Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study [44.39031420687302]
Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks.
We try to understand this by designing a benchmark to evaluate the structural understanding capabilities of LLMs.
We propose $textitself-augmentation$ for effective structural prompting, such as critical value / range identification.
arXiv Detail & Related papers (2023-05-22T14:23:46Z) - SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs)
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SATLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.