Are Large Language Models Table-based Fact-Checkers?
- URL: http://arxiv.org/abs/2402.02549v1
- Date: Sun, 4 Feb 2024 15:52:59 GMT
- Title: Are Large Language Models Table-based Fact-Checkers?
- Authors: Hangwen Zhang, Qingyi Si, Peng Fu, Zheng Lin, Weiping Wang
- Abstract summary: Table-based Fact Verification (TFV) aims to extract the entailment relation between statements and structured tables.
Existing TFV methods based on small-scale models suffer from insufficient labeled data and weak zero-shot ability.
Large Language Models (LLMs) have shown powerful zero-shot and in-context learning abilities.
- Score: 18.921379889551687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Table-based Fact Verification (TFV) aims to extract the entailment relation
between statements and structured tables. Existing TFV methods based on
small-scale models suffer from insufficient labeled data and weak zero-shot
ability. Recently, Large Language Models (LLMs) have attracted considerable
attention across research fields. They have shown powerful zero-shot and
in-context learning abilities on several NLP tasks, but their potential on TFV
is still unknown. In this work, we conduct a preliminary study of whether
LLMs are table-based fact-checkers. Specifically, we design diverse prompts to
explore how in-context learning can help LLMs in TFV, i.e., zero-shot and
few-shot TFV capability. In addition, we carefully design and construct TFV
instructions to study the performance gain brought by instruction tuning of
LLMs. Experimental results demonstrate that LLMs can achieve acceptable results
on zero-shot and few-shot TFV with prompt engineering, while instruction tuning
stimulates TFV capability significantly. We also report valuable
findings about the format of zero-shot prompts and the number of in-context
examples. Finally, we analyze some possible directions to improve the accuracy
of TFV via LLMs, which is beneficial to further research on table reasoning.
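To ground the setup, here is a minimal sketch of the kind of zero-shot TFV prompt the paper explores, assuming an OpenAI-style chat API; the model name, table serialization, and prompt wording are illustrative assumptions, not the authors' exact prompts. A few-shot variant would simply prepend labeled (table, statement, answer) demonstrations to the same prompt.

```python
# Minimal zero-shot TFV sketch (illustrative; not the paper's actual prompts).
# Assumes the `openai` Python package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

def serialize_table(header, rows):
    """Flatten a structured table into a markdown string the LLM can read."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(cell) for cell in row) + " |" for row in rows]
    return "\n".join(lines)

def zero_shot_tfv(table_text, statement, model="gpt-3.5-turbo"):
    """Ask the model whether the statement is entailed by the table."""
    prompt = (
        "Read the table and decide whether the statement is entailed by it.\n\n"
        f"Table:\n{table_text}\n\n"
        f"Statement: {statement}\n"
        "Answer with exactly one word: entailed or refuted."
    )
    client = OpenAI()
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip().lower()

table = serialize_table(["player", "team", "points"],
                        [["A. Smith", "Hawks", 31], ["B. Jones", "Hawks", 12]])
print(zero_shot_tfv(table, "A. Smith scored more points than B. Jones."))
```

The abstract's findings about prompt format suggest that choices like the table serialization and answer wording above are exactly the knobs that matter in practice.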
Related papers
- Enhancing Temporal Understanding in LLMs for Semi-structured Tables [50.59009084277447]
We conduct a comprehensive analysis of temporal datasets to pinpoint the specific limitations of large language models (LLMs).
Our investigation leads to enhancements in TempTabQA, a dataset specifically designed for temporal question answering.
We introduce a novel approach, C.L.E.A.R., to strengthen LLM capabilities in this domain.
arXiv Detail & Related papers (2024-07-22T20:13:10Z)
- LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models [5.455744338342196]
Temporal reasoning (TR) is a critical component of artificial intelligence.
Various datasets have been constructed in different ways to evaluate different aspects of TR ability.
Our work proposes a novel approach to designing and developing a pipeline for constructing datasets to evaluate the TR ability of LLMs.
arXiv Detail & Related papers (2024-07-07T16:37:06Z)
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
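As a concrete illustration of the contrastive idea in c-ICL above, here is a hypothetical sketch of how correct and incorrect demonstrations might be combined into a single prompt; the extraction task, examples, and formatting are invented for illustration rather than taken from the c-ICL paper.

```python
# Hypothetical contrastive in-context prompt: show the model both correct
# demonstrations and labeled mistakes so it learns what to avoid.
def build_contrastive_prompt(task, correct, incorrect, query):
    parts = [task, ""]
    for text, answer in correct:
        parts += [f"Input: {text}", f"Correct extraction: {answer}", ""]
    for text, wrong, reason in incorrect:
        parts += [f"Input: {text}",
                  f"Incorrect extraction: {wrong} (why it is wrong: {reason})", ""]
    parts += [f"Input: {query}", "Correct extraction:"]
    return "\n".join(parts)

prompt = build_contrastive_prompt(
    task="Extract (person, organization) pairs from each sentence.",
    correct=[("Ada Lovelace joined Acme Corp.", "(Ada Lovelace, Acme Corp)")],
    incorrect=[("The Turing Award winner visited MIT.",
                "(Turing, MIT)", "'Turing Award' is not a person mention")],
    query="Grace Hopper worked at Remington Rand.",
)
print(prompt)  # send this string to any chat/completion endpoint
```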
- Large Language Models Can Learn Temporal Reasoning [11.599570446840547]
We propose TG-LLM, a novel framework towards language-based temporal reasoning.
Instead of reasoning over the original context, we adopt a latent representation, the temporal graph (TG).
The synthetic dataset (TGQA) is fully controllable and requires minimal supervision.
arXiv Detail & Related papers (2024-01-12T19:00:26Z)
- Assessing the Reliability of Large Language Model Knowledge [78.38870272050106]
Large language models (LLMs) have been treated as knowledge bases due to their strong performance in knowledge probing tasks.
How do we evaluate the capabilities of LLMs to consistently produce factually correct answers?
We propose MOdel kNowledge relIabiliTy scORe (MONITOR), a novel metric designed to directly measure LLMs' factual reliability.
arXiv Detail & Related papers (2023-10-15T12:40:30Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- Effective Distillation of Table-based Reasoning Ability from LLMs [23.35522261002175]
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks.
Their enormous parameter counts and extremely high compute requirements pose challenges for practical deployment.
Recent research has revealed that specific capabilities of LLMs, such as numerical reasoning, can be transferred to smaller models through distillation.
arXiv Detail & Related papers (2023-09-22T21:15:28Z)
- Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models [75.75038268227554]
Self-Checker is a framework comprising a set of plug-and-play modules that facilitate fact-checking.
This framework provides a fast and efficient way to construct fact-checking systems in low-resource environments.
arXiv Detail & Related papers (2023-05-24T01:46:07Z)
- Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors [11.28397947587596]
Fine-tuning large language models (LLMs) on large-scale instruction-following datasets substantially improves their performance on a wide range of NLP tasks.
However, even advanced instruction-tuned LLMs still fail to outperform small LMs on relation extraction (RE).
We propose QA4RE, a framework that aligns RE with question answering (QA), a predominant task in instruction-tuning datasets.
arXiv Detail & Related papers (2023-05-18T17:48:03Z)
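To illustrate QA4RE's alignment idea, the sketch below recasts a relation extraction instance as multiple-choice QA, the format instruction-tuned LLMs see most often; the relation templates and label set are invented for illustration and are not QA4RE's actual templates.

```python
# Hypothetical RE-as-QA formatting: each candidate relation becomes one
# answer option, turning extraction into multiple-choice selection.
TEMPLATES = {
    "founded_by":  "{obj} founded {subj}.",
    "employee_of": "{obj} is an employee of {subj}.",
    "no_relation": "{obj} has no known relation to {subj}.",
}

def re_as_multiple_choice(sentence, subj, obj):
    options = [f"{chr(ord('A') + i)}. {tmpl.format(subj=subj, obj=obj)}"
               for i, tmpl in enumerate(TEMPLATES.values())]
    return (f"Sentence: {sentence}\n"
            "Which option is best supported by the sentence?\n"
            + "\n".join(options) + "\nAnswer:")

print(re_as_multiple_choice(
    "Acme Corp was founded by Jane Doe in 1999.", "Acme Corp", "Jane Doe"))
```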
- Large Language Models are few(1)-shot Table Reasoners [31.036914270008978]
Large language models (LLMs) are generally excellent few-shot reasoners for text reasoning tasks.
In this paper, we aim to understand how well LLMs can perform on table tasks with few-shot in-context learning.
arXiv Detail & Related papers (2022-10-13T04:08:24Z)
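In the spirit of the few(1)-shot table reasoning result above, a one-shot prompt simply prepends a single worked demonstration before the query; the demonstration content below is invented for illustration.

```python
# Hypothetical one-shot ("few(1)-shot") table reasoning prompt: one worked
# (table, statement, answer) demonstration precedes the query instance.
DEMO = (
    "Table:\n"
    "| city | population |\n| --- | --- |\n"
    "| Oslo | 709037 |\n| Bergen | 291940 |\n\n"
    "Statement: Oslo has a larger population than Bergen.\n"
    "Answer: entailed\n\n"
)

def one_shot_prompt(table_markdown, statement):
    """Concatenate the demonstration with the query instance."""
    return DEMO + f"Table:\n{table_markdown}\n\nStatement: {statement}\nAnswer:"

print(one_shot_prompt(
    "| city | population |\n| --- | --- |\n| Paris | 2102650 |\n| Lyon | 522250 |",
    "Lyon has a larger population than Paris."))
```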
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.