TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data
- URL: http://arxiv.org/abs/2401.13223v3
- Date: Sat, 28 Sep 2024 01:40:33 GMT
- Title: TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data
- Authors: Fengbin Zhu, Ziyang Liu, Fuli Feng, Chao Wang, Moxin Li, Tat-Seng Chua,
- Abstract summary: We consider harnessing the amazing power of language models (LLMs) to solve our task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
- Score: 73.29220562541204
- License:
- Abstract: In this work, we address question answering (QA) over a hybrid of tabular and textual data that are very common content on the Web (e.g. SEC filings), where discrete reasoning capabilities are often required. Recently, large language models (LLMs) like GPT-4 have demonstrated strong multi-step reasoning capabilities. We then consider harnessing the amazing power of LLMs to solve our task. We abstract a Step-wise Pipeline for tabular and textual QA, which consists of three key steps, including Extractor, Reasoner and Executor, and initially design an instruction to instantiate the pipeline and validate that GPT-4 outperforms all existing methods. However, utilizing an online LLM like GPT-4 holds various challenges in terms of cost, latency, and data security risk, which motivates us to specialize smaller LLMs in this task. We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets following the Step-wise Pipeline. The experimental results have verified that our TAT-LLM model can outperform all baseline models, including the previous best fine-tuned models and very large-scale LLMs like GPT-4 on FinQA, TAT-QA and TAT-DQA benchmarks.
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z) - Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning [35.03338699349037]
We propose a novel in-context learning framework, FeatLLM, which employs Large Language Models as feature engineers.
FeatLLM generates high-quality rules, significantly (10% on average) outperforming alternatives such as TabLLM and STUNT.
arXiv Detail & Related papers (2024-04-15T06:26:08Z) - PPTC-R benchmark: Towards Evaluating the Robustness of Large Language
Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking user instructions at sentence, semantic, and multi-language levels.
We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings.
We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z) - Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator [63.762209407570715]
Genixer is a comprehensive data generation pipeline consisting of four key steps.
A synthetic VQA-like dataset trained with LLaVA1.5 enhances performance on 10 out of 12 multimodal benchmarks.
MLLMs trained with task-specific datasets can surpass GPT-4V in generating complex instruction tuning data.
arXiv Detail & Related papers (2023-12-11T09:44:41Z) - Benchmarking the Abilities of Large Language Models for RDF Knowledge
Graph Creation and Comprehension: How Well Do LLMs Speak Turtle? [0.0]
Large Language Models (LLMs) are advancing at a rapid pace, with significant improvements at natural language processing and coding tasks.
To evaluate the proficiency of various LLMs, we created a set of five tasks that probe their ability to parse, understand, analyze, and create knowledge graphs serialized in Turtle syntax.
The evaluation encompassed four commercially available LLMs - GPT-3.5, GPT-4, Claude 1.3, and Claude 2.0, as well as two freely accessible offline models, GPT4All Vicuna and GPT4All Falcon 13B.
arXiv Detail & Related papers (2023-09-29T10:36:04Z) - Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? [49.688233418425995]
Struc-Bench is a comprehensive benchmark featuring prominent Large Language Models (LLMs)
We propose two innovative metrics, P-Score (Prompting Score) and H-Score (Heuristical Score)
Our experiments show that applying our structure-aware fine-tuning to LLaMA-7B leads to substantial performance gains.
arXiv Detail & Related papers (2023-09-16T11:31:58Z) - LLM-Pruner: On the Structural Pruning of Large Language Models [65.02607075556742]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.
We tackle the compression of LLMs within the bound of two constraints: being task-agnostic and minimizing the reliance on the original training dataset.
Our method, named LLM-Pruner, adopts structural pruning that selectively removes non-critical coupled structures.
arXiv Detail & Related papers (2023-05-19T12:10:53Z) - Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot
Relation Extractors [11.28397947587596]
Fine-tuning large language models (LLMs) on large-scale instruction-following datasets substantially improves their performance on a wide range of NLP tasks.
However, even advanced instruction-tuned LLMs still fail to outperform small LMs on relation extraction (RE)
We propose QA4RE, a framework that aligns RE with question answering (QA), a predominant task in instruction-tuning datasets.
arXiv Detail & Related papers (2023-05-18T17:48:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.