Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors
- URL: http://arxiv.org/abs/2305.11159v1
- Date: Thu, 18 May 2023 17:48:03 GMT
- Title: Aligning Instruction Tasks Unlocks Large Language Models as Zero-Shot Relation Extractors
- Authors: Kai Zhang, Bernal Jiménez Gutiérrez, Yu Su
- Abstract summary: Fine-tuning large language models (LLMs) on large-scale instruction-following datasets substantially improves their performance on a wide range of NLP tasks.
However, even advanced instruction-tuned LLMs still fail to outperform small LMs on relation extraction (RE).
We propose QA4RE, a framework that aligns RE with question answering (QA), a predominant task in instruction-tuning datasets.
- Score: 11.28397947587596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work has shown that fine-tuning large language models (LLMs) on
large-scale instruction-following datasets substantially improves their
performance on a wide range of NLP tasks, especially in the zero-shot setting.
However, even advanced instruction-tuned LLMs still fail to outperform small
LMs on relation extraction (RE), a fundamental information extraction task. We
hypothesize that instruction-tuning has been unable to elicit strong RE
capabilities in LLMs because RE has low incidence in instruction-tuning
datasets, making up less than 1% of all tasks (Wang et al., 2022). To address this
limitation, we propose QA4RE, a framework that aligns RE with question
answering (QA), a predominant task in instruction-tuning datasets.
Comprehensive zero-shot RE experiments over four datasets with two series of
instruction-tuned LLMs (six LLMs in total) demonstrate that our QA4RE framework
consistently improves LLM performance, strongly verifying our hypothesis and
enabling LLMs to outperform strong zero-shot baselines by a large margin.
Additionally, we provide thorough experiments and discussions to show the
robustness, few-shot effectiveness, and strong transferability of our QA4RE
framework. This work illustrates a promising way of adapting LLMs to
challenging and underrepresented tasks by aligning these tasks with more common
instruction-tuning tasks like QA.
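
To make the QA4RE reformulation concrete, here is a minimal sketch of how a single RE instance might be recast as a multiple-choice QA prompt. The relation labels, verbalization templates, and option format below are illustrative assumptions, not the paper's exact prompts.

```python
# Minimal sketch of the QA4RE idea: verbalize each candidate relation
# type into an answer option using the entity pair, so the RE instance
# reads like the QA tasks that dominate instruction-tuning data.
# Relation set and templates are illustrative placeholders.
TEMPLATES = {
    "per:employee_of": "{head} is an employee of {tail}.",
    "org:founded_by": "{head} was founded by {tail}.",
    "no_relation": "{head} has no known relation to {tail}.",
}

def build_qa4re_prompt(sentence: str, head: str, tail: str) -> str:
    """Recast one RE instance as a multiple-choice QA prompt."""
    options = [
        f"{chr(ord('A') + i)}. {template.format(head=head, tail=tail)}"
        for i, template in enumerate(TEMPLATES.values())
    ]
    return (
        "Determine which option can be inferred from the given sentence.\n\n"
        f"Sentence: {sentence}\n\n"
        "Options:\n" + "\n".join(options) + "\n\nAnswer:"
    )

def parse_answer(completion: str) -> str:
    """Map the model's option letter back onto a relation label."""
    labels = list(TEMPLATES)
    letter = completion.strip()[:1].upper()
    if letter and "A" <= letter <= chr(ord("A") + len(labels) - 1):
        return labels[ord(letter) - ord("A")]
    return "no_relation"

prompt = build_qa4re_prompt(
    "Alice joined Acme Corp in 2020.", head="Alice", tail="Acme Corp"
)
print(prompt)             # multiple-choice prompt for the LLM
print(parse_answer("A"))  # -> per:employee_of
```

Because the model only needs to emit an option letter, its output maps deterministically back onto a relation label, sidestepping the parsing of free-form generations.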
Related papers
- Rethinking VLMs and LLMs for Image Classification [6.550471260627169]
Large Language Models (LLMs) are increasingly being merged with Visual Language Models (VLMs) to enable new capabilities.
We show that, for object and scene recognition, VLMs that do not leverage LLMs can achieve better performance than VLMs that do.
We propose a pragmatic solution: a lightweight fix that uses a relatively small LLM to efficiently route each visual task to the most suitable model.
arXiv Detail & Related papers (2024-10-03T23:40:21Z)
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the decoding process of LLMs with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate and also significantly decreases the number of agent interaction steps.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
- PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion [96.47420221442397]
We construct adversarial user instructions by attacking them at the sentence, semantic, and multi-language levels.
We test 3 closed-source and 4 open-source LLMs using a benchmark that incorporates robustness settings.
We find that GPT-4 exhibits the highest performance and strong robustness in our benchmark.
arXiv Detail & Related papers (2024-03-06T15:33:32Z)
- TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [73.29220562541204]
We consider harnessing the power of large language models (LLMs) to solve this task.
We develop a TAT-LLM language model by fine-tuning LLaMA 2 with the training data generated automatically from existing expert-annotated datasets.
arXiv Detail & Related papers (2024-01-24T04:28:50Z)
- TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)
- Evaluating the Robustness to Instructions of Large Language Models [6.947956990248856]
Fine-tuning Large Language Models (LLMs) can boost their zero-shot capabilities on novel tasks.
We evaluate six models, including Alpaca, Vicuna, WizardLM, and traditional task-oriented models (Flan-T5-XL/XXL, T0++).
We find that FLAN-T5 models of different scales are less robust to RE instructions than to QA instructions.
arXiv Detail & Related papers (2023-08-28T04:57:07Z)
- Small Language Models Improve Giants by Rewriting Their Outputs [18.025736098795296]
We tackle the problem of leveraging training data to improve the performance of large language models (LLMs) without fine-tuning.
We create a pool of candidates from the LLM through few-shot prompting and employ a compact model, the LM-corrector (LMCor), trained specifically to merge these candidates into an enhanced output; a minimal sketch of this pipeline follows the list below.
Experiments on four natural language generation tasks demonstrate that even a small LMCor model (250M) substantially improves the few-shot performance of LLMs (62B), matching and even outperforming standard fine-tuning.
arXiv Detail & Related papers (2023-05-22T22:07:50Z)
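
As a rough illustration of the LMCor pipeline summarized above: sample a pool of candidates from the large LLM, then let the small corrector merge them. `sample_from_llm`, `corrector`, and the candidate formatting are hypothetical stand-ins for illustration; the summary does not specify the actual interfaces.

```python
# Sketch of the LMCor setup, assuming the corrector is any
# sequence-to-sequence model trained to rewrite and merge candidates.
from typing import Callable, List

def lmcor_generate(
    task_input: str,
    sample_from_llm: Callable[[str], str],  # few-shot-prompted large LLM
    corrector: Callable[[str], str],        # compact trained LM-corrector
    num_candidates: int = 5,
) -> str:
    # Step 1: build a candidate pool by sampling the LLM several times.
    candidates: List[str] = [
        sample_from_llm(task_input) for _ in range(num_candidates)
    ]
    # Step 2: the corrector reads the input plus all candidates and
    # emits a single merged, improved output.
    corrector_input = task_input + "\n" + "\n".join(
        f"Candidate {i + 1}: {c}" for i, c in enumerate(candidates)
    )
    return corrector(corrector_input)
```

The appeal of this design is that only the compact corrector needs training, while the large LLM is used purely through prompting.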