Training Large Language Models to be Better Rule Followers
- URL: http://arxiv.org/abs/2502.11525v1
- Date: Mon, 17 Feb 2025 07:54:50 GMT
- Title: Training Large Language Models to be Better Rule Followers
- Authors: Yi Hu, Shijia Kang, Haotong Yang, Haotian Xu, Muhan Zhang,
- Abstract summary: Large language models (LLMs) have shown impressive performance across a wide range of tasks.
Current training methods fail to leverage these rules effectively.
We propose Meta Rule-Following Fine-Tuning (Meta-RFFT) to enhance the cross-task transferability of rule-following abilities.
- Score: 23.958458849973248
- License:
- Abstract: Large language models (LLMs) have shown impressive performance across a wide range of tasks. However, they often exhibit unexpected failures in seemingly straightforward tasks, suggesting a reliance on case-based reasoning rather than rule-based reasoning. While the vast training corpus of LLMs contains numerous textual "rules", current training methods fail to leverage these rules effectively. Crucially, the relationships between these "rules" and their corresponding "instances" are not explicitly modeled. As a result, while LLMs can often recall rules with ease, they fail to apply these rules strictly and consistently in relevant reasoning scenarios. In this paper, we investigate the rule-following capabilities of LLMs and propose Meta Rule-Following Fine-Tuning (Meta-RFFT) to enhance the cross-task transferability of rule-following abilities. We first construct a dataset of 88 tasks requiring following rules, encompassing diverse reasoning domains. We demonstrate through extensive experiments that models trained on large-scale rule-following tasks are better rule followers, outperforming the baselines in both downstream fine-tuning and few-shot prompting scenarios. This highlights the cross-task transferability of models with the aid of Meta-RFFT. Furthermore, we examine the influence of factors such as dataset size, rule formulation, and in-context learning.
Related papers
- The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Learning Capabilities [51.594836904623534]
We investigate whether instruction-tuned models possess fundamentally different capabilities from base models that are prompted using in-context examples.
We show that the performance of instruction-tuned models is significantly correlated with the in-context performance of their base counterparts.
Specifically, we extend this understanding to instruction-tuned models, suggesting that their pretraining data similarly sets a limiting boundary on the tasks they can solve.
arXiv Detail & Related papers (2025-01-15T10:57:55Z) - RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios [58.90106984375913]
RuleArena is a novel and challenging benchmark designed to evaluate the ability of large language models (LLMs) to follow complex, real-world rules in reasoning.
Covering three practical domains -- airline baggage fees, NBA transactions, and tax regulations -- RuleArena assesses LLMs' proficiency in handling intricate natural language instructions.
arXiv Detail & Related papers (2024-12-12T06:08:46Z) - RNR: Teaching Large Language Models to Follow Roles and Rules [153.6596303205894]
We propose model, an automated data generation pipeline that generates diverse roles and rules from existing IFT instructions.
This data can then be used to train models that follow complex system prompts.
Our framework significantly improves role and rule following capability in large language models.
arXiv Detail & Related papers (2024-09-10T06:07:32Z) - Symbolic Working Memory Enhances Language Models for Complex Rule Application [87.34281749422756]
Large Language Models (LLMs) have shown remarkable reasoning performance but struggle with multi-step deductive reasoning.
We propose augmenting LLMs with external working memory and introduce a neurosymbolic framework for rule application.
Our framework iteratively performs symbolic rule grounding and LLM-based rule implementation.
arXiv Detail & Related papers (2024-08-24T19:11:54Z) - Beyond Instruction Following: Evaluating Inferential Rule Following of Large Language Models [25.337295202341608]
Large Language Models (LLMs) are supposed to be controlled and guided by in real-world scenarios to be safe, accurate, and intelligent.
Previous studies that try to evaluate the inferential rule-following capability of LLMs fail to distinguish the inferential rule-following scenarios from the instruction-following scenarios.
This paper first clarifies the concept of inferential rule-following and proposes a comprehensive benchmark, RuleBench, to evaluate a diversified range of inferential rule-following abilities.
arXiv Detail & Related papers (2024-07-11T12:26:55Z) - Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs [87.34281749422756]
Large language models (LLMs) have achieved impressive human-like performance across various reasoning tasks.
However, their mastery of underlying inferential rules still falls short of human capabilities.
We propose a logic scaffolding inferential rule generation framework, to construct an inferential rule base, ULogic.
arXiv Detail & Related papers (2024-02-18T03:38:51Z) - Can LLMs Follow Simple Rules? [28.73820874333199]
Rule-following Language Evaluation Scenarios (RuLES) is a framework for measuring rule-following ability in Large Language Models.
RuLES consists of 14 simple text scenarios in which the model is instructed to obey various rules while interacting with the user.
We show that almost all current models struggle to follow scenario rules, even on straightforward test cases.
arXiv Detail & Related papers (2023-11-06T08:50:29Z) - ChatRule: Mining Logical Rules with Large Language Models for Knowledge
Graph Reasoning [107.61997887260056]
We propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs.
Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs.
To refine the generated rules, a rule ranking module estimates the rule quality by incorporating facts from existing KGs.
arXiv Detail & Related papers (2023-09-04T11:38:02Z) - RuleBert: Teaching Soft Rules to Pre-trained Language Models [21.69870624809201]
We introduce a classification task where, given facts and soft rules, the PLM should return a prediction with a probability for a given hypothesis.
We propose a revised loss function that enables the PLM to learn how to predict precise probabilities for the task.
Our evaluation results show that the resulting fine-tuned models achieve very high performance, even on logical rules that were unseen at training.
arXiv Detail & Related papers (2021-09-24T16:19:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.