Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
- URL: http://arxiv.org/abs/2407.00075v2
- Date: Tue, 01 Oct 2024 20:42:41 GMT
- Title: Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
- Authors: Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong,
- Abstract summary: We study how to subvert large language models (LLMs) from following prompt-specified rules.
We prove that although LLMs can faithfully follow such rules, maliciously crafted prompts can mislead even idealized, theoretically constructed models.
- Score: 20.057611113206324
- License:
- Abstract: We study how to subvert large language models (LLMs) from following prompt-specified rules. We model rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form ``if $P$ and $Q$, then $R$'' for some propositions $P$, $Q$, and $R$. We prove that although LLMs can faithfully follow such rules, maliciously crafted prompts can mislead even idealized, theoretically constructed models. Empirically, we find that the reasoning behavior of LLMs aligns with that of our theoretical constructions, and popular attack algorithms find adversarial prompts with characteristics predicted by our theory. Our logic-based framework provides a novel perspective for mechanistically understanding the behavior of LLMs in rule-based settings such as jailbreak attacks.
Related papers
- Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus [13.276829763453433]
Large language models (LLMs) are capable of solving a wide range of tasks, yet they have struggled with reasoning.
We propose $textbfAdditional Logic Training (ALT)$, which aims to enhance LLMs' reasoning capabilities by program-generated logical reasoning samples.
arXiv Detail & Related papers (2024-11-19T13:31:53Z) - LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs [87.34281749422756]
Large language models (LLMs) have achieved impressive human-like performance across various reasoning tasks.
However, their mastery of underlying inferential rules still falls short of human capabilities.
We propose a logic scaffolding inferential rule generation framework, to construct an inferential rule base, ULogic.
arXiv Detail & Related papers (2024-02-18T03:38:51Z) - Chain of Logic: Rule-Based Reasoning with Large Language Models [10.017812995997753]
Rule-based reasoning enables us to draw conclusions by accurately applying a rule to a set of facts.
We introduce a new prompting method, Chain of Logic, which elicits rule-based reasoning through decomposition and recomposition.
We evaluate chain of logic across eight rule-based reasoning tasks involving three distinct compositional rules from the LegalBench benchmark.
arXiv Detail & Related papers (2024-02-16T01:54:43Z) - Large Language Models can Learn Rules [106.40747309894236]
We present Hypotheses-to-Theories (HtT), a framework that learns a rule library for reasoning with large language models (LLMs)
Experiments on relational reasoning, numerical reasoning and concept learning problems show that HtT improves existing prompting methods.
The learned rules are also transferable to different models and to different forms of the same problem.
arXiv Detail & Related papers (2023-10-10T23:07:01Z) - ChatRule: Mining Logical Rules with Large Language Models for Knowledge
Graph Reasoning [107.61997887260056]
We propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs.
Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs.
To refine the generated rules, a rule ranking module estimates the rule quality by incorporating facts from existing KGs.
arXiv Detail & Related papers (2023-09-04T11:38:02Z) - Automating Defeasible Reasoning in Law [0.0]
We study defeasible reasoning in rule-based systems, in particular about legal norms and contracts.
We identify rule modifier that specify how rules interact and how they can be overridden.
We then define rule transformations that eliminate these modifier, leading to a translation of rules to formulas.
arXiv Detail & Related papers (2022-05-15T17:14:15Z) - Learning Symbolic Rules for Reasoning in Quasi-Natural Language [74.96601852906328]
We build a rule-based system that can reason with natural language input but without the manual construction of rules.
We propose MetaQNL, a "Quasi-Natural" language that can express both formal logic and natural language sentences.
Our approach achieves state-of-the-art accuracy on multiple reasoning benchmarks.
arXiv Detail & Related papers (2021-11-23T17:49:00Z) - RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs [91.71504177786792]
This paper studies learning logic rules for reasoning on knowledge graphs.
Logic rules provide interpretable explanations when used for prediction as well as being able to generalize to other tasks.
Existing methods either suffer from the problem of searching in a large search space or ineffective optimization due to sparse rewards.
arXiv Detail & Related papers (2020-10-08T14:47:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.