Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
- URL: http://arxiv.org/abs/2407.00075v2
- Date: Tue, 01 Oct 2024 20:42:41 GMT
- Title: Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
- Authors: Anton Xue, Avishree Khare, Rajeev Alur, Surbhi Goel, Eric Wong,
- Abstract summary: We study how to subvert large language models (LLMs) from following prompt-specified rules.
We prove that although LLMs can faithfully follow such rules, maliciously crafted prompts can mislead even idealized, theoretically constructed models.
- Score: 20.057611113206324
- License:
- Abstract: We study how to subvert large language models (LLMs) from following prompt-specified rules. We model rule-following as inference in propositional Horn logic, a mathematical system in which rules have the form ``if $P$ and $Q$, then $R$'' for some propositions $P$, $Q$, and $R$. We prove that although LLMs can faithfully follow such rules, maliciously crafted prompts can mislead even idealized, theoretically constructed models. Empirically, we find that the reasoning behavior of LLMs aligns with that of our theoretical constructions, and popular attack algorithms find adversarial prompts with characteristics predicted by our theory. Our logic-based framework provides a novel perspective for mechanistically understanding the behavior of LLMs in rule-based settings such as jailbreak attacks.
Related papers
- Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus [13.276829763453433]
Large language models (LLMs) are capable of solving a wide range of tasks, yet they have struggled with reasoning.
We propose $textbfAdditional Logic Training (ALT)$, which aims to enhance LLMs' reasoning capabilities by program-generated logical reasoning samples.
arXiv Detail & Related papers (2024-11-19T13:31:53Z) - LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models [87.49676980090555]
Large Language Models (LLMs) have demonstrated notable capabilities across various tasks, showcasing complex problem-solving abilities.
We introduce LogicGame, a novel benchmark designed to evaluate the comprehensive rule understanding, execution, and planning capabilities of LLMs.
arXiv Detail & Related papers (2024-08-28T13:16:41Z) - LogicBench: Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models [52.03659714625452]
Recently developed large language models (LLMs) have been shown to perform remarkably well on a wide range of language understanding tasks.
But, can they really "reason" over the natural language?
This question has been receiving significant research attention and many reasoning skills such as commonsense, numerical, and qualitative have been studied.
arXiv Detail & Related papers (2024-04-23T21:08:49Z) - Do Large Language Models Understand Logic or Just Mimick Context? [14.081178100662163]
This paper investigates the reasoning capabilities of large language models (LLMs) on two logical reasoning datasets.
It is found that LLMs do not truly understand logical rules; rather, in-context learning has simply enhanced the likelihood of these models arriving at the correct answers.
arXiv Detail & Related papers (2024-02-19T12:12:35Z) - Can LLMs Reason with Rules? Logic Scaffolding for Stress-Testing and Improving LLMs [87.34281749422756]
Large language models (LLMs) have achieved impressive human-like performance across various reasoning tasks.
However, their mastery of underlying inferential rules still falls short of human capabilities.
We propose a logic scaffolding inferential rule generation framework, to construct an inferential rule base, ULogic.
arXiv Detail & Related papers (2024-02-18T03:38:51Z) - Can LLMs Follow Simple Rules? [28.73820874333199]
Rule-following Language Evaluation Scenarios (RuLES) is a framework for measuring rule-following ability in Large Language Models.
RuLES consists of 14 simple text scenarios in which the model is instructed to obey various rules while interacting with the user.
We show that almost all current models struggle to follow scenario rules, even on straightforward test cases.
arXiv Detail & Related papers (2023-11-06T08:50:29Z) - Reasoning with Language Model is Planning with World Model [27.24144881796878]
Large language models (LLMs) have shown remarkable reasoning capabilities.
LLMs lack an internal $textitworld model$ to predict the world.
We propose a new LLM reasoning framework, $underlineR$easoning vi$underlinea$ $underlineP$lanning $textbf(RAP)$.
arXiv Detail & Related papers (2023-05-24T10:28:28Z) - Logic-LM: Empowering Large Language Models with Symbolic Solvers for
Faithful Logical Reasoning [101.26814728062065]
Large Language Models (LLMs) have shown human-like reasoning abilities but still struggle with complex logical problems.
This paper introduces a novel framework, Logic-LM, which integrates LLMs with symbolic solvers to improve logical problem-solving.
arXiv Detail & Related papers (2023-05-20T22:25:38Z) - Learning Symbolic Rules for Reasoning in Quasi-Natural Language [74.96601852906328]
We build a rule-based system that can reason with natural language input but without the manual construction of rules.
We propose MetaQNL, a "Quasi-Natural" language that can express both formal logic and natural language sentences.
Our approach achieves state-of-the-art accuracy on multiple reasoning benchmarks.
arXiv Detail & Related papers (2021-11-23T17:49:00Z) - RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs [91.71504177786792]
This paper studies learning logic rules for reasoning on knowledge graphs.
Logic rules provide interpretable explanations when used for prediction as well as being able to generalize to other tasks.
Existing methods either suffer from the problem of searching in a large search space or ineffective optimization due to sparse rewards.
arXiv Detail & Related papers (2020-10-08T14:47:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.