Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
- URL: http://arxiv.org/abs/2506.04245v1
- Date: Thu, 29 May 2025 21:26:21 GMT
- Title: Contextual Integrity in LLMs via Reasoning and Reinforcement Learning
- Authors: Guangchen Lan, Huseyin A. Inan, Sahar Abdelnabi, Janardhan Kulkarni, Lukas Wutschitz, Reza Shokri, Christopher G. Brinton, Robert Sim,
- Abstract summary: We develop a reinforcement learning framework that instills in models the reasoning necessary to achieve contextual integrity.<n>We show that our method substantially reduces inappropriate information disclosure while maintaining task performance.
- Score: 41.47162246075031
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As the era of autonomous agents making decisions on behalf of users unfolds, ensuring contextual integrity (CI) -- what is the appropriate information to share while carrying out a certain task -- becomes a central question to the field. We posit that CI demands a form of reasoning where the agent needs to reason about the context in which it is operating. To test this, we first prompt LLMs to reason explicitly about CI when deciding what information to disclose. We then extend this approach by developing a reinforcement learning (RL) framework that further instills in models the reasoning necessary to achieve CI. Using a synthetic, automatically created, dataset of only $\sim700$ examples but with diverse contexts and information disclosure norms, we show that our method substantially reduces inappropriate information disclosure while maintaining task performance across multiple model sizes and families. Importantly, improvements transfer from this synthetic dataset to established CI benchmarks such as PrivacyLens that has human annotations and evaluates privacy leakage of AI assistants in actions and tool calls.
Related papers
- Teaching Language Models To Gather Information Proactively [53.85419549904644]
Large language models (LLMs) are increasingly expected to function as collaborative partners.<n>In this work, we introduce a new task paradigm: proactive information gathering.<n>We design a scalable framework that generates partially specified, real-world tasks, masking key information.<n>Within this setup, our core innovation is a reinforcement finetuning strategy that rewards questions that elicit genuinely new, implicit user information.
arXiv Detail & Related papers (2025-07-28T23:50:09Z) - Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth [21.672923905771576]
Large language models (LLMs) by crowdsourcing workers pose a challenge to datasets intended to reflect human input.<n>We propose a training-free scoring mechanism with theoretical guarantees under a crowdsourcing model that accounts for LLM collusion.
arXiv Detail & Related papers (2025-06-08T04:38:39Z) - InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation [63.55258191625131]
InfoDeepSeek is a new benchmark for assessing agentic information seeking in real-world, dynamic web environments.<n>We propose a systematic methodology for constructing challenging queries satisfying the criteria of determinacy, difficulty, and diversity.<n>We develop the first evaluation framework tailored to dynamic agentic information seeking, including fine-grained metrics about the accuracy, utility, and compactness of information seeking outcomes.
arXiv Detail & Related papers (2025-05-21T14:44:40Z) - Retrieval Augmented Generation for Topic Modeling in Organizational Research: An Introduction with Empirical Demonstration [0.0]
This paper introduces Agentic Retrieval-Augmented Generation (Agentic RAG) as a method for topic modeling with LLMs.<n>It integrates three key components: (1) retrieval, enabling automatized access to external data beyond an LLM's pre-trained knowledge; (2) generation, leveraging LLM capabilities for text synthesis; and (3) agent-driven learning, iteratively refining retrieval and query formulation processes.<n>Our findings demonstrate that the approach is more efficient, interpretable and at the same time achieves higher reliability and validity in comparison to the standard machine learning approach.
arXiv Detail & Related papers (2025-02-28T11:25:11Z) - Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions.<n>Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes.<n>We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
arXiv Detail & Related papers (2025-02-18T17:12:26Z) - Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.<n>We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.<n>We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - Answer When Needed, Forget When Not: Language Models Pretend to Forget via In-Context Knowledge Unlearning [26.861562920084264]
Large language models (LLMs) are applied across diverse domains.<n>We propose a novel method termed in-context knowledge unlearning''<n>Our method achieves up to 95% forget accuracy while retaining 80% of unrelated knowledge.
arXiv Detail & Related papers (2024-10-01T04:13:25Z) - Privacy Policy Analysis through Prompt Engineering for LLMs [3.059256166047627]
PAPEL (Privacy Policy Analysis through Prompt Engineering for LLMs) is a framework harnessing the power of Large Language Models (LLMs) to automate the analysis of privacy policies.
It aims to streamline the extraction, annotation, and summarization of information from these policies, enhancing their accessibility and comprehensibility without requiring additional model training.
We demonstrate the effectiveness of PAPEL with two applications: (i) annotation and (ii) contradiction analysis.
arXiv Detail & Related papers (2024-09-23T10:23:31Z) - Improving Open Information Extraction with Large Language Models: A
Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.