Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
- URL: http://arxiv.org/abs/2506.15674v1
- Date: Wed, 18 Jun 2025 17:57:01 GMT
- Title: Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
- Authors: Tommaso Green, Martin Gubri, Haritz Puerto, Sangdoo Yun, Seong Joon Oh
- Abstract summary: We study privacy leakage in the reasoning traces of large reasoning models used as personal agents. We show that reasoning traces frequently contain sensitive user data, which can be extracted via prompt injections or accidentally leak into outputs. We argue that safety efforts must extend to the model's internal thinking, not just its outputs.
- Score: 36.044522516005884
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study privacy leakage in the reasoning traces of large reasoning models used as personal agents. Unlike final outputs, reasoning traces are often assumed to be internal and safe. We challenge this assumption by showing that reasoning traces frequently contain sensitive user data, which can be extracted via prompt injections or accidentally leak into outputs. Through probing and agentic evaluations, we demonstrate that test-time compute approaches, particularly increased reasoning steps, amplify such leakage. While increasing the budget of those test-time compute approaches makes models more cautious in their final answers, it also leads them to reason more verbosely and leak more in their own thinking. This reveals a core tension: reasoning improves utility but enlarges the privacy attack surface. We argue that safety efforts must extend to the model's internal thinking, not just its outputs.
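The abstract describes probing whether sensitive user data resurfaces in reasoning traces versus final outputs. As a rough, hypothetical illustration of that kind of check (not the paper's actual evaluation harness), the sketch below substring-matches known sensitive attributes against a reasoning trace and a final answer separately; the `AgentTurn` fields and the example record are assumptions made here for illustration.

```python
# Minimal, hypothetical sketch of a leakage probe: given the sensitive
# attributes supplied to a personal agent, check whether each attribute
# resurfaces in the model's reasoning trace versus its final answer.
# Field names and the example transcript are illustrative only.

from dataclasses import dataclass


@dataclass
class AgentTurn:
    reasoning: str  # hidden chain-of-thought / reasoning trace
    answer: str     # user-facing final output


def leaked_attributes(turn: AgentTurn, sensitive: dict[str, str]) -> dict[str, dict[str, bool]]:
    """Report, per attribute, whether its value appears in the trace and/or the answer."""
    report = {}
    for name, value in sensitive.items():
        needle = value.lower()
        report[name] = {
            "in_reasoning": needle in turn.reasoning.lower(),
            "in_answer": needle in turn.answer.lower(),
        }
    return report


if __name__ == "__main__":
    user_profile = {"full_name": "Jane Doe", "ssn": "123-45-6789"}
    turn = AgentTurn(
        reasoning="The user Jane Doe (SSN 123-45-6789) wants to book a flight, so ...",
        answer="I have booked your flight; no personal details are included in this reply.",
    )
    for attr, where in leaked_attributes(turn, user_profile).items():
        print(attr, where)
```

In this toy example the attributes leak into the reasoning trace but not the final answer, which mirrors the gap the paper highlights between output-level and trace-level privacy.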
Related papers
- Does More Inference-Time Compute Really Help Robustness? [50.47666612618054]
We show that small-scale, open-source models can benefit from inference-time scaling. We identify an important security risk, intuitively motivated and empirically verified as an inverse scaling law. We urge practitioners to carefully weigh these subtle trade-offs before applying inference-time scaling in security-sensitive, real-world applications.
arXiv Detail & Related papers (2025-07-21T18:08:38Z)
- Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know? [7.423494663010787]
Reasoning language models have set state-of-the-art (SOTA) records on many challenging benchmarks. Like previous language models, reasoning models are prone to generating confident, plausible responses that are incorrect. Knowing when and how much to trust these models is critical to the safe deployment of reasoning models in real-world applications.
arXiv Detail & Related papers (2025-06-22T21:46:42Z)
- On Reasoning Strength Planning in Large Reasoning Models [50.61816666920207]
We find evidence that LRMs pre-plan the reasoning strengths in their activations even before generation. We then uncover that LRMs encode this reasoning strength through a pre-allocated directional vector embedded in the activations of the model. Our work provides new insights into the internal mechanisms of reasoning in LRMs and offers practical tools for controlling their reasoning behaviors.
arXiv Detail & Related papers (2025-06-10T02:55:13Z)
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models [103.03315678501546]
Extending thinking traces using prompts like "Wait" or "Let me rethink" can improve performance. This raises a natural question: does thinking more at test-time truly lead to better reasoning? We show a consistent pattern of initial performance improvements from additional thinking followed by a decline, due to "overthinking".
arXiv Detail & Related papers (2025-06-04T17:55:09Z)
- Internal Bias in Reasoning Models leads to Overthinking [58.817405319722596]
We show for the first time that overthinking in reasoning models may stem from their internal bias towards input texts. By masking out the original input section, the effect of internal bias can be effectively alleviated and the reasoning length can be reduced by 31%-53%.
arXiv Detail & Related papers (2025-05-22T09:35:52Z)
- Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models [42.407188124841234]
Landscape of thoughts is a tool to inspect the reasoning paths of chain-of-thought on any multi-choice dataset. It distinguishes between strong and weak models, correct and incorrect answers, as well as different reasoning tasks. It also uncovers undesirable reasoning patterns, such as low consistency and high uncertainty.
arXiv Detail & Related papers (2025-03-28T06:09:51Z)
- ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models [16.407923457296235]
This work investigates how reasoning length is embedded in the hidden representations of reasoning models. We introduce ThinkEdit, a simple yet effective weight-editing approach to mitigate the issue of overly short reasoning. With changes to only 0.2% of the model's parameters, ThinkEdit effectively reduces overly short reasoning and yields notable accuracy gains.
arXiv Detail & Related papers (2025-03-27T23:53:45Z)
- Navigating the OverKill in Large Language Models [84.62340510027042]
We investigate the factors for overkill by exploring how models handle and determine the safety of queries.
Our findings reveal the presence of shortcuts within models, leading to over-attention to harmful words like 'kill'; prompts emphasizing safety further exacerbate overkill.
We introduce Self-Contrastive Decoding (Self-CD), a training-free and model-agnostic strategy, to alleviate this phenomenon.
arXiv Detail & Related papers (2024-01-31T07:26:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.