Related papers: RAPTOR: Ridge-Adaptive Logistic Probes

RAPTOR: Ridge-Adaptive Logistic Probes

URL: http://arxiv.org/abs/2602.00158v2
Date: Wed, 04 Feb 2026 02:32:18 GMT
Title: RAPTOR: Ridge-Adaptive Logistic Probes
Authors: Ziqi Gao, Yaotian Zhu, Qingcheng Zeng, Xu Zhao, Ziqing Wang, Feng Ruan, Kaize Ding,
Abstract summary: We propose RAPTOR, a simple L2-regularized logistic probe with validation-tuned ridge strength.<n>RAPTOR matches or exceeds strong baselines in accuracy while achieving competitive directional stability.<n>We provide a mechanistic characterization of ridge logistic regression in an idealized Gaussian teacher-student model in the high-dimensional few-shot regime.
Score: 37.64383880338739
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Probing studies what information is encoded in a frozen LLM's layer representations by training a lightweight predictor on top of them. Beyond analysis, probes are often used operationally in probe-then-steer pipelines: a learned concept vector is extracted from a probe and injected via additive activation steering by adding it to a layer representation during the forward pass. The effectiveness of this pipeline hinges on estimating concept vectors that are accurate, directionally stable under ablation, and inexpensive to obtain. Motivated by these desiderata, we propose RAPTOR (Ridge-Adaptive Logistic Probe), a simple L2-regularized logistic probe whose validation-tuned ridge strength yields concept vectors from normalized weights. Across extensive experiments on instruction-tuned LLMs and human-written concept datasets, RAPTOR matches or exceeds strong baselines in accuracy while achieving competitive directional stability and substantially lower training cost; these quantitative results are supported by qualitative downstream steering demonstrations. Finally, using the Convex Gaussian Min-max Theorem (CGMT), we provide a mechanistic characterization of ridge logistic regression in an idealized Gaussian teacher-student model in the high-dimensional few-shot regime, explaining how penalty strength mediates probe accuracy and concept-vector stability and yielding structural predictions that qualitatively align with trends observed on real LLM embeddings.

Related papers

Binary Flow Matching: Prediction-Loss Space Alignment for Robust Learning [23.616336786063552]
Flow matching has emerged as a powerful framework for generative modeling.<n>We identify a latent structural mismatch that arises when it is coupled with velocity-based objectives.<n>We prove that re-aligning the objective to the signal space eliminates the singular weighting.
arXiv Detail & Related papers (2026-02-11T02:02:30Z)
Adaptive recurrent flow map operator learning for reaction diffusion dynamics [0.9137554315375919]
We develop an operator learner with adaptive recurrent training (DDOL-ART) using a robust recurrent strategy with lightweight validation milestones.<n>DDOL-ART learns one-step operators that remain stable under long rollouts and generalize zero-shot to strong shifts.<n>It is several-fold faster than a physics-based numerical-loss operator learner (NLOL) under matched settings.
arXiv Detail & Related papers (2026-02-10T07:33:13Z)
Latent-Space Contrastive Reinforcement Learning for Stable and Efficient LLM Reasoning [16.244366307890832]
We propose textbfDeepLatent Reasoning (DLR), a latent-space bidirectional contrastive reinforcement learning framework.<n>This framework shifts the trial-and-error cost from expensive token-level full sequence generation to the continuous latent manifold.<n> Experiments demonstrate that DLR achieves more stable training convergence, supports longer-horizon reasoning chains, and facilitates the sustainable accumulation of reasoning capabilities.
arXiv Detail & Related papers (2026-01-24T03:18:22Z)
From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers [67.02076505996284]
We study how the choice of pretraining data distribution steers a shallow transformer toward one behavior or the other.<n>Our results shed light on the algorithmic biases of pretrained transformers and offer conceptual guidelines for data-driven control of their learned behaviors.
arXiv Detail & Related papers (2025-12-21T08:10:26Z)
PACR: Progressively Ascending Confidence Reward for LLM Reasoning [55.06373646059141]
We propose Progressively Ascending Confidence Reward (PACR)<n>PACR is a dense, model-intrinsic reward computed directly from the model's evolving belief in the correct answer.<n>Our results suggest that dense, model-intrinsic shaping signals can make RLVR training more effective and reliable.
arXiv Detail & Related papers (2025-10-25T11:25:35Z)
Training-Free Stein Diffusion Guidance: Posterior Correction for Sampling Beyond High-Density Regions [46.59494117137471]
Training free diffusion guidance provides a flexible way to leverage off-the-shelf classifiers without additional training.<n>We introduce Stein Diffusion Guidance (SDG), a novel training-free framework grounded in a surrogate SOC objective.<n>Experiments on molecular low-density sampling tasks suggest that SDG consistently surpasses standard training-free guidance methods.
arXiv Detail & Related papers (2025-07-07T21:14:27Z)
Contrast & Compress: Learning Lightweight Embeddings for Short Trajectories [11.6132604160666]
We propose a novel framework for learning fixed-dimensional embeddings for short trajectories by leveraging a Transformer encoder.<n>We analyze the influence of Cosine and FFT-based similarity metrics within the contrastive learning paradigm.<n>Our empirical evaluation on the Argoverse 2 dataset demonstrates that embeddings shaped by Cosine similarity objectives yield superior clustering of trajectories.
arXiv Detail & Related papers (2025-06-03T07:53:04Z)
Next Token Perception Score: Analytical Assessment of your LLM Perception Skills [12.093755170926762]
Next Token Perception Score (NTPS) is a score derived under a linear setting that measures the overlap between autoregressive and perception feature subspaces.<n>We show that NTPS increases following low-rank adaptation (LoRA) fine-tuning, especially in large models.<n>Our results offer both theoretical insights and practical tools for analytically assessing perception skills.
arXiv Detail & Related papers (2025-05-22T17:18:51Z)
GRANP: A Graph Recurrent Attentive Neural Process Model for Vehicle Trajectory Prediction [3.031375888004876]
We propose a novel model named Graph Recurrent Attentive Neural Process (GRANP) for vehicle trajectory prediction. GRANP contains an encoder with deterministic and latent paths, and a decoder for prediction. We show that GRANP achieves state-of-the-art results and can efficiently quantify uncertainties.
arXiv Detail & Related papers (2024-04-09T05:51:40Z)
Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization [87.21285093582446]
Diffusion Generative Flow Samplers (DGFS) is a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments. Our method takes inspiration from the theory developed for generative flow networks (GFlowNets)
arXiv Detail & Related papers (2023-10-04T09:39:05Z)
Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation [49.44309457870649]
Layer-wise Feedback feedback (LFP) is a novel training principle for neural network-like predictors.<n>LFP decomposes a reward to individual neurons based on their respective contributions.<n>Our method then implements a greedy reinforcing approach helpful parts of the network and weakening harmful ones.
arXiv Detail & Related papers (2023-08-23T10:48:28Z)
Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling. We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
Reinforcement Learning for Low-Thrust Trajectory Design of Interplanetary Missions [77.34726150561087]
This paper investigates the use of reinforcement learning for the robust design of interplanetary trajectories in presence of severe disturbances. An open-source implementation of the state-of-the-art algorithm Proximal Policy Optimization is adopted. The resulting Guidance and Control Network provides both a robust nominal trajectory and the associated closed-loop guidance law.
arXiv Detail & Related papers (2020-08-19T15:22:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.