LLM Probing with Contrastive Eigenproblems: Improving Understanding and Applicability of CCS
- URL: http://arxiv.org/abs/2511.02089v1
- Date: Mon, 03 Nov 2025 22:00:37 GMT
- Title: LLM Probing with Contrastive Eigenproblems: Improving Understanding and Applicability of CCS
- Authors: Stefan F. Schouten, Peter Bloem
- Abstract summary: We argue that what should be optimized for is relative contrast consistency. We reformulate CCS as an eigenproblem, yielding closed-form solutions with interpretable eigenvalues and natural extensions to multiple variables. Our results suggest that relativizing contrast consistency not only improves our understanding of CCS but also opens pathways for broader probing and mechanistic interpretability methods.
- Score: 0.17188280334580197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contrast-Consistent Search (CCS) is an unsupervised probing method able to test whether large language models represent binary features, such as sentence truth, in their internal activations. While CCS has shown promise, its two-term objective has been only partially understood. In this work, we revisit CCS with the aim of clarifying its mechanisms and extending its applicability. We argue that what should be optimized for is relative contrast consistency. Building on this insight, we reformulate CCS as an eigenproblem, yielding closed-form solutions with interpretable eigenvalues and natural extensions to multiple variables. We evaluate these approaches across a range of datasets, finding that they recover similar performance to CCS, while avoiding problems around sensitivity to random initialization. Our results suggest that relativizing contrast consistency not only improves our understanding of CCS but also opens pathways for broader probing and mechanistic interpretability methods.
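The abstract does not spell out the objective, but the general idea of an eigenproblem-based contrast probe can be sketched as follows. This is a hypothetical illustration, not the paper's exact formulation: it fits a probe direction from paired activations of a statement and its negation by maximizing the variance of the pairwise differences relative to the variance of the pairwise sums, solved as a generalized symmetric eigenproblem. The function name `eigenprobe` and the specific covariance ratio are assumptions for this sketch.

```python
import numpy as np

def eigenprobe(x_pos, x_neg, ridge=1e-6):
    """Hypothetical sketch of an eigenproblem-based contrast probe.

    x_pos, x_neg: (n, d) activations for each statement and its negation.
    A truth-like feature should flip sign between the pair, so it shows up
    in the differences; shared content shows up in the sums. We maximize
    w^T C_diff w / w^T C_sum w via a generalized eigenproblem. The paper's
    actual objective may differ; this only illustrates the closed-form idea.
    """
    diff = x_pos - x_neg
    summ = x_pos + x_neg
    diff = diff - diff.mean(axis=0)
    summ = summ - summ.mean(axis=0)
    n, d = diff.shape
    c_diff = diff.T @ diff / n
    c_sum = summ.T @ summ / n + ridge * np.eye(d)  # ridge keeps C_sum invertible

    # Whiten by C_sum^{-1/2}, then take the top ordinary eigenvector.
    s_vals, s_vecs = np.linalg.eigh(c_sum)
    whiten = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    evals, evecs = np.linalg.eigh(whiten @ c_diff @ whiten)
    w = whiten @ evecs[:, -1]   # direction with largest relative contrast
    w = w / np.linalg.norm(w)
    return w, evals[-1]         # the eigenvalue is interpretable as the contrast ratio
```

Because the solution is closed-form, there is no random initialization to be sensitive to, which matches the motivation stated in the abstract.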
Related papers
- Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure [58.89643769707751]
We study latent chain-of-thought as a manipulable causal process in representation space. We find that latent-step budgets behave less like homogeneous extra depth and more like staged functionality with non-local routing. These results motivate mode-conditional and stability-aware analyses as more reliable tools for interpreting and improving latent reasoning systems.
arXiv Detail & Related papers (2026-02-09T15:25:12Z) - Learning Consistent Causal Abstraction Networks [14.952578725545344]
Causal artificial intelligence aims to enhance explainability, robustness, and trustworthiness in AI by leveraging structural causal models (SCMs). We tackle the consistent abstraction network (CAN). Experiments show competitive learning on synthetic data, and successful recovery of diverse CAN structures.
arXiv Detail & Related papers (2026-02-02T16:16:29Z) - Concept Regions Matter: Benchmarking CLIP with a New Cluster-Importance Approach [20.898059440239603]
Cluster-based Concept Importance (CCI) is a novel interpretability method. CCI sets a new state of the art on faithfulness benchmarks. We present a comprehensive evaluation of eighteen CLIP variants.
arXiv Detail & Related papers (2025-11-17T05:01:24Z) - Self-Calibrated Consistency can Fight Back for Adversarial Robustness in Vision-Language Models [31.920092341939593]
Self-Calibrated Consistency is an effective test-time defense against adversarial attacks. SCC consistently improves the zero-shot robustness of CLIP while maintaining accuracy. These findings highlight the great potential of establishing an adversarially robust paradigm from CLIP.
arXiv Detail & Related papers (2025-10-26T18:37:12Z) - The Causal Abstraction Network: Theory and Learning [14.952578725545344]
Causal artificial intelligence aims to enhance explainability, robustness, and trustworthiness in AI by leveraging structural causal models (SCMs). Recent advances formalize network sheaves of causal knowledge. We introduce the causal abstraction network (CAN), a specific instance of such sheaves where (i) the stalks are Gaussian and (ii) the maps are transposes of constructive linear abstractions.
arXiv Detail & Related papers (2025-09-25T07:48:25Z) - SIM-CoT: Supervised Implicit Chain-of-Thought [108.30049193668083]
Implicit Chain-of-Thought (CoT) methods offer a token-efficient alternative to explicit CoT reasoning in Large Language Models. We identify a core latent instability issue when scaling the computational budget of implicit CoT. We propose SIM-CoT, a plug-and-play training module that introduces step-level supervision to stabilize and enrich the latent reasoning space.
arXiv Detail & Related papers (2025-09-24T17:01:32Z) - Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning [65.75756724642932]
In incomplete multi-view clustering, missing data induce prototype shifts within views and semantic inconsistencies across views. We propose an IMVC framework, imputation- and alignment-free for consensus semantics learning (FreeCSL). FreeCSL achieves more confident and robust assignments on the IMVC task, compared to state-of-the-art competitors.
arXiv Detail & Related papers (2025-05-16T12:37:10Z) - Fine-Tuning on Diverse Reasoning Chains Drives Within-Inference CoT Refinement in LLMs [63.36637269634553]
We introduce a novel approach where LLMs are fine-tuned to generate a sequence of Diverse Chains of Thought (DCoT) within a single inference step. We show that fine-tuning on DCoT improves performance over the CoT baseline across model families and scales. Our work is also significant because both quantitative analyses and manual evaluations reveal that the observed gains stem from the models' ability to refine an initial reasoning chain.
arXiv Detail & Related papers (2024-07-03T15:01:18Z) - Cross-modal Active Complementary Learning with Self-refining Correspondence [54.61307946222386]
We propose a Cross-modal Robust Complementary Learning framework (CRCL) to improve the robustness of existing methods.
ACL exploits active and complementary learning losses to reduce the risk of providing erroneous supervision.
SCC utilizes multiple self-refining processes with momentum correction to enlarge the receptive field for correcting correspondences.
arXiv Detail & Related papers (2023-10-26T15:15:11Z) - Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation [34.97083511196799]
Semi-supervised semantic segmentation (SSS) has recently gained increasing research interest.
Current methods often suffer from the confirmation bias from the pseudo-labelling process.
We propose a new conflict-based cross-view consistency (CCVC) method based on a two-branch co-training framework.
arXiv Detail & Related papers (2023-03-02T14:02:16Z) - Encouraging Disentangled and Convex Representation with Controllable Interpolation Regularization [15.725515910594725]
We focus on controllable disentangled representation learning (C-Dis-RL).
We propose a simple yet efficient method: Controllable Interpolation Regularization (CIR).
arXiv Detail & Related papers (2021-12-06T16:52:07Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees for SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.