Aligning Deep Implicit Preferences by Learning to Reason Defensively
- URL: http://arxiv.org/abs/2510.11194v1
- Date: Mon, 13 Oct 2025 09:26:47 GMT
- Title: Aligning Deep Implicit Preferences by Learning to Reason Defensively
- Authors: Peiming Li, Zhiyuan Hu, Yang Tang, Shiyu Li, Xi Chen,
- Abstract summary: We propose Critique-Driven Reasoning Alignment (CDRA) to bridge the preference inference gap. CDRA reframes alignment from a scalar reward-matching task into a structured reasoning process. Experiments demonstrate that CDRA excels at discovering and aligning with users' true preferences while executing robust reasoning.
- Score: 22.548051297731416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized alignment is crucial for enabling Large Language Models (LLMs) to engage effectively in user-centric interactions. However, current methods face a dual challenge: they fail to infer users' deep implicit preferences (including unstated goals, semantic context, and risk tolerances), and they lack the defensive reasoning required to navigate real-world ambiguity. This cognitive gap leads to responses that are superficial, brittle, and short-sighted. To address this, we propose Critique-Driven Reasoning Alignment (CDRA), which reframes alignment from a scalar reward-matching task into a structured reasoning process. First, to bridge the preference inference gap, we introduce the DeepPref benchmark. This dataset, comprising 3000 preference-query pairs across 20 topics, is curated by simulating a multi-faceted cognitive council that produces critique-annotated reasoning chains to deconstruct query semantics and reveal latent risks. Second, to instill defensive reasoning, we introduce the Personalized Generative Process Reward Model (Pers-GenPRM), which frames reward modeling as a personalized reasoning task. It generates a critique chain to evaluate a response's alignment with user preferences before outputting a final score based on this rationale. Ultimately, this interpretable, structured reward signal guides the policy model through Critique-Driven Policy Alignment, a process-level online reinforcement learning algorithm integrating both numerical and natural language feedback. Experiments demonstrate that CDRA excels at discovering and aligning with users' true preferences while executing robust reasoning. Our code and dataset are available at https://github.com/Zephyrian-Hugh/Deep-pref.
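The critique-then-score idea behind Pers-GenPRM can be sketched as a thin wrapper around any text-generation callable: the model first writes a natural-language critique of how well a response matches the user's explicit and implicit preferences, then emits a numeric score that is parsed out as the scalar reward. This is a hypothetical illustration, not the paper's actual interface; the prompt wording, the `Score:` line convention, and the 0-10 scale are all assumptions.

```python
# Hypothetical sketch of a "critique-then-score" generative reward model in the
# spirit of Pers-GenPRM. The prompt template, score format, and 0-10 scale are
# illustrative assumptions, not the paper's actual implementation.
import re
from typing import Callable, Tuple

CRITIQUE_PROMPT = (
    "User preferences:\n{prefs}\n\n"
    "Query:\n{query}\n\n"
    "Candidate response:\n{response}\n\n"
    "Critique the response step by step against the user's explicit and "
    "implicit preferences, then end with a line 'Score: <0-10>'."
)

def critique_and_score(
    generate: Callable[[str], str],  # any LLM text-generation callable
    prefs: str,
    query: str,
    response: str,
) -> Tuple[str, float]:
    """Return (critique_chain, scalar_reward normalized to [0, 1])."""
    output = generate(
        CRITIQUE_PROMPT.format(prefs=prefs, query=query, response=response)
    )
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", output)
    # Fall back to a zero reward if the model failed to emit a parsable score.
    score = float(match.group(1)) / 10.0 if match else 0.0
    critique = output[: match.start()].strip() if match else output.strip()
    return critique, score
```

In CDRA's dual-signal setup, the critique text would be fed back to the policy as natural-language feedback while the scalar serves as the reinforcement learning reward.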
Related papers
- Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation [2.379349092029744]
LLM-based explainable recommenders can produce explanations that are factually correct, yet still justify items using attributes that conflict with a user's historical preferences. We formalize this failure mode and propose PURE, a preference-aware reasoning framework following a select-then-generate paradigm. PURE selects a compact set of multi-hop item-centric reasoning paths that are both factually grounded and aligned with user preference structure, guided by user intent, specificity, and diversity to suppress generic, weakly personalized evidence.
arXiv Detail & Related papers (2026-03-03T15:24:51Z)
- LaSER: Internalizing Explicit Reasoning into Latent Space for Dense Retrieval [74.72139580745511]
LaSER is a novel self-distillation framework that internalizes explicit reasoning into the latent space of retrievers. Our method successfully combines the reasoning depth of explicit CoT pipelines with the inference efficiency of standard dense retrievers.
arXiv Detail & Related papers (2026-03-02T04:11:18Z)
- CORE: Context-Robust Remasking for Diffusion Language Models [51.59514489363897]
We propose Context-Robust Remasking (CORE), a training-free framework for inference-time revision. Rather than trusting static token probabilities, CORE identifies context-brittle tokens by probing their sensitivity to targeted masked-context perturbations. On LLaDA-8B-Base, CORE delivers consistent improvements across reasoning and code benchmarks, outperforming compute-matched baselines and improving MBPP by up to 9.2 percentage points.
arXiv Detail & Related papers (2026-02-04T00:12:30Z)
- Reasoning While Asking: Transforming Reasoning Large Language Models from Passive Solvers to Proactive Inquirers [41.58256327940237]
Proactive Interactive Reasoning (PIR) transforms Large Language Models into proactive inquirers. PIR targets premise- and intent-level uncertainty through direct interaction with the user. Experiments on mathematical reasoning, code generation, and document editing demonstrate that PIR consistently outperforms strong baselines.
arXiv Detail & Related papers (2026-01-29T18:56:12Z)
- How Does Prefix Matter in Reasoning Model Tuning? [57.69882799751655]
We fine-tune three R1 series models across four core model capabilities: reasoning (mathematics), coding, safety, and factuality. Results show that prefix-conditioned SFT improves both safety and reasoning performance, yielding up to +6% higher Safe@1 accuracy.
arXiv Detail & Related papers (2026-01-04T18:04:23Z)
- In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback [38.915062716409686]
InTRO is a new framework that enables both token-level exploration and self-feedback for accurate and concise reasoning. InTRO consistently outperforms other baselines, raising solution accuracy by up to 20% relative to the base model. Its chains of thought are notably more concise, exhibiting reduced verbosity.
arXiv Detail & Related papers (2025-11-13T01:47:06Z)
- Robust Preference Alignment via Directional Neighborhood Consensus [13.313830197011983]
We introduce Robust Preference Selection (RPS), a post-hoc, training-free method that leverages directional neighborhood consensus. RPS samples multiple responses from a local neighborhood of related preferences to create a superior candidate pool. Our work presents a practical, theoretically grounded solution for enhancing the reliability of preference-aligned models.
arXiv Detail & Related papers (2025-10-23T12:39:20Z)
- A-IPO: Adaptive Intent-driven Preference Optimization [14.221471110333828]
We introduce Adaptive Intent-driven Preference Optimization (A-IPO). A-IPO introduces an intention module that infers the latent intent behind each user prompt and explicitly incorporates this inferred intent into the reward function.
arXiv Detail & Related papers (2025-10-11T07:29:11Z)
- Personalized Reasoning: Just-In-Time Personalization and Why LLMs Fail At It [81.50711040539566]
Current large language model (LLM) development treats task-solving and preference alignment as separate challenges. We introduce PREFDISCO, an evaluation methodology that transforms static benchmarks into interactive personalization tasks. Our framework creates scenarios where identical questions require different reasoning chains depending on user context.
arXiv Detail & Related papers (2025-09-30T18:55:28Z)
- FIRE: Faithful Interpretable Recommendation Explanations [2.6499018693213316]
Natural language explanations in recommender systems are often framed as a review generation task. FIRE is a lightweight and interpretable framework that combines SHAP-based feature attribution with structured, prompt-driven language generation. Our results demonstrate that FIRE not only achieves competitive recommendation accuracy but also significantly improves explanation quality along critical dimensions such as alignment, structure, and faithfulness.
arXiv Detail & Related papers (2025-08-07T10:11:02Z)
- Reason-to-Recommend: Using Interaction-of-Thought Reasoning to Enhance LLM Recommendation [9.282278040339138]
R2Rec is a reasoning-enhanced recommendation framework. It samples interaction chains from the user-item graph and converts them into structured interaction-of-thoughts.
arXiv Detail & Related papers (2025-06-05T14:16:44Z)
- Retrieval is Not Enough: Enhancing RAG Reasoning through Test-Time Critique and Optimization [58.390885294401066]
Retrieval-augmented generation (RAG) has become a widely adopted paradigm for enabling knowledge-grounded large language models (LLMs). RAG pipelines often fail to ensure that model reasoning remains consistent with the evidence retrieved, leading to factual inconsistencies or unsupported conclusions. We propose AlignRAG, a novel iterative framework grounded in Critique-Driven Alignment (CDA). We introduce AlignRAG-auto, an autonomous variant that dynamically terminates refinement, removing the need to pre-specify the number of critique iterations.
arXiv Detail & Related papers (2025-04-21T04:56:47Z)
- REINFORCE Adversarial Attacks on Large Language Models: An Adaptive, Distributional, and Semantic Objective [57.57786477441956]
We propose an adaptive and semantic optimization problem over the population of responses. Our objective doubles the attack success rate (ASR) on Llama3 and increases the ASR from 2% to 50% under the circuit breaker defense.
arXiv Detail & Related papers (2025-02-24T15:34:48Z)
- Deliberative Alignment: Reasoning Enables Safer Language Models [64.60765108418062]
We introduce Deliberative Alignment, a new paradigm that teaches the model safety specifications and trains it to explicitly recall and accurately reason over the specifications before answering. We used this approach to align OpenAI's o-series models, and achieved highly precise adherence to OpenAI's safety policies, without requiring human-written chains of thought or answers.
arXiv Detail & Related papers (2024-12-20T21:00:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.