Text2Grad: Reinforcement Learning from Natural Language Feedback
- URL: http://arxiv.org/abs/2505.22338v1
- Date: Wed, 28 May 2025 13:23:49 GMT
- Title: Text2Grad: Reinforcement Learning from Natural Language Feedback
- Authors: Hanyang Wang, Lu Wang, Chaoyun Zhang, Tianjun Mao, Si Qin, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang
- Abstract summary: We introduce Text2Grad, a fine-grained reinforcement-learning paradigm that turns free-form textual feedback into span-level gradients. Our results demonstrate that natural-language feedback, when converted to gradients, is a powerful signal for fine-grained policy optimization.
- Score: 32.59003667154527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional RLHF optimizes language models with coarse, scalar rewards that mask the fine-grained reasons behind success or failure, leading to slow and opaque learning. Recent work augments RL with textual critiques through prompting or reflection, improving interpretability but leaving model parameters untouched. We introduce Text2Grad, a reinforcement-learning paradigm that turns free-form textual feedback into span-level gradients. Given human (or programmatic) critiques, Text2Grad aligns each feedback phrase with the relevant token spans, converts these alignments into differentiable reward signals, and performs gradient updates that directly refine the offending portions of the model's policy. This yields precise, feedback-conditioned adjustments instead of global nudges. Text2Grad is realized through three components: (1) a high-quality feedback-annotation pipeline that pairs critiques with token spans; (2) a fine-grained reward model that predicts span-level reward on answer while generating explanatory critiques; and (3) a span-level policy optimizer that back-propagates natural-language gradients. Across summarization, code generation, and question answering, Text2Grad consistently surpasses scalar-reward RL and prompt-only baselines, providing both higher task metrics and richer interpretability. Our results demonstrate that natural-language feedback, when converted to gradients, is a powerful signal for fine-grained policy optimization. The code for our method is available at https://github.com/microsoft/Text2Grad
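The abstract's third component, the span-level policy optimizer, amounts to weighting each response token's log-probability by the reward of the span the critique assigns it, so gradient updates concentrate on the offending portions rather than the whole sequence. Below is a minimal PyTorch sketch of that idea under assumed names and shapes (`span_rewards_to_token_rewards` and `span_level_pg_loss` are hypothetical helpers); it illustrates only the span-weighted policy-gradient step, not the released implementation, which also includes the feedback-annotation pipeline and the critique-generating reward model.

```python
# Minimal sketch (not the authors' code): span-level rewards from a textual
# critique are broadcast to token positions and used to weight a
# REINFORCE-style policy-gradient loss, so gradients concentrate on the
# critiqued spans. Shapes and helper names are illustrative assumptions.
import torch
import torch.nn.functional as F

def span_rewards_to_token_rewards(seq_len, spans):
    """Broadcast span-level rewards to a per-token reward vector.

    spans: list of (start, end, reward) tuples (end exclusive), e.g. produced
           by aligning critique phrases with answer token spans.
    """
    token_rewards = torch.zeros(seq_len)
    for start, end, reward in spans:
        token_rewards[start:end] = reward
    return token_rewards

def span_level_pg_loss(logits, response_ids, token_rewards):
    """Reward-weighted negative log-likelihood over the response tokens.

    logits:        (seq_len, vocab) next-token logits for the response
    response_ids:  (seq_len,) sampled response token ids
    token_rewards: (seq_len,) per-token rewards from the critique alignment
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_log_probs = log_probs.gather(-1, response_ids.unsqueeze(-1)).squeeze(-1)
    # Negative sign: we maximize the reward-weighted log-likelihood, so
    # positively rewarded spans are reinforced and negative spans suppressed.
    return -(token_rewards * token_log_probs).mean()

# Toy usage: a 10-token answer where the critique praises tokens 0-4
# and flags tokens 6-9 as wrong.
seq_len, vocab = 10, 100
logits = torch.randn(seq_len, vocab, requires_grad=True)
response_ids = torch.randint(0, vocab, (seq_len,))
token_rewards = span_rewards_to_token_rewards(
    seq_len, spans=[(0, 5, +1.0), (6, 10, -1.0)])
loss = span_level_pg_loss(logits, response_ids, token_rewards)
loss.backward()  # gradient signal is concentrated on the critiqued spans
```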
Related papers
- Towards Bridging Review Sparsity in Recommendation with Textual Edge Graph Representation [28.893058826607735]
We propose a unified framework that imputes missing reviews by jointly modeling semantic and structural signals. Experiments on the Amazon and Goodreads datasets show that TWISTER consistently outperforms traditional numeric, graph-based, and LLM baselines. In summary, TWISTER generates reviews that are more helpful, authentic, and specific, while smoothing structural signals for improved recommendations.
arXiv Detail & Related papers (2025-08-02T00:53:40Z) - Can Gradient Descent Simulate Prompting? [56.60154660021178]
We study whether gradient updates can reproduce the effects of conditioning on new information. Gradient-descent training recovers some (and occasionally all) of prompted model performance. The results suggest new avenues for long-context modeling.
arXiv Detail & Related papers (2025-06-26T04:06:20Z) - Compile Scene Graphs with Reinforcement Learning [69.36723767339001]
Next-token prediction is the fundamental principle for training large language models (LLMs). We introduce R1-SGG, a multimodal LLM (M-LLM) initially trained via supervised fine-tuning (SFT) on the scene graph dataset. We design a set of graph-centric rewards, including three recall-based variants -- Hard Recall, Hard Recall+Relax, and Soft Recall.
arXiv Detail & Related papers (2025-04-18T10:46:22Z) - ListConRanker: A Contrastive Text Reranker with Listwise Encoding [27.017035527335402]
We propose a novel Listwise-encoded Contrastive text reRanker (ListConRanker). It allows each passage to be compared with the other passages during the encoding process. It achieves state-of-the-art performance on the reranking benchmark of the Chinese Massive Text Embedding Benchmark.
arXiv Detail & Related papers (2025-01-13T07:51:46Z) - LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions [17.88069510398486]
Why do gradient-based explanations struggle with Transformers, and how can we improve them?
We identify flow imbalances in Transformers that violate FullGrad-completeness, a critical property for gradient-based attribution that CNNs naturally possess.
We introduce LibraGrad -- a theoretically grounded post-hoc approach that corrects gradient imbalances through pruning and scaling of backward paths.
arXiv Detail & Related papers (2024-11-24T15:02:52Z) - Fine-Grained Human Feedback Gives Better Rewards for Language Model Training [108.25635150124539]
Language models (LMs) often exhibit undesirable text generation behaviors, including generating false, toxic, or irrelevant outputs.
We introduce Fine-Grained RLHF, a framework that enables training and learning from reward functions that are fine-grained in two respects.
arXiv Detail & Related papers (2023-06-02T17:11:37Z) - DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models [97.31200133440308]
We propose using online reinforcement learning to fine-tune text-to-image models.
We focus on diffusion models, defining the fine-tuning task as an RL problem.
Our approach, coined DPOK, integrates policy optimization with KL regularization.
arXiv Detail & Related papers (2023-05-25T17:35:38Z) - TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale [59.01246141215051]
We analyze the factor that leads to degradation from the perspective of language supervision.
We propose a tunable-free pre-training strategy to retain the generalization ability of the text encoder.
We produce a series of models, dubbed TVTSv2, with up to one billion parameters.
arXiv Detail & Related papers (2023-05-23T15:44:56Z) - Aligning Text-to-Image Models using Human Feedback [104.76638092169604]
Current text-to-image models often generate images that are inadequately aligned with text prompts.
We propose a fine-tuning method for aligning such models using human feedback.
Our results demonstrate the potential for learning from human feedback to significantly improve text-to-image models.
arXiv Detail & Related papers (2023-02-23T17:34:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.