Rewarding Curse: Analyze and Mitigate Reward Modeling Issues for LLM Reasoning
- URL: http://arxiv.org/abs/2503.05188v1
- Date: Fri, 07 Mar 2025 07:20:24 GMT
- Title: Rewarding Curse: Analyze and Mitigate Reward Modeling Issues for LLM Reasoning
- Authors: Jiachun Li, Pengfei Cao, Yubo Chen, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Kang Liu, Jun Zhao
- Abstract summary: Chain-of-thought (CoT) prompting demonstrates varying performance under different reasoning tasks. Previous work attempts to evaluate it but falls short in providing an in-depth analysis of patterns that influence the CoT. We identify key factors that influence CoT effectiveness on performance improvement, including problem difficulty, information gain, and information flow.
- Score: 17.6082037230676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Chain-of-thought (CoT) prompting demonstrates varying performance under different reasoning tasks. Previous work attempts to evaluate it but falls short in providing an in-depth analysis of patterns that influence the CoT. In this paper, we study the CoT performance from the perspective of effectiveness and faithfulness. For the former, we identify key factors that influence CoT effectiveness on performance improvement, including problem difficulty, information gain, and information flow. For the latter, we interpret the unfaithful CoT issue by conducting a joint analysis of the information interaction among the question, CoT, and answer. The result demonstrates that, when the LLM predicts answers, it can recall correct information missing in the CoT from the question, leading to the problem. Finally, we propose a novel algorithm to mitigate this issue, in which we recall extra information from the question to enhance the CoT generation and evaluate CoTs based on their information gain. Extensive experiments demonstrate that our approach enhances both the faithfulness and effectiveness of CoT.
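The mitigation algorithm the abstract describes (recall information from the question to enhance CoT generation, then evaluate candidate CoTs by their information gain) could be sketched roughly as follows. The lexical-overlap proxy for information gain and all function names here are illustrative assumptions, not the authors' implementation.

```python
def information_gain(question: str, cot: str) -> float:
    """Illustrative proxy for information gain: the fraction of the
    question's content words that the CoT makes explicit. The paper's
    actual measure is model-based; this stand-in is purely lexical."""
    stop = {"the", "a", "an", "of", "to", "is", "and", "in", "what"}
    q_words = {w.lower().strip("?.,") for w in question.split()} - stop
    c_words = {w.lower().strip("?.,") for w in cot.split()}
    if not q_words:
        return 0.0
    return len(q_words & c_words) / len(q_words)

def select_cot(question: str, candidate_cots: list[str]) -> str:
    """Rank candidate chains of thought by their information gain with
    respect to the question and keep the best one."""
    return max(candidate_cots, key=lambda c: information_gain(question, c))
```

Under this sketch, a CoT that restates more of the question's key content scores higher and is preferred, which mirrors the paper's intuition that unfaithful CoTs omit information the model later recalls directly from the question.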
Related papers
- Exploring Knowledge Boundaries in Large Language Models for Retrieval Judgment [56.87031484108484]
Large Language Models (LLMs) are increasingly recognized for their practical applications.
Retrieval-Augmented Generation (RAG) tackles this challenge and has shown a significant impact on LLMs.
By minimizing retrieval requests that yield neutral or harmful results, we can effectively reduce both time and computational costs.
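The retrieval-minimization idea above can be sketched as a simple gate that skips retrieval when the model is already confident. The threshold rule and function names are illustrative assumptions, not the paper's method.

```python
from typing import Callable, Optional

def should_retrieve(self_confidence: float, threshold: float = 0.8) -> bool:
    """Issue a retrieval request only when the model's own confidence is
    low, avoiding requests likely to return neutral or harmful context."""
    return self_confidence < threshold

def answer(question: str,
           self_confidence: float,
           retrieve: Callable[[str], str],
           generate: Callable[[str, Optional[str]], str]) -> str:
    """Route between parametric answering and retrieval-augmented answering."""
    if should_retrieve(self_confidence):
        context = retrieve(question)      # external search, only when needed
        return generate(question, context)
    return generate(question, None)       # rely on parametric knowledge
```

Each skipped retrieval saves one search call and the extra context tokens, which is where the time and compute savings come from.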
arXiv Detail & Related papers (2024-11-09T15:12:28Z) - Towards Better Chain-of-Thought: A Reflection on Effectiveness and Faithfulness [17.6082037230676]
Chain-of-thought (CoT) prompting demonstrates varying performance under different reasoning tasks. Previous work attempts to evaluate it but falls short in providing an in-depth analysis of patterns that influence the CoT. We identify key factors that influence CoT effectiveness on performance improvement, including problem difficulty, information gain, and information flow.
arXiv Detail & Related papers (2024-05-29T09:17:46Z) - ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting [124.69672273754144]
Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs)
Existing CoT approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts.
We introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts.
arXiv Detail & Related papers (2024-03-21T11:34:26Z) - ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis [20.24915029448926]
Large language models (LLMs) have achieved commendable accomplishments in various natural language processing tasks.
These challenges arise from the presence of implicit relationships that demand multi-step reasoning.
We propose a novel approach ERA-CoT, which aids LLMs in understanding context by capturing relationships between entities.
arXiv Detail & Related papers (2024-03-11T17:18:53Z) - Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning [21.951313919964484]
Large language models exhibit high-level commonsense reasoning abilities.
CoT-like methods lead to a considerable number of originally correct answers turning wrong.
We use attribution tracing and causal tracing methods to probe the internal working mechanism of the model.
arXiv Detail & Related papers (2024-02-28T14:09:02Z) - Disentangled Representation Learning with Transmitted Information Bottleneck [57.22757813140418]
We present DisTIB (Transmitted Information Bottleneck for Disentangled representation learning), a novel objective that navigates the balance between information compression and preservation.
arXiv Detail & Related papers (2023-11-03T03:18:40Z) - Towards Better Chain-of-Thought Prompting Strategies: A Survey [60.75420407216108]
Chain-of-Thought (CoT) shows its impressive strength when used as a prompting strategy for large language models (LLM)
In recent years, the prominent effect of CoT prompting has attracted emerging research.
This survey could provide an overall reference on related research.
arXiv Detail & Related papers (2023-10-08T01:16:55Z) - Stress Testing Chain-of-Thought Prompting for Large Language Models [0.16317061277456998]
This report examines the effectiveness of Chain-of-Thought (CoT) prompting in improving the multi-step reasoning abilities of large language models (LLMs)
We analyze the impact of three types of CoT prompt perturbations, namely CoT order, CoT values, and CoT operators on the performance of GPT-3 on various tasks.
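The three perturbation types named above (CoT order, CoT values, and CoT operators) could be sketched as follows. These helpers are illustrative assumptions about what each perturbation does; the report's exact procedure may differ.

```python
import random
import re

def perturb_order(steps: list[str], seed: int = 0) -> list[str]:
    """Order perturbation: shuffle the reasoning steps of a CoT prompt."""
    rng = random.Random(seed)
    shuffled = steps[:]
    rng.shuffle(shuffled)
    return shuffled

def perturb_values(step: str, offset: int = 1) -> str:
    """Value perturbation: shift every integer in a step by a fixed offset."""
    return re.sub(r"\d+", lambda m: str(int(m.group()) + offset), step)

def perturb_operators(step: str) -> str:
    """Operator perturbation: swap '+' and '-' within a step."""
    return step.translate(str.maketrans("+-", "-+"))
```

Applying each perturbation to the demonstration CoTs and measuring the drop in task accuracy gives the stress-test signal the report analyzes.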
arXiv Detail & Related papers (2023-09-28T17:21:33Z) - Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering [17.672572064705445]
Large language models (LLMs) equipped with Chain-of-Thought (CoT) have shown impressive reasoning ability in various downstream tasks.
We propose a framework called Knowledge-Driven Chain-of-Thought (KD-CoT) to verify and modify reasoning traces in CoT via interaction with external knowledge.
arXiv Detail & Related papers (2023-08-25T09:23:55Z) - Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters [82.84696222087396]
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs)
We show that CoT reasoning is possible even with invalid demonstrations.
arXiv Detail & Related papers (2022-12-20T05:20:54Z) - SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.