Related papers: Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions

URL: http://arxiv.org/abs/2502.18435v2
Date: Thu, 20 Mar 2025 03:25:21 GMT
Title: Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions
Authors: Yizhe Zhang, Richard Bai, Zijin Gu, Ruixiang Zhang, Jiatao Gu, Emmanuel Abbe, Samy Bengio, Navdeep Jaitly
Abstract summary: Language models usually use left-to-right (L2R) autoregressive factorization. We investigate whether alternative factorizations of the text distribution could be beneficial in some tasks.
Score: 51.61404787000037
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Language models usually use left-to-right (L2R) autoregressive factorization. However, L2R factorization may not always be the best inductive bias. Therefore, we investigate whether alternative factorizations of the text distribution could be beneficial in some tasks. We investigate right-to-left (R2L) training as a compelling alternative, focusing on multiple-choice questions (MCQs) as a test bed for knowledge extraction and reasoning. Through extensive experiments across various model sizes (2B-8B parameters) and training datasets, we find that R2L models can significantly outperform L2R models on several MCQ benchmarks, including logical reasoning, commonsense understanding, and truthfulness assessment tasks. Our analysis reveals that this performance difference may be fundamentally linked to multiple factors including calibration, computability and directional conditional entropy. We ablate the impact of these factors through controlled simulation studies using arithmetic tasks, where the impacting factors can be better disentangled. Our work demonstrates that exploring alternative factorizations of the text distribution can lead to improvements in LLM capabilities and provides theoretical insights into optimal factorization towards approximating human language distribution, and when each reasoning order might be more advantageous.
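
For readers unfamiliar with the terminology, the two factorizations named in the abstract can be written out as follows. This is a minimal sketch in standard chain-rule notation; the symbols x_1, ..., x_T (the tokens of a sequence) and the subscripts L2R and R2L are assumptions introduced here for illustration, not notation taken from the paper.

```latex
% L2R: each token is conditioned on the tokens that precede it.
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p_{\mathrm{L2R}}(x_t \mid x_1, \dots, x_{t-1})

% R2L: each token is conditioned on the tokens that follow it.
p(x_1, \dots, x_T) = \prod_{t=1}^{T} p_{\mathrm{R2L}}(x_t \mid x_{t+1}, \dots, x_T)
```

Both products are exact decompositions of the same joint distribution by the chain rule. A trained model only approximates each conditional, so the two orderings can lead to different learned models even when trained on the same text.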

Related papers

Systematic Bias in Large Language Models: Discrepant Response Patterns in Binary vs. Continuous Judgment Tasks [13.704342633541454]
Large Language Models (LLMs) are increasingly used in tasks such as psychological text analysis and decision-making in automated systems. This study examines how different response formats, binary versus continuous, may systematically influence LLMs' judgments.
arXiv Detail & Related papers (2025-04-28T03:20:55Z)
Have Large Language Models Learned to Reason? A Characterization via 3-SAT Phase Transition [11.422434149376478]
Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities. In theory, autoregressive LLMs with Chain-of-Thought (CoT) can perform more serial computations to solve complex reasoning tasks. Recent studies suggest that, despite this capacity, LLMs do not truly learn to reason but instead fit on statistical features.
arXiv Detail & Related papers (2025-04-04T20:57:36Z)
A Survey of Scaling in Large Language Model Reasoning [62.92861523305361]
We provide a comprehensive examination of scaling in large language model (LLM) reasoning. We analyze scaling in reasoning steps that improves multi-step inference and logical consistency. We discuss scaling in training-enabled reasoning, focusing on optimization through iterative model improvement.
arXiv Detail & Related papers (2025-04-02T23:51:27Z)
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 [53.894789613838654]
We introduce SEED-Bench-R1, a benchmark designed to evaluate post-training methods for MLLMs in video understanding. It includes intricate real-world videos and complex everyday planning tasks in the format of multiple-choice questions. Using Qwen2-VL-Instruct-7B as a base model, we compare RL with supervised fine-tuning (SFT). Our detailed analysis reveals that RL enhances visual perception but often produces less coherent reasoning chains.
arXiv Detail & Related papers (2025-03-31T17:55:23Z)
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning [113.49074603075032]
Recent studies have shown that making a model spend more time thinking through longer Chain of Thoughts (CoTs) enables it to gain significant improvements in complex reasoning tasks. We explore whether scaling with longer CoTs can indeed impair the reasoning performance of Large Language Models (LLMs) in certain domains.
arXiv Detail & Related papers (2025-02-25T10:48:05Z)
Reasoning or a Semblance of it? A Diagnostic Study of Transitive Reasoning in LLMs [11.805264893752154]
We evaluate the reasoning capabilities of two large language models, LLaMA 2 and Flan-T5, by manipulating facts within two compositional datasets: QASC and Bamboogle. Our findings reveal that while both models leverage (a), Flan-T5 shows more resilience to experiments, having less variance than LLaMA 2. This suggests that models may develop an understanding of transitivity through fine-tuning on knowingly relevant datasets.
arXiv Detail & Related papers (2024-10-26T15:09:07Z)
Uncovering Factor Level Preferences to Improve Human-Model Alignment [58.50191593880829]
We introduce PROFILE, a framework that uncovers and quantifies the influence of specific factors driving preferences. PROFILE's factor level analysis explains the 'why' behind human-model alignment and misalignment. We demonstrate how leveraging factor level insights, including addressing misaligned factors, can improve alignment with human preferences.
arXiv Detail & Related papers (2024-10-09T15:02:34Z)
Thought-Path Contrastive Learning via Premise-Oriented Data Augmentation for Logical Reading Comprehension [9.67774998354062]
Previous research has primarily focused on enhancing logical reasoning capabilities through Chain-of-Thought (CoT) or data augmentation. We propose a Premise-Oriented Data Augmentation (PODA) framework to generate CoT rationales including analyses for both correct and incorrect options. We also introduce a novel thought-path contrastive learning method that compares reasoning paths between the original and counterfactual samples.
arXiv Detail & Related papers (2024-09-22T15:44:43Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present a process-based benchmark MR-Ben that demands a meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
How Likely Do LLMs with CoT Mimic Human Reasoning? [31.86489714330338]
Chain-of-thought emerges as a promising technique for eliciting reasoning capabilities from Large Language Models (LLMs). We use causal analysis to understand the relationships between the problem instruction, reasoning, and the answer in LLMs.
arXiv Detail & Related papers (2024-02-25T10:13:04Z)
IRRGN: An Implicit Relational Reasoning Graph Network for Multi-turn Response Selection [4.471148909362883]
The Implicit Relational Reasoning Graph Network (IRRGN) aims to implicitly extract between utterances, as well as utterances and options. The model surpasses human performance for the first time on the MuTual dataset.
arXiv Detail & Related papers (2022-12-01T13:17:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.