Related papers: NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions

URL: http://arxiv.org/abs/2502.13124v2
Date: Fri, 21 Feb 2025 16:02:42 GMT
Title: NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions
Authors: Weizhe Yuan, Jane Yu, Song Jiang, Karthik Padthe, Yang Li, Dong Wang, Ilia Kulikov, Kyunghyun Cho, Yuandong Tian, Jason E Weston, Xian Li,
Abstract summary: We present NaturalReasoning, a comprehensive dataset comprising 2.8 million questions that span multiple domains.<n>We show that NaturalReasoning can effectively elicit and transfer reasoning capabilities from a strong teacher model.<n>It is also effective for unsupervised self-training using external reward models or self-rewarding.
Score: 86.15997774820934
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Scaling reasoning capabilities beyond traditional domains such as math and coding is hindered by the lack of diverse and high-quality questions. To overcome this limitation, we introduce a scalable approach for generating diverse and challenging reasoning questions, accompanied by reference answers. We present NaturalReasoning, a comprehensive dataset comprising 2.8 million questions that span multiple domains, including STEM fields (e.g., Physics, Computer Science), Economics, Social Sciences, and more. We demonstrate the utility of the questions in NaturalReasoning through knowledge distillation experiments which show that NaturalReasoning can effectively elicit and transfer reasoning capabilities from a strong teacher model. Furthermore, we demonstrate that NaturalReasoning is also effective for unsupervised self-training using external reward models or self-rewarding.

Related papers

NaturalThoughts: Selecting and Distilling Reasoning Traces for General Reasoning Tasks [65.70224757972068]
We select reasoning traces from a strong teacher model based on a large pool of questions from NaturalReasoning.<n>We find that simply scaling up data size with random sampling is a strong baseline with steady performance gains.<n>We find that selecting difficult examples that require more diverse reasoning strategies is more sample-efficient to transfer the teacher model's reasoning skills.
arXiv Detail & Related papers (2025-07-02T17:30:24Z)
SR-FoT: A Syllogistic-Reasoning Framework of Thought for Large Language Models Tackling Knowledge-based Reasoning Tasks [42.392103712958445]
Large Language Models (LLMs) might not follow the correct reasoning paths.<n>We propose a multi-stage Syllogistic-Reasoning Framework of Thought (SR-FoT)<n>Our SR-FoT begins by interpreting the question and then uses the interpretation and the original question to propose a suitable major premise.
arXiv Detail & Related papers (2025-01-20T17:00:41Z)
Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch [54.12139707822201]
We propose ScaleQuest, a novel, scalable, and cost-effective data synthesis method.<n>By generating diverse questions from scratch, we produce a dataset of 1 million problem-solution pairs.<n>Our experiments demonstrate that models trained on our data outperform existing open-source datasets.
arXiv Detail & Related papers (2024-10-24T12:42:04Z)
Multi-Faceted Question Complexity Estimation Targeting Topic Domain-Specificity [0.0]
This paper presents a novel framework for domain-specific question difficulty estimation, leveraging a suite of NLP techniques and knowledge graph analysis. We introduce four key parameters: Topic Retrieval Cost, Topic Salience, Topic Coherence, and Topic Superficiality. A model trained on these features demonstrates the efficacy of our approach in predicting question difficulty.
arXiv Detail & Related papers (2024-08-23T05:40:35Z)
Analyzing Human Questioning Behavior and Causal Curiosity through Natural Queries [91.70689724416698]
We present NatQuest, a collection of 13,500 naturally occurring questions from three diverse sources. Our analysis reveals a significant presence of causal questions (up to 42%) within the dataset.
arXiv Detail & Related papers (2024-05-30T17:55:28Z)
Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations [70.6395572287422]
Self-alignment method is capable of not only refusing to answer but also providing explanation to the unanswerability of unknown questions. We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself for aligning the responses to unknown questions as desired.
arXiv Detail & Related papers (2024-02-23T02:24:36Z)
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent [50.508669199496474]
We develop a ReAct-style LLM agent with the ability to reason and act upon external knowledge. We refine the agent through a ReST-like method that iteratively trains on previous trajectories. Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model.
arXiv Detail & Related papers (2023-12-15T18:20:15Z)
MacGyver: Are Large Language Models Creative Problem Solvers? [87.70522322728581]
We explore the creative problem-solving capabilities of modern LLMs in a novel constrained setting. We create MACGYVER, an automatically generated dataset consisting of over 1,600 real-world problems. We present our collection to both LLMs and humans to compare and contrast their problem-solving abilities.
arXiv Detail & Related papers (2023-11-16T08:52:27Z)
FOLLOWUPQG: Towards Information-Seeking Follow-up Question Generation [38.78216651059955]
We introduce the task of real-world information-seeking follow-up question generation (FQG) We construct FOLLOWUPQG, a dataset of over 3K real-world (initial question, answer, follow-up question)s collected from a forum layman providing Reddit-friendly explanations for open-ended questions. In contrast to existing datasets, questions in FOLLOWUPQG use more diverse pragmatic strategies to seek information, and they also show higher-order cognitive skills.
arXiv Detail & Related papers (2023-09-10T11:58:29Z)
RECKONING: Reasoning through Dynamic Knowledge Encoding [51.076603338764706]
We show that language models can answer questions by reasoning over knowledge provided as part of the context. In these situations, the model fails to distinguish the knowledge that is necessary to answer the question. We propose teaching the model to reason more robustly by folding the provided contextual knowledge into the model's parameters.
arXiv Detail & Related papers (2023-05-10T17:54:51Z)
Causal Deep Learning [77.49632479298745]
Causality has the potential to transform the way we solve real-world problems. But causality often requires crucial assumptions which cannot be tested in practice. We propose a new way of thinking about causality -- we call this causal deep learning.
arXiv Detail & Related papers (2023-03-03T19:19:18Z)
Towards a Holistic Understanding of Mathematical Questions with Contrastive Pre-training [65.10741459705739]
We propose a novel contrastive pre-training approach for mathematical question representations, namely QuesCo. We first design two-level question augmentations, including content-level and structure-level, which generate literally diverse question pairs with similar purposes. Then, to fully exploit hierarchical information of knowledge concepts, we propose a knowledge hierarchy-aware rank strategy.
arXiv Detail & Related papers (2023-01-18T14:23:29Z)
DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering [34.70206857546496]
Question answering models commonly have access to two sources of "knowledge" during inference time. It is unclear whether the answer stems from the given non-parametric knowledge or not. We propose a new paradigm in which QA models are trained to disentangle the two sources of knowledge.
arXiv Detail & Related papers (2022-11-10T15:34:44Z)
Enhancing Question Generation with Commonsense Knowledge [33.289599417096206]
We propose a multi-task learning framework to introduce commonsense knowledge into question generation process. Experimental results on SQuAD show that our proposed methods are able to noticeably improve the QG performance on both automatic and human evaluation metrics.
arXiv Detail & Related papers (2021-06-19T08:58:13Z)
Social Commonsense Reasoning with Multi-Head Knowledge Attention [24.70946979449572]
Social Commonsense Reasoning requires understanding of text, knowledge about social events and their pragmatic implications, as well as commonsense reasoning skills. We propose a novel multi-head knowledge attention model that encodes semi-structured commonsense inference rules and learns to incorporate them in a transformer-based reasoning cell.
arXiv Detail & Related papers (2020-10-12T10:24:40Z)
Reinforced Multi-task Approach for Multi-hop Question Generation [47.15108724294234]
We take up Multi-hop question generation, which aims at generating relevant questions based on supporting facts in the context. We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator. We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset, HotPotQA.
arXiv Detail & Related papers (2020-04-05T10:16:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.