Related papers: CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning

CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning

URL: http://arxiv.org/abs/2510.13166v2
Date: Thu, 16 Oct 2025 02:13:16 GMT
Title: CoT-Evo: Evolutionary Distillation of Chain-of-Thought for Scientific Reasoning
Authors: Kehua Feng, Keyan Ding, Zhihui Zhu, Lei Liang, Qiang Zhang, Huajun Chen,
Abstract summary: Chain-of-thought (CoT) distillation from advanced large language models (LLMs) has proven effective in general reasoning tasks.<n>But it struggles in scientific domains where even advanced models often produce incorrect or superficial reasoning.<n>We propose CoT-Evo, an evolutionary CoT distillation framework to overcome this problem.
Score: 63.44477226386808
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While chain-of-thought (CoT) distillation from advanced large language models (LLMs) has proven effective in general reasoning tasks, it struggles in scientific domains where even advanced models often produce incorrect or superficial reasoning due to high complexity and specialized knowledge requirements. Directly distilling from such flawed outputs results in low-quality training data and limits the performance of smaller student models. To overcome this, we propose CoT-Evo, an evolutionary CoT distillation framework. It begins by constructing a diverse pool of reasoning trajectories from multiple LLM thinkers, enriches them with automatically retrieved domain knowledge, and iteratively refines the trajectories using novelty-driven selection, reflective recombination and mutation. The refinement is guided by a fitness function that evaluates answer correctness, coherence, and effective knowledge utilization. This results in a high-quality CoT dataset tailored for scientific reasoning. We employ this evolved dataset to fine-tune a compact model, which achieves state-of-the-art performance on scientific reasoning benchmarks. Our work establishes a scalable approach to synthesizing high-fidelity scientific reasoning data from diverse and fallible LLMs.

Related papers

LLM-Based Scientific Equation Discovery via Physics-Informed Token-Regularized Policy Optimization [32.24464649397858]
PiT-PO is a unified framework that evolves the Large Language Models into an adaptive generator via reinforcement learning.<n>Central to PiT-PO is a dual-constraint mechanism that rigorously enforces hierarchical physical validity while simultaneously applying fine-grained, token-level penalties to suppress redundant structures.<n> Empirically, PiT-PO achieves state-of-the-art performance on standard benchmarks and successfully discovers novel turbulence models for challenging fluid dynamics problems.
arXiv Detail & Related papers (2026-02-11T07:02:23Z)
Round-trip Reinforcement Learning: Self-Consistent Training for Better Chemical LLMs [51.29260537017623]
Large Language Models (LLMs) are emerging as versatile foundation models for computational chemistry.<n>These models often lack round-trip consistency.<n>We introduce Round-Trip Reinforcement Learning (RTRL), a novel framework that trains a model to improve its consistency.
arXiv Detail & Related papers (2025-10-01T23:58:58Z)
Atomic Reasoning for Scientific Table Claim Verification [83.14588611859826]
Non-experts are susceptible to misleading claims based on scientific tables due to their high information density and perceived credibility.<n>Existing table claim verification models, including state-of-the-art large language models (LLMs), often struggle with precise fine-grained reasoning.<n>Inspired by Cognitive Load Theory, we propose that enhancing a model's ability to interpret table-based claims involves reducing cognitive load.
arXiv Detail & Related papers (2025-06-08T02:46:22Z)
Enhancing Long-Chain Reasoning Distillation through Error-Aware Self-Reflection [64.73809794561305]
errOr-aware self-ReflectION (ORION) is a framework that refines teacher CoTs through an Error-Aware Reflection process.<n> Experiments on multiple mathematical reasoning benchmarks demonstrate that ORION consistently improves performance by more than 2% over all baselines.
arXiv Detail & Related papers (2025-05-28T08:57:03Z)
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models [17.673293240849787]
We introduce SPHERE, a self-evolving data generation pipeline that enhances reasoning in small language models (SLMs)<n> SPHERE operates in three stages: (i) Self-Generation, where the model autonomously constructs problem-solving steps; (ii) Self-Correction, enabling it to identify and rectify errors; and (iii) Diversity Induction, improving robustness through multiple valid reasoning trajectories.<n>We show that SPHERE-trained models achieve significant gains over their base versions and match/surpass GPT-4o on certain benchmarks.
arXiv Detail & Related papers (2025-03-04T14:43:25Z)
ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting [124.69672273754144]
Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs) Existing CoT approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts. We introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts.
arXiv Detail & Related papers (2024-03-21T11:34:26Z)
Towards Causal Foundation Model: on Duality between Causal Inference and Attention [18.046388712804042]
We take a first step towards building causally-aware foundation models for treatment effect estimations. We propose a novel, theoretically justified method called Causal Inference with Attention (CInA)
arXiv Detail & Related papers (2023-10-01T22:28:34Z)
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Mixed Large Language Model Signals for Science Question Answering [59.63860993280275]
Large Language Models (LLMs) have demonstrated exceptional performance in various Natural Language Processing (NLP) tasks. We propose a novel method termed T-SciQ that aims at teaching science question answering with LLM signals. Our approach achieves a new state-of-the-art performance on the ScienceQA benchmark, with an accuracy of 96.18%.
arXiv Detail & Related papers (2023-05-05T11:56:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.