Related papers: HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH

URL: http://arxiv.org/abs/2601.20255v1
Date: Wed, 28 Jan 2026 05:03:24 GMT
Title: HE-SNR: Uncovering Latent Logic via Entropy for Guiding Mid-Training on SWE-BENCH
Authors: Yueyang Wang, Jiawei Fu, Baolong Bi, Xili Wang, Xiaoqing Liu,
Abstract summary: SWE-bench has emerged as the premier benchmark for evaluating Large Language Models on complex software engineering tasks. Standard metrics such as Perplexity (PPL) are compromised by the "Long-Context Tax" and exhibit weak correlation with downstream SWE performance. We propose the Entropy Compression Hypothesis, redefining intelligence not by scalar Top-1 compression, but by the capacity to structure uncertainty into Entropy-Compressed States.
Score: 11.643006508214887
License: http://creativecommons.org/licenses/by/4.0/
Abstract: SWE-bench has emerged as the premier benchmark for evaluating Large Language Models on complex software engineering tasks. While these capabilities are fundamentally acquired during the mid-training phase and subsequently elicited during Supervised Fine-Tuning (SFT), there remains a critical deficit in metrics capable of guiding mid-training effectively. Standard metrics such as Perplexity (PPL) are compromised by the "Long-Context Tax" and exhibit weak correlation with downstream SWE performance. In this paper, we bridge this gap by first introducing a rigorous data filtering strategy. Crucially, we propose the Entropy Compression Hypothesis, redefining intelligence not by scalar Top-1 compression, but by the capacity to structure uncertainty into Entropy-Compressed States of low orders ("reasonable hesitation"). Grounded in this fine-grained entropy analysis, we formulate a novel metric, HE-SNR (High-Entropy Signal-to-Noise Ratio). Validated on industrial-scale Mixture-of-Experts (MoE) models across varying context windows (32K/128K), our approach demonstrates superior robustness and predictive power. This work provides both the theoretical foundation and practical tools for optimizing the latent potential of LLMs in complex engineering domains.

Related papers

Efficient and Stable Reinforcement Learning for Diffusion Language Models [59.75789436018925]
Reinforcement Learning (RL) is crucial for unlocking the complex reasoning capabilities of Diffusion-based Large Language Models (dLLMs). Applying RL to dLLMs faces unique challenges in efficiency and stability. We propose Spatio-Temporal Pruning (STP), a framework designed to simultaneously improve the efficiency and stability of RL for dLLMs.
arXiv Detail & Related papers (2026-02-09T17:04:23Z)
Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression [55.63153956934198]
Chain-of-Thought (CoT) reasoning successfully enhances the reasoning capabilities of Large Language Models (LLMs). Existing CoT compression methods often suffer from a critical loss of logical fidelity at high compression ratios. We propose a novel EXTreme-RAtio Chain-of-Thought Compression framework, termed Extra-CoT, which aggressively reduces the token budget while preserving answer accuracy.
arXiv Detail & Related papers (2026-02-09T06:57:15Z)
How to Set the Learning Rate for Large-Scale Pre-training? [73.03133634525635]
We formalize this investigation into two distinct research paradigms: Fitting and Transfer. Within the Fitting Paradigm, we introduce a Scaling Law for search factor, effectively reducing the search complexity from O(n^3) to O(n*C_D*C_) via predictive modeling. We extend the principles of $\mu$Transfer to the Mixture of Experts (MoE) architecture, broadening its applicability to encompass model depth, weight decay, and token horizons.
arXiv Detail & Related papers (2026-01-08T15:55:13Z)
CoT-Saliency: Unified Chain-of-Thought Reasoning for Heterogeneous Saliency Tasks [96.64597365827046]
We present the first unified framework that jointly handles three operationally heterogeneous saliency tasks. We introduce a Chain-of-Thought (CoT) reasoning process in a Vision-Language Model (VLM) to bridge task heterogeneity. We show our model matches or outperforms specialized SOTA methods and strong closed-source VLMs across all tasks.
arXiv Detail & Related papers (2025-11-01T04:37:01Z)
A Technical Exploration of Causal Inference with Hybrid LLM Synthetic Data [3.121656940390038]
Large Language Models (LLMs) offer a flexible means to generate synthetic data. Existing approaches often fail to preserve key causal parameters such as the average treatment effect (ATE).
arXiv Detail & Related papers (2025-10-31T23:34:44Z)
MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics [72.00014675808228]
Instability in the evaluation process of Large Language Models obscures true learning dynamics. We introduce MaP, a framework that integrates Merging and the Pass@k metric. Experiments show that MaP yields significantly smoother performance curves, reduces inter-run variance, and ensures more consistent rankings.
arXiv Detail & Related papers (2025-10-10T11:40:27Z)
PT$^2$-LLM: Post-Training Ternarization for Large Language Models [52.4629647715623]
Large Language Models (LLMs) have shown impressive capabilities across diverse tasks, but their large memory and compute demands hinder deployment. We propose PT$^2$-LLM, a post-training ternarization framework tailored for LLMs. At its core is an Asymmetric Ternary Quantizer equipped with a two-stage refinement pipeline.
arXiv Detail & Related papers (2025-09-27T03:01:48Z)
CARFT: Boosting LLM Reasoning via Contrastive Learning with Annotated Chain-of-Thought-based Reinforced Fine-Tuning [25.142128256576985]
We propose a Contrastive learning with annotated CoT-based Reinforced Fine-Tuning approach, i.e., CARFT, to enhance the reasoning performance of Large Language Models. Our approach not only fully exploits the available annotated CoT but also stabilizes the fine-tuning procedure by incorporating an additional unsupervised learning signal.
arXiv Detail & Related papers (2025-08-21T00:20:47Z)
Compressing Chain-of-Thought in LLMs via Step Entropy [12.576398947428988]
Large Language Models (LLMs) using Chain-of-Thought (CoT) prompting excel at complex reasoning but generate thought processes with considerable redundancy, leading to increased inference costs and reduced efficiency. We introduce a novel CoT compression framework based on step entropy, a metric that quantifies the informational contribution of individual reasoning steps to identify redundancy.
arXiv Detail & Related papers (2025-08-05T11:48:18Z)
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections [65.36449542323277]
We present a unified theoretical framework bridging Supervised Fine-Tuning (SFT) and preference learning in Large Language Model (LLM) post-training. We propose a simple yet effective learning rate reduction approach that yields significant performance improvements.
arXiv Detail & Related papers (2025-06-15T05:42:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.