Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
- URL: http://arxiv.org/abs/2510.08549v2
- Date: Fri, 10 Oct 2025 04:41:32 GMT
- Title: Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
- Authors: Zilin Kang, Chonghua Liao, Tingqiang Xu, Huazhe Xu,
- Abstract summary: We propose ERA, a new paradigm that constrains the sampling entropy above given thresholds by applying specially designed activations to the outputs of models.<n>Our work validates output activation as a powerful tool for entropy control, opening a new direction for designing simpler and more robust algorithms.
- Score: 32.9206867882979
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose ERA, a new paradigm that constrains the sampling entropy above given thresholds by applying specially designed activations to the outputs of models. Our approach demonstrates broad effectiveness across different domains: 1) for large language models(LLMs), boosting the AIME 2025 score for Qwen2.5-Math-7B by 37.4%; 2) for continuous control reinforcement learning agents, improving performance by more than 30% over strong baselines such as SAC on the challenging HumanoidBench; 3) for image classification, enhancing ImageNet top-1 accuracy by 0.69% for ResNet-50. These gains are achieved with a computational overhead of less than 7%. Our work validates output activation as a powerful tool for entropy control, opening a new direction for designing simpler and more robust algorithms.
Related papers
- E$^3$-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models [24.195465096877196]
layer pruning is a hardware-friendly approach for model compression.<n> introduces two key innovations: a differentiable mask optimization method and an entropy-aware adaptive knowledge distillation strategy.<n> achieves 96% accuracy, a mere 0.8% drop from the original model.
arXiv Detail & Related papers (2025-11-21T12:32:01Z) - DEPO: Dual-Efficiency Preference Optimization for LLM Agents [75.6723341304463]
We propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps.<n>Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3% improvement in performance.
arXiv Detail & Related papers (2025-11-19T12:38:43Z) - Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better.<n>TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks.<n>We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z) - Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents [40.73340280747757]
The ReAct capability in large language models (LLMs) has become the foundation of modern agentic systems.<n>We introduce Pre-Act, a novel approach that enhances the agent's performance by creating a multi-step execution plan.<n>Our approach is applicable to both conversational and non-conversational agents.
arXiv Detail & Related papers (2025-05-15T05:17:47Z) - Reinforcement Learning for Reasoning in Large Language Models with One Training Example [133.018487956408]
We show that reinforcement learning with verifiable reward using one training example (1-shot RLVR) is effective in incentivizing the mathematical reasoning capabilities of large language models (LLMs)<n>We identify some interesting phenomena during 1-shot RLVR, including cross-domain generalization, increased frequency of self-reflection, and sustained test performance improvement even after the training accuracy has saturated.
arXiv Detail & Related papers (2025-04-29T09:24:30Z) - UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning [31.796328505473305]
We propose UI-R1, the first framework to explore how rule-based RL can enhance the reasoning capabilities of multimodal large language models (MLLMs) for GUI action prediction tasks.<n>Specifically, UI-R1 introduces a novel rule-based action reward, enabling model optimization via policy-based algorithms such as Group Relative Policy Optimization (GRPO)<n>For efficient training, we curate a small yet high-quality dataset of 136 challenging tasks, encompassing five common action types on mobile devices.
arXiv Detail & Related papers (2025-03-27T15:39:30Z) - A Light Perspective for 3D Object Detection [46.23578780480946]
This paper introduces a novel approach that incorporates cutting-edge Deep Learning techniques into the feature extraction process.<n>Our model, NextBEV, surpasses established feature extractors like ResNet50 and MobileNetV3.<n>By fusing these lightweight proposals, we have enhanced the accuracy of the VoxelNet-based model by 2.93% and improved the F1-score of the PointPillar-based model by approximately 20%.
arXiv Detail & Related papers (2025-03-10T10:03:23Z) - LiteVAR: Compressing Visual Autoregressive Modelling with Efficient Attention and Quantization [17.190984773586745]
Current AR-based visual generation models require substantial computational resources, limiting their applicability on resource-constrained devices.
We propose efficient attention mechanism and low-bit quantization method to enhance the efficiency of VAR models while maintaining performance.
arXiv Detail & Related papers (2024-11-26T07:32:36Z) - For SALE: State-Action Representation Learning for Deep Reinforcement
Learning [60.42044715596703]
SALE is a novel approach for learning embeddings that model the nuanced interaction between state and action.
We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm.
On OpenAI gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively.
arXiv Detail & Related papers (2023-06-04T19:47:46Z) - Towards Practical Lipreading with Distilled and Efficient Models [57.41253104365274]
Lipreading has witnessed a lot of progress due to the resurgence of neural networks.
Recent works have placed emphasis on aspects such as improving performance by finding the optimal architecture or improving generalization.
There is still a significant gap between the current methodologies and the requirements for an effective deployment of lipreading in practical scenarios.
We propose a series of innovations that significantly bridge that gap: first, we raise the state-of-the-art performance by a wide margin on LRW and LRW-1000 to 88.5% and 46.6%, respectively using self-distillation.
arXiv Detail & Related papers (2020-07-13T16:56:27Z) - DeBERTa: Decoding-enhanced BERT with Disentangled Attention [119.77305080520718]
We propose a new model architecture DeBERTa that improves the BERT and RoBERTa models using two novel techniques.
We show that these techniques significantly improve the efficiency of model pre-training and the performance of both natural language understanding (NLU) and natural langauge generation (NLG) downstream tasks.
arXiv Detail & Related papers (2020-06-05T19:54:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.