GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
- URL: http://arxiv.org/abs/2508.06471v1
- Date: Fri, 08 Aug 2025 17:21:06 GMT
- Title: GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
- Authors: GLM-4.5 Team: Aohan Zeng, Xin Lv, Qinkai Zheng, Zhenyu Hou, Bin Chen, Chengxing Xie, Cunxiang Wang, Da Yin, Hao Zeng, Jiajie Zhang, Kedong Wang, Lucen Zhong, Mingdao Liu, Rui Lu, Shulin Cao, Xiaohan Zhang, Xuancheng Huang, Yao Wei, Yean Cheng, Yifan An, Yilin Niu, Yuanhao Wen, Yushi Bai, Zhengxiao Du, Zihan Wang, Zilin Zhu, Bohan Zhang, Bosi Wen, Bowen Wu, Bowen Xu, Can Huang, Casey Zhao, Changpeng Cai, Chao Yu, Chen Li, Chendi Ge, Chenghua Huang, Chenhui Zhang, Chenxi Xu, Chenzheng Zhu, Chuang Li, Congfeng Yin, Daoyan Lin, Dayong Yang, Dazhi Jiang, Ding Ai, Erle Zhu, Fei Wang, Gengzheng Pan, Guo Wang, Hailong Sun, Haitao Li, Haiyang Li, Haiyi Hu, Hanyu Zhang, Hao Peng, Hao Tai, Haoke Zhang, Haoran Wang, Haoyu Yang, He Liu, He Zhao, Hongwei Liu, Hongxi Yan, Huan Liu, Huilong Chen, Ji Li, Jiajing Zhao, Jiamin Ren, Jian Jiao, Jiani Zhao, Jianyang Yan, Jiaqi Wang, Jiayi Gui, Jiayue Zhao, Jie Liu, Jijie Li, Jing Li, Jing Lu, Jingsen Wang, Jingwei Yuan, Jingxuan Li, Jingzhao Du, Jinhua Du, Jinxin Liu, Junkai Zhi, Junli Gao, Ke Wang, Lekang Yang, Liang Xu, Lin Fan, Lindong Wu, Lintao Ding, Lu Wang, Man Zhang, Minghao Li, Minghuan Xu, Mingming Zhao, Mingshu Zhai, Pengfan Du, Qian Dong, Shangde Lei, Shangqing Tu, Shangtong Yang, Shaoyou Lu, Shijie Li, Shuang Li, Shuang-Li, Shuxun Yang, Sibo Yi, Tianshu Yu, Wei Tian, Weihan Wang, Wenbo Yu, Weng Lam Tam, Wenjie Liang, Wentao Liu, Xiao Wang, Xiaohan Jia, Xiaotao Gu, Xiaoying Ling, Xin Wang, Xing Fan, Xingru Pan, Xinyuan Zhang, Xinze Zhang, Xiuqing Fu, Xunkai Zhang, Yabo Xu, Yandong Wu, Yida Lu, Yidong Wang, Yilin Zhou, Yiming Pan, Ying Zhang, Yingli Wang, Yingru Li, Yinpei Su, Yipeng Geng, Yitong Zhu, Yongkun Yang, Yuhang Li, Yuhao Wu, Yujiang Li, Yunan Liu, Yunqing Wang, Yuntao Li, Yuxuan Zhang, Zezhen Liu, Zhen Yang, Zhengda Zhou, Zhongpei Qiao, Zhuoer Feng, Zhuorui Liu, Zichen Zhang, Zihan Wang, Zijun Yao, Zikang Wang, Ziqiang Liu, Ziwei Chai, Zixuan Li, Zuodong Zhao, Wenguang Chen, Jidong Zhai, Bin Xu, Minlie Huang, Hongning Wang, Juanzi Li, Yuxiao Dong, Jie Tang
- Abstract summary: We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance across agentic, reasoning, and coding tasks. We release both GLM-4.5 (355B parameters) and a compact version, GLM-4.5-Air (106B parameters), to advance research in reasoning and agentic AI systems.
- Score: 194.64264251080454
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance across agentic, reasoning, and coding (ARC) tasks, scoring 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified. With far fewer parameters than several competitors, GLM-4.5 ranks 3rd overall among all evaluated models and 2nd on agentic benchmarks. We release both GLM-4.5 (355B parameters) and a compact version, GLM-4.5-Air (106B parameters), to advance research in reasoning and agentic AI systems. Code, models, and more information are available at https://github.com/zai-org/GLM-4.5.
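As a rough illustration of the MoE design behind the 355B-total / 32B-activated split, below is a minimal top-k expert-routing layer in which only a few experts run per token. The expert count, dimensions, and top-k are invented for the sketch; this is not the GLM-4.5 configuration.

```python
# Minimal top-k mixture-of-experts layer: shows why "total" and
# "activated" parameter counts diverge. All sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: (tokens, d_model)
        scores = self.gate(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # route to top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens sent to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = TopKMoE()
y = layer(torch.randn(10, 64))
total = sum(p.numel() for p in layer.parameters())
# Per token, only the gate plus top_k of the n_experts expert MLPs run.
active = (layer.gate.weight.numel()
          + layer.top_k * sum(p.numel() for p in layer.experts[0].parameters()))
print(y.shape, f"total={total:,}", f"activated-per-token~={active:,}")
```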
Related papers
- Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters [169.7981969517903]
Step 3.5 Flash bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable execution.
arXiv Detail & Related papers (2026-02-11T07:53:51Z) - Efficient Multi-Model Orchestration for Self-Hosted Large Language Models [2.3275796286410677]
Pick and Spin is a framework that makes self-hosted LLM orchestration economical. It integrates a unified Helm-based deployment system, adaptive scale-to-zero automation, and a hybrid routing module. It achieves up to 21.6% higher success rates, 30% lower latency, and 33% lower cost per query compared with static deployments of the same models.
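A minimal sketch of the kind of cost- and latency-aware routing such a framework implies: pick the cheapest warm deployment expected to succeed, falling back to a cold one only when needed. The deployment names, quality scores, and costs are invented; this is not the Pick and Spin implementation.

```python
# Hypothetical hybrid router over self-hosted model deployments.
# All numbers below are placeholders for illustration.
from dataclasses import dataclass

@dataclass
class Deployment:
    name: str
    quality: float        # expected success rate on this query class (0..1)
    cost_per_query: float
    warm: bool            # False => must scale up from zero first
    cold_start_s: float

def route(query_difficulty: float, pool: list[Deployment]) -> Deployment:
    # Keep only deployments expected to succeed at this difficulty.
    viable = [d for d in pool if d.quality >= query_difficulty]
    if not viable:
        viable = [max(pool, key=lambda d: d.quality)]  # fall back to best model
    # Prefer warm replicas; break ties on cost, then cold-start latency.
    return min(viable, key=lambda d: (not d.warm, d.cost_per_query, d.cold_start_s))

pool = [
    Deployment("small-7b", quality=0.60, cost_per_query=0.001, warm=True,  cold_start_s=0),
    Deployment("mid-32b",  quality=0.80, cost_per_query=0.004, warm=True,  cold_start_s=0),
    Deployment("big-70b",  quality=0.92, cost_per_query=0.012, warm=False, cold_start_s=45),
]
print(route(0.75, pool).name)  # -> mid-32b: cheapest warm model above threshold
```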
arXiv Detail & Related papers (2025-12-26T22:42:40Z) - DrugRAG: Enhancing Pharmacy LLM Performance Through A Novel Retrieval-Augmented Generation Pipeline [4.750574899254107]
We benchmarked eleven existing large language models (LLMs) with varying parameter sizes using a 141-question pharmacy dataset. We developed a three-step retrieval-augmented generation (RAG) pipeline, DrugRAG, that retrieves structured drug knowledge from validated sources and augments model prompts with evidence-based context. DrugRAG improved accuracy across all tested models, with gains ranging from 7 to 21 percentage points (e.g., Gemma 3 27B: 61% to 71%, Llama 3.1 8B: 46% to 67%) on the 141-item benchmark.
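A toy sketch of the three-step retrieve/augment/answer pattern the summary describes; the retriever, knowledge store, and LLM call below are stand-ins, not the DrugRAG pipeline.

```python
# Sketch of a retrieve -> augment -> answer loop in the spirit of DrugRAG.
def retrieve(question: str, kb: dict[str, str], k: int = 3) -> list[str]:
    # Step 1: toy lexical retrieval -- rank entries by term overlap.
    q_terms = set(question.lower().split())
    scored = sorted(kb.items(),
                    key=lambda kv: len(q_terms & set(kv[1].lower().split())),
                    reverse=True)
    return [text for _, text in scored[:k]]

def build_prompt(question: str, evidence: list[str]) -> str:
    # Step 2: augment the prompt with evidence-based context.
    context = "\n".join(f"- {e}" for e in evidence)
    return (f"Use only the drug knowledge below to answer.\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

def answer(question: str, kb: dict[str, str], llm) -> str:
    # Step 3: generate with the augmented prompt (llm is any callable).
    return llm(build_prompt(question, retrieve(question, kb)))

kb = {"warfarin": "warfarin interacts with NSAIDs increasing bleeding risk"}
print(answer("Does warfarin interact with ibuprofen?", kb, llm=lambda p: p[:120]))
```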
arXiv Detail & Related papers (2025-12-16T20:19:23Z) - GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning [116.80874591718323]
We present GLM-4.1V-Thinking and GLM-4.5V, a family of vision-language models (VLMs). GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size. The smaller GLM-4.1V-9B-Thinking remains highly competitive, achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks.
arXiv Detail & Related papers (2025-07-01T17:55:04Z) - Empowering Smaller Models: Tuning LLaMA and Gemma with Chain-of-Thought for Ukrainian Exam Tasks [0.0]
Small or compact models, though more efficient, often lack sufficient support for underrepresented languages. This work explores the potential of parameter-efficient fine-tuning of compact open-weight language models to handle reasoning-intensive tasks. A tuning method with joint task-topic and step-by-step solution generation outperforms standard chain-of-thought tuning on matching tasks.
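For context, one common parameter-efficient tuning method is a LoRA-style adapter, sketched below: the base weight stays frozen and only two low-rank matrices train. The rank and dimensions are illustrative and not taken from the paper.

```python
# Minimal LoRA-style adapter (a generic PEFT sketch, not this paper's setup).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # base weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = Wx + (alpha/r) * B A x  -- only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable:,} of {total:,}")
```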
arXiv Detail & Related papers (2025-03-18T07:44:49Z) - Prompting Large Language Models for Clinical Temporal Relation Extraction [5.403858596195122]
This study utilizes four large language models (LLMs) for clinical temporal relation extraction (CTRE). We developed full fine-tuning (FFT) and parameter-efficient fine-tuning (PEFT) strategies and evaluated them on the 2012 i2b2 CTRE task.
arXiv Detail & Related papers (2024-12-04T18:35:28Z) - Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance. Existing direct preference learning algorithms were originally designed for the single-turn chat task. We introduce a multi-turn direct preference learning framework tailored for this context.
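A hedged sketch of how a pairwise preference loss extends from single responses to multi-turn trajectories, assuming per-turn log-probabilities simply sum over a trajectory; this is not the paper's exact objective.

```python
# Trajectory-level DPO-style loss: sum per-turn log-probs and apply the
# pairwise -log sigmoid objective. Stub log-prob functions for illustration.
import math

def traj_logp(logp_turn, trajectory):
    # Total log-prob of the model's turns over the whole trajectory.
    return sum(logp_turn(state, action) for state, action in trajectory)

def multi_turn_dpo_loss(policy_lp, ref_lp, chosen, rejected, beta=0.1):
    margin = beta * ((traj_logp(policy_lp, chosen) - traj_logp(ref_lp, chosen))
                     - (traj_logp(policy_lp, rejected) - traj_logp(ref_lp, rejected)))
    return -math.log(1 / (1 + math.exp(-margin)))   # -log sigmoid(margin)

# Toy usage with constant stub log-prob functions:
chosen   = [("q1", "good step"), ("q2", "good answer")]
rejected = [("q1", "bad step"),  ("q2", "bad answer")]
policy = lambda s, a: -1.0 if "good" in a else -3.0
ref    = lambda s, a: -2.0
print(round(multi_turn_dpo_loss(policy, ref, chosen, rejected), 4))  # ~0.513
```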
arXiv Detail & Related papers (2024-09-04T02:41:04Z) - DataComp-LM: In search of the next generation of training sets for language models [200.5293181577585]
DataComp for Language Models (DCLM) is a testbed for controlled dataset experiments with the goal of improving language models. We provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters.
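A toy pipeline illustrating the curation operations named above (deduplication, filtering, mixing); the hashing scheme, quality heuristic, and mixing weights are placeholders, not the DCLM recipes.

```python
# Toy data-curation pipeline: exact-hash dedup, a simple quality filter,
# and weighted mixing across sources. All heuristics are placeholders.
import hashlib
import random

def dedup(docs):
    seen, out = set(), []
    for d in docs:
        h = hashlib.sha1(d.encode()).hexdigest()
        if h not in seen:                 # drop exact duplicates
            seen.add(h)
            out.append(d)
    return out

def quality_filter(docs, min_words=5):
    # Stand-in heuristic: keep documents with enough words.
    return [d for d in docs if len(d.split()) >= min_words]

def mix(sources: dict[str, list[str]], weights: dict[str, float], n: int, seed=0):
    # Sample n documents, choosing a source per draw by weight.
    rng = random.Random(seed)
    names = list(sources)
    probs = [weights[k] for k in names]
    return [rng.choice(sources[rng.choices(names, probs)[0]]) for _ in range(n)]

web  = dedup(quality_filter(["a short doc about language models " * 2] * 3))
code = ["def f(x):\n    return x + 1  # tiny code sample"]
print(len(mix({"web": web, "code": code}, {"web": 0.8, "code": 0.2}, n=10)))
```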
arXiv Detail & Related papers (2024-06-17T17:42:57Z) - Fleet of Agents: Coordinated Problem Solving with Large Language Models [10.167121757937062]
Fleet of Agents (FoA) is a principled framework utilizing large language models as agents to navigate dynamic tree searches. FoA spawns a multitude of agents, each exploring the search space autonomously, followed by a selection phase. FoA achieves the best cost-quality trade-off among all benchmarked methods, and FoA + Llama3.2-11B surpasses the Llama3.2-90B model.
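A minimal sketch of a spawn-explore-select loop in the spirit of FoA; the expansion and scoring functions are stand-ins for LLM-driven steps, and the resampling rule is invented for illustration.

```python
# Spawn many agents, let each take one exploration step, then resample
# the fleet from the best states (selection phase). Stubs throughout.
import random

def foa_search(root, expand, score, n_agents=8, rounds=20, keep=4, seed=0):
    rng = random.Random(seed)
    states = [root] * n_agents
    for _ in range(rounds):
        # Each agent explores autonomously: one expansion step per agent.
        states = [rng.choice(expand(s)) for s in states]
        # Selection phase: keep the highest-value states, respawn the fleet.
        states.sort(key=score, reverse=True)
        survivors = states[:keep]
        states = [rng.choice(survivors) for _ in range(n_agents)]
    return max(states, key=score)

# Toy problem: walk toward the target number 42 in steps of +/-1 or +/-5.
expand = lambda s: [s + d for d in (-5, -1, 1, 5)]
score = lambda s: -abs(42 - s)
print(foa_search(0, expand, score))  # converges near 42
```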
arXiv Detail & Related papers (2024-05-07T09:36:23Z) - How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments [83.78240828340681]
GAMA($\gamma$)-Bench is a new framework for evaluating Large Language Models' Gaming Ability in Multi-Agent environments. $\gamma$-Bench includes eight classical game theory scenarios and a dynamic scoring scheme specially designed to assess LLMs' performance. Our results indicate GPT-3.5 demonstrates strong robustness but limited generalizability, which can be enhanced using methods like Chain-of-Thought.
arXiv Detail & Related papers (2024-03-18T14:04:47Z) - Scaling Instruction-Finetuned Language Models [126.4789306516927]
Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance.
We find that instruction finetuning dramatically improves performance on a variety of model classes.
arXiv Detail & Related papers (2022-10-20T16:58:32Z)