AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
- URL: http://arxiv.org/abs/2505.08311v2
- Date: Sun, 25 May 2025 07:57:14 GMT
- Title: AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
- Authors: Yunjie Ji, Xiaoyu Tian, Sitong Zhao, Haotian Wang, Shuaiting Chen, Yiping Peng, Han Zhao, Xiangang Li
- Abstract summary: We present AM-Thinking-v1, a 32B dense language model that advances the frontier of reasoning. Outperforming DeepSeek-R1 and rivaling leading Mixture-of-Experts (MoE) models like Qwen3-235B-A22B and Seed1.5-Thinking, AM-Thinking-v1 achieves impressive scores of 85.3 on AIME 2024, 74.4 on AIME 2025, and 70.3 on LiveCodeBench.
- Score: 16.441081996257576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present AM-Thinking-v1, a 32B dense language model that advances the frontier of reasoning, embodying the collaborative spirit of open-source innovation. Outperforming DeepSeek-R1 and rivaling leading Mixture-of-Experts (MoE) models like Qwen3-235B-A22B and Seed1.5-Thinking, AM-Thinking-v1 achieves impressive scores of 85.3 on AIME 2024, 74.4 on AIME 2025, and 70.3 on LiveCodeBench, showcasing state-of-the-art mathematical and coding capabilities among open-source models of similar scale. Built entirely from the open-source Qwen2.5-32B base model and publicly available queries, AM-Thinking-v1 leverages a meticulously crafted post-training pipeline - combining supervised fine-tuning and reinforcement learning - to deliver exceptional reasoning capabilities. This work demonstrates that the open-source community can achieve high performance at the 32B scale, a practical sweet spot for deployment and fine-tuning. By striking a balance between top-tier performance and real-world usability, we hope AM-Thinking-v1 inspires further collaborative efforts to harness mid-scale models, pushing reasoning boundaries while keeping accessibility at the core of innovation. We have open-sourced our model on \href{https://huggingface.co/a-m-team/AM-Thinking-v1}{Hugging Face}.
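Since the weights are released on Hugging Face, a minimal inference sketch is given below. It assumes the standard transformers chat-template API; the prompt and generation settings are illustrative assumptions, not the authors' recommended configuration.

```python
# Minimal usage sketch for the open-sourced checkpoint named in the abstract.
# Assumes standard Hugging Face transformers APIs; settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "a-m-team/AM-Thinking-v1"  # repository cited in the abstract

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models typically need a generous token budget for the thinking trace.
output_ids = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```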
Related papers
- Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction [95.91743732150233]
Goedel-Prover-V2, a series of open-source language models, sets a new state of the art in automated theorem proving. We generate synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems. Goedel-Prover-V2-32B achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode.
arXiv Detail & Related papers (2025-08-05T16:28:22Z)
- MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization [74.04867639197445]
MiroMind-M1 is a set of fully open-source RLMs built on the Qwen-2.5 backbone. Our models are trained in two stages: SFT on a carefully curated corpus of 719K math-reasoning problems with verified CoT trajectories, followed by RLVR on 62K challenging and verifiable problems.
arXiv Detail & Related papers (2025-07-19T16:21:23Z) - KAT-V1: Kwai-AutoThink Technical Report [50.84483585850113]
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks. KAT dynamically switches between reasoning and non-reasoning modes based on task complexity. We also propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework.
arXiv Detail & Related papers (2025-07-11T04:07:10Z) - Skywork Open Reasoner 1 Technical Report [51.403686909760914]
We present Skywork-OR1, an effective and scalable reinforcement learning (RL) implementation for long Chain-of-Thought (CoT) models. Building on the DeepSeek-R1-Distill model series, our RL approach achieves notable performance gains. Our Skywork-OR1-32B model surpasses both DeepSeek-R1 and Qwen3-32B on the AIME24 and AIME25 benchmarks.
arXiv Detail & Related papers (2025-05-28T12:56:04Z) - Not All Correct Answers Are Equal: Why Your Distillation Source Matters [16.441081996257576]
Distillation has emerged as a practical and effective approach to enhance the reasoning capabilities of open-source language models. We collect verified outputs from three state-of-the-art teacher models (AM-Thinking-v1, Qwen3-235B-A22B, and DeepSeek-R1) on a shared corpus of 1.89 million queries. Student models trained on each dataset are evaluated on reasoning benchmarks including AIME2024, AIME2025, MATH500, and LiveCodeBench.
arXiv Detail & Related papers (2025-05-20T15:00:51Z) - Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning [17.233735911531117]
We present Skywork R1V2, a next-generation multimodal reasoning model. At its core, R1V2 introduces a hybrid reinforcement learning paradigm.
arXiv Detail & Related papers (2025-04-23T12:24:10Z) - Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning [58.86928947970342]
Embodied-R is a framework combining large-scale Vision-Language Models for perception and small-scale Language Models for reasoning. After training on only 5k embodied video samples, Embodied-R with a 3B LM matches state-of-the-art multimodal reasoning models. Embodied-R also exhibits emergent thinking patterns such as systematic analysis and contextual integration.
arXiv Detail & Related papers (2025-04-17T06:16:11Z) - Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning [231.11339402237903]
We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces, and 77.3 on GPQA. It demonstrates excellent reasoning abilities in STEM and coding.
arXiv Detail & Related papers (2025-04-10T17:10:51Z) - START: Self-taught Reasoner with Tools [51.38785489790888]
We introduce START (Self-Taught Reasoner with Tools), a tool-integrated long Chain-of-Thought (CoT) reasoning LLM. START is capable of performing complex computations, self-checking, exploring diverse methods, and self-debugging. It significantly outperforms the base QwQ-32B and achieves performance comparable to the state-of-the-art open-weight model R1-Distill-Qwen-32B.
arXiv Detail & Related papers (2025-03-06T17:11:51Z) - Hymba: A Hybrid-head Architecture for Small Language Models [65.94140459055244]
Hymba is a family of small language models featuring a hybrid-head parallel architecture.
We introduce learnable meta tokens that are prepended to prompts, storing critical information.
This model is further optimized by incorporating cross-layer key-value sharing and partial sliding window attention.
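As a rough illustration of the meta-token idea summarized above, the PyTorch sketch below prepends a bank of learnable vectors to the embedded prompt. The module name, token count, and dimensions are illustrative assumptions and are not taken from the Hymba paper.

```python
import torch
import torch.nn as nn

class MetaTokenPrepender(nn.Module):
    """Hypothetical sketch: learnable meta tokens prepended to the embedded
    prompt so the model can store critical information up front.
    Sizes and names are assumptions, not the Hymba implementation."""

    def __init__(self, num_meta_tokens: int = 16, hidden_dim: int = 768):
        super().__init__()
        self.meta_tokens = nn.Parameter(torch.randn(num_meta_tokens, hidden_dim) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_dim)
        batch_size = token_embeddings.size(0)
        meta = self.meta_tokens.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([meta, token_embeddings], dim=1)
```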
arXiv Detail & Related papers (2024-11-20T19:51:25Z) - The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub [2.595302141947391]
We analyse development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models.
Activity is imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads.
We find that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers.
arXiv Detail & Related papers (2024-05-20T11:10:49Z)