AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
- URL: http://arxiv.org/abs/2505.08311v2
- Date: Sun, 25 May 2025 07:57:14 GMT
- Title: AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
- Authors: Yunjie Ji, Xiaoyu Tian, Sitong Zhao, Haotian Wang, Shuaiting Chen, Yiping Peng, Han Zhao, Xiangang Li
- Abstract summary: We present AM-Thinking-v1, a 32B dense language model that advances the frontier of reasoning. Outperforming DeepSeek-R1 and rivaling leading Mixture-of-Experts (MoE) models like Qwen3-235B-A22B and Seed1.5-Thinking, AM-Thinking-v1 achieves impressive scores of 85.3 on AIME 2024, 74.4 on AIME 2025, and 70.3 on LiveCodeBench.
- Score: 16.441081996257576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present AM-Thinking-v1, a 32B dense language model that advances the frontier of reasoning, embodying the collaborative spirit of open-source innovation. Outperforming DeepSeek-R1 and rivaling leading Mixture-of-Experts (MoE) models like Qwen3-235B-A22B and Seed1.5-Thinking, AM-Thinking-v1 achieves impressive scores of 85.3 on AIME 2024, 74.4 on AIME 2025, and 70.3 on LiveCodeBench, showcasing state-of-the-art mathematical and coding capabilities among open-source models of similar scale. Built entirely from the open-source Qwen2.5-32B base model and publicly available queries, AM-Thinking-v1 leverages a meticulously crafted post-training pipeline - combining supervised fine-tuning and reinforcement learning - to deliver exceptional reasoning capabilities. This work demonstrates that the open-source community can achieve high performance at the 32B scale, a practical sweet spot for deployment and fine-tuning. By striking a balance between top-tier performance and real-world usability, we hope AM-Thinking-v1 inspires further collaborative efforts to harness mid-scale models, pushing reasoning boundaries while keeping accessibility at the core of innovation. We have open-sourced our model on \href{https://huggingface.co/a-m-team/AM-Thinking-v1}{Hugging Face}.
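Since the weights are released on Hugging Face, a minimal inference sketch is given below. It assumes the standard transformers chat-template API; the prompt and generation settings are illustrative assumptions, not the authors' recommended configuration.

```python
# Minimal usage sketch for the open-sourced checkpoint named in the abstract.
# Assumes standard Hugging Face transformers APIs; settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "a-m-team/AM-Thinking-v1"  # repository cited in the abstract

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models typically need a generous token budget for the thinking trace.
output_ids = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```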
Related papers
- Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction [95.91743732150233]
Goedel-Prover-V2, a series of open-source language models, sets a new state of the art in automated theorem proving. We generate synthetic tasks of increasing difficulty to train the model to master increasingly complex theorems. Goedel-Prover-V2-32B achieves 88.1% on MiniF2F at pass@32 in standard mode and 90.4% in self-correction mode.
arXiv Detail & Related papers (2025-08-05T16:28:22Z)
- MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization [74.04867639197445]
MiroMind-M1 is a set of fully open-source RLMs built on the Qwen-2.5 backbone. Our models are trained in two stages: SFT on a carefully curated corpus of 719K math-reasoning problems with verified CoT trajectories, followed by RLVR on 62K challenging and verifiable problems.
arXiv Detail & Related papers (2025-07-19T16:21:23Z) - KAT-V1: Kwai-AutoThink Technical Report [50.84483585850113]
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks. KAT dynamically switches between reasoning and non-reasoning modes based on task complexity. We also propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework.
arXiv Detail & Related papers (2025-07-11T04:07:10Z) - Skywork Open Reasoner 1 Technical Report [51.403686909760914]
We present Skywork-OR1, an effective and scalable reinforcement learning (RL) implementation for long Chain-of-Thought (CoT) models. Building on the DeepSeek-R1-Distill model series, our RL approach achieves notable performance gains. Our Skywork-OR1-32B model surpasses both DeepSeek-R1 and Qwen3-32B on the AIME24 and AIME25 benchmarks.
arXiv Detail & Related papers (2025-05-28T12:56:04Z) - Not All Correct Answers Are Equal: Why Your Distillation Source Matters [16.441081996257576]
Distillation has emerged as a practical and effective approach to enhance the reasoning capabilities of open-source language models. We collect verified outputs from three state-of-the-art teacher models (AM-Thinking-v1, Qwen3-235B-A22B, and DeepSeek-R1) on a shared corpus of 1.89 million queries. Student models trained on each dataset are evaluated on reasoning benchmarks including AIME2024, AIME2025, MATH500, and LiveCodeBench.
arXiv Detail & Related papers (2025-05-20T15:00:51Z) - Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning [17.233735911531117]
We present Skywork R1V2, a next-generation multimodal reasoning model. At its core, R1V2 introduces a hybrid reinforcement learning paradigm.
arXiv Detail & Related papers (2025-04-23T12:24:10Z) - Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning [58.86928947970342]
Embodied-R is a framework combining large-scale Vision-Language Models for perception and small-scale Language Models for reasoning. After training on only 5k embodied video samples, Embodied-R with a 3B LM matches state-of-the-art multimodal reasoning models. Embodied-R also exhibits emergent thinking patterns such as systematic analysis and contextual integration.
arXiv Detail & Related papers (2025-04-17T06:16:11Z) - Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning [231.11339402237903]
We introduce Seed1.5-Thinking, capable of reasoning through thinking before responding. Seed1.5-Thinking achieves 86.7 on AIME 2024, 55.0 on Codeforces, and 77.3 on GPQA. It demonstrates excellent reasoning abilities in STEM and coding.
arXiv Detail & Related papers (2025-04-10T17:10:51Z) - START: Self-taught Reasoner with Tools [51.38785489790888]
We introduce START (Self-Taught Reasoner with Tools), a tool-integrated long Chain-of-Thought (CoT) reasoning LLM. START is capable of performing complex computations, self-checking, exploring diverse methods, and self-debugging. It significantly outperforms the base QwQ-32B and achieves performance comparable to the state-of-the-art open-weight model R1-Distill-Qwen-32B.
arXiv Detail & Related papers (2025-03-06T17:11:51Z) - Hymba: A Hybrid-head Architecture for Small Language Models [65.94140459055244]
Hymba is a family of small language models featuring a hybrid-head parallel architecture.
We introduce learnable meta tokens that are prepended to prompts, storing critical information.
This model is further optimized by incorporating cross-layer key-value sharing and partial sliding window attention.
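As a rough illustration of the meta-token idea summarized above, the PyTorch sketch below prepends a bank of learnable vectors to the embedded prompt. The module name, token count, and dimensions are illustrative assumptions and are not taken from the Hymba paper.

```python
import torch
import torch.nn as nn

class MetaTokenPrepender(nn.Module):
    """Hypothetical sketch: learnable meta tokens prepended to the embedded
    prompt so the model can store critical information up front.
    Sizes and names are assumptions, not the Hymba implementation."""

    def __init__(self, num_meta_tokens: int = 16, hidden_dim: int = 768):
        super().__init__()
        self.meta_tokens = nn.Parameter(torch.randn(num_meta_tokens, hidden_dim) * 0.02)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden_dim)
        batch_size = token_embeddings.size(0)
        meta = self.meta_tokens.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([meta, token_embeddings], dim=1)
```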
arXiv Detail & Related papers (2024-11-20T19:51:25Z) - The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub [2.595302141947391]
We analyse development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models.
Activity is imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads.
We find that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers.
arXiv Detail & Related papers (2024-05-20T11:10:49Z)