Related papers: Apriel-1.5-15b-Thinker

Apriel-1.5-15b-Thinker

URL: http://arxiv.org/abs/2510.01141v1
Date: Wed, 01 Oct 2025 17:29:35 GMT
Title: Apriel-1.5-15b-Thinker
Authors: Shruthan Radhakrishna, Aman Tiwari, Aanjaneya Shukla, Masoud Hashemi, Rishabh Maheshwary, Shiva Krishna Reddy Malay, Jash Mehta, Pulkit Pattnaik, Saloni Mittal, Khalil Slimi, Kelechi Ogueji, Akintunde Oladipo, Soham Parikh, Oluwanifemi Bamgbose, Toby Liang, Ahmed Masry, Khyati Mahajan, Sai Rajeswar Mudumba, Vikas Yadav, Sathwik Tejaswi Madhusudhan, Torsten Scholak, Sagar Davasam, Srinivas Sunkara, Nicholas Chapados,
Abstract summary: Apriel-1.5-15B-Thinker is a 15-billion-parameter open-weights multimodal reasoning model. It achieves frontier-level performance through training design rather than sheer scale.
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Apriel-1.5-15B-Thinker, a 15-billion-parameter open-weights multimodal reasoning model that achieves frontier-level performance through training design rather than sheer scale. Starting from Pixtral-12B, we apply a progressive three-stage methodology: (1) depth upscaling to expand reasoning capacity without pretraining from scratch; (2) staged continual pre-training that first develops foundational text and vision understanding, then enhances visual reasoning through targeted synthetic data generation addressing spatial structure, compositional understanding, and fine-grained perception; and (3) high-quality text-only supervised fine-tuning on curated instruction-response pairs with explicit reasoning traces spanning mathematics, coding, science, and tool use. Notably, our model achieves competitive results without reinforcement learning or preference optimization, isolating the contribution of our data-centric continual pre-training approach. On the Artificial Analysis Intelligence Index, Apriel-1.5-15B-Thinker attains a score of 52, matching DeepSeek-R1-0528 despite requiring significantly fewer computational resources. Across ten image benchmarks, its performance is on average within five points of Gemini-2.5-Flash and Claude Sonnet-3.7, a key achievement for a model operating within single-GPU deployment constraints. Our results demonstrate that thoughtful mid-training design can close substantial capability gaps without massive scale, making frontier-level multimodal reasoning accessible to organizations with limited infrastructure. We release the model checkpoint, all training recipes, and evaluation protocols under the MIT license to advance open-source research.

Related papers

GuardReasoner-Omni: A Reasoning-based Multi-modal Guardrail for Text, Image, and Video
GuardReasoner-Omni is a guardrail model designed to moderate text, image, and video data. We construct a comprehensive training corpus comprising 148k samples spanning these three modalities. Our training pipeline follows a two-stage paradigm that incentivizes the model to deliberate before making decisions.
arXiv (2026-02-03T09:56:20Z)
Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning
We propose a bottom-up learning paradigm in which models are grounded in axiomatic domain facts and compose them to solve complex, unseen tasks. By deriving novel reward signals from knowledge graph paths, we provide verifiable, scalable, and grounded supervision. Our experiments show that path-derived rewards act as a "compositional bridge", enabling our model to significantly outperform larger models.
arXiv (2026-01-21T16:38:59Z)
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
We introduce DeepSeek-V3.2, a model that combines high computational efficiency with superior reasoning and agent performance. We introduce DSA (DeepSeek Sparse Attention), an efficient attention mechanism that substantially reduces computational complexity. By implementing a robust reinforcement learning protocol and scaling post-training compute, DeepSeek-V3.2 performs comparably to GPT-5.
arXiv (2025-12-02T09:25:14Z)
Teaching Language Models to Reason with Tools
We present Hint-Engineering, a new data synthesis strategy that strategically injects diverse hints at optimal points within reasoning paths. CoRT significantly enhances efficiency, reducing token usage by approximately 30% for the 32B model and 50% for the 1.5B model.
arXiv (2025-10-23T08:41:44Z)
SpatialLadder: Progressive Training for Spatial Reasoning in Vision-Language Models
We present a comprehensive methodology for building spatial intelligence progressively. We introduce SpatialLadder-26k, a multimodal dataset containing 26,610 samples spanning object localization, single-image, multi-view, and video spatial reasoning tasks. We design a three-stage progressive training framework that establishes spatial perception through object localization, develops spatial understanding through multi-dimensional spatial tasks, and strengthens complex reasoning via reinforcement learning with verifiable rewards.
arXiv (2025-10-09T17:50:54Z)
Reinforcement Mid-Training
We propose Reinforcement Mid-Training (RMT), a framework for efficient, adaptive, and unified reinforcement mid-training. We show that RMT achieves up to +64.91% performance improvement with only 21% of the reasoning length in language modeling. We also show that checkpoints obtained after reinforcement mid-training can benefit subsequent post-training, yielding up to +18.76% improvement in the mathematical domain.
arXiv (2025-09-29T07:21:24Z)
ReasonBridge: Efficient Reasoning Transfer from Closed to Open-Source Language Models
This paper introduces ReasonBridge, a methodology that efficiently transfers reasoning capabilities from powerful closed-source models to open-source models. We develop a tailored dataset, Reason1K, with only 1,000 carefully curated reasoning traces emphasizing difficulty, diversity, and quality. Comprehensive evaluations demonstrate that ReasonBridge improves reasoning capabilities in open-source models by up to 23% on benchmark tasks.
arXiv (2025-06-28T12:22:55Z)
Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning
We investigate how explicitly modeling prior information about problem difficulty shapes the effectiveness of reinforcement-learning-based fine-tuning for multimodal reasoning. Our approach achieves significant performance gains across various multimodal mathematical reasoning benchmarks with only 2K+0.6K samples of two-stage training data.
arXiv (2025-05-19T15:43:10Z)
Phi-4-reasoning Technical Report
We introduce Phi-4-reasoning, a 14-billion-parameter reasoning model that achieves strong performance on complex reasoning tasks. We also develop Phi-4-reasoning-plus, a variant enhanced through a short phase of outcome-based reinforcement learning. Both models significantly outperform larger open-weight models such as DeepSeek-R1-Distill-Llama-70B and approach the performance of the full DeepSeek-R1 model.
arXiv (2025-04-30T05:05:09Z)
Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning
Embodied-R is a framework combining large-scale Vision-Language Models for perception and small-scale Language Models for reasoning. After training on only 5k embodied video samples, Embodied-R with a 3B LM matches state-of-the-art multimodal reasoning models. Embodied-R also exhibits emergent thinking patterns such as systematic analysis and contextual integration.
arXiv (2025-04-17T06:16:11Z)
START: Self-taught Reasoner with Tools
We introduce START (Self-Taught Reasoner with Tools), a tool-integrated long chain-of-thought (CoT) reasoning LLM. START is capable of performing complex computations, self-checking, exploring diverse methods, and self-debugging. It significantly outperforms the base QwQ-32B and achieves performance comparable to the state-of-the-art open-weight model R1-Distill-Qwen-32B.
arXiv (2025-03-06T17:11:51Z)
Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training
This paper aims to predict performance in closed-book question answering (QA) without the help of external tools. We conduct large-scale retrieval and semantic analysis across the pre-training corpora of 21 publicly available and 3 custom-trained large language models. Building on these foundations, we propose Size-dependent Mutual Information (SMI), an information-theoretic metric that linearly correlates pre-training data characteristics.
arXiv (2025-02-06T13:23:53Z)
CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge
We introduce CodingTeachLLM, a large language model (LLM) designed for teaching coding. Our model realizes the structural disassembly and incremental guided output of educational knowledge. It also achieves state-of-the-art coding ability compared with open-source models.
arXiv (2024-03-13T05:38:39Z)
When Parameter-efficient Tuning Meets General-purpose Vision-language Models
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique. Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv (2023-12-16T17:13:08Z)
Unifying Language Learning Paradigms
We present a unified framework for pre-training models that are universally effective across datasets and setups. We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective. Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv (2022-05-10T19:32:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.