Related papers: Group Pattern Selection Optimization: Let LRMs Pick the Right Pattern for Reasoning

URL: http://arxiv.org/abs/2601.07238v1
Date: Mon, 12 Jan 2026 06:19:09 GMT
Title: Group Pattern Selection Optimization: Let LRMs Pick the Right Pattern for Reasoning
Authors: Hanbin Wang, Jingwei Song, Jinpeng Li, Fei Mi, Lifeng Shang,
Abstract summary: Group Pattern Selection Optimization (GPSO) is a reinforcement learning framework for large reasoning models. GPSO incorporates multi-pattern rollouts, verifier-guided optimal pattern selection per problem, and attention masking to prevent the leakage of explicit pattern suffixes into the learned policy. Extensive experiments demonstrate that GPSO delivers consistent and substantial performance gains across various model backbones and benchmarks.
Score: 38.16271055029922
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large reasoning models (LRMs) exhibit diverse high-level reasoning patterns (e.g., direct solution, reflection-and-verification, and exploring multiple solutions), yet prevailing training recipes implicitly bias models toward a limited set of dominant patterns. Through a systematic analysis, we identify substantial accuracy variance across these patterns on mathematics and science benchmarks, revealing that a model's default reasoning pattern is often sub-optimal for a given problem. To address this, we introduce Group Pattern Selection Optimization (GPSO), a reinforcement learning framework that extends GRPO by incorporating multi-pattern rollouts, verifier-guided optimal pattern selection per problem, and attention masking during optimization to prevent the leakage of explicit pattern suffixes into the learned policy. By exploring a portfolio of diverse reasoning strategies and optimizing the policy on the most effective ones, GPSO enables the model to internalize the mapping from problem characteristics to optimal reasoning patterns. Extensive experiments demonstrate that GPSO delivers consistent and substantial performance gains across various model backbones and benchmarks, effectively mitigating pattern sub-optimality and fostering more robust, adaptable reasoning. All data and code are available at https://github.com/wanghanbinpanda/GPSO.
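
The three ingredients named in the abstract (multi-pattern rollouts, verifier-guided pattern selection, and suffix masking) can be pictured with a toy sketch. This is not the authors' implementation: the function names, the mean-reward selection rule, the standardized-reward advantage, and the 0/1 mask layout are all assumptions made for illustration, and the rewards are made-up numbers rather than results from the paper.

```python
from statistics import mean, pstdev


def select_best_pattern(rewards_by_pattern):
    """Pick the pattern whose rollouts earn the highest mean verifier reward.

    Assumption: selection uses the mean reward per pattern; the paper's
    actual criterion may differ.
    """
    return max(rewards_by_pattern, key=lambda p: mean(rewards_by_pattern[p]))


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def suffix_mask(prompt_len, suffix_len, response_len):
    """Build a 0/1 visibility mask over [prompt | pattern suffix | response].

    Assumption: suffix positions are hidden (0) during optimization so the
    policy is trained as if the explicit pattern hint had not been given.
    """
    return [1] * prompt_len + [0] * suffix_len + [1] * response_len


# Hypothetical verifier rewards (1 = correct, 0 = incorrect) for one problem,
# with four rollouts per reasoning pattern.
rewards = {
    "direct": [0, 0, 1, 0],
    "reflect": [1, 1, 0, 1],
    "multi": [0, 1, 0, 1],
}

best = select_best_pattern(rewards)
advantages = group_relative_advantages(rewards[best])
mask = suffix_mask(prompt_len=3, suffix_len=2, response_len=4)
```

In this sketch only the rollouts of the selected pattern feed the policy update, and the mask marks which token positions the update may attend to.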

Related papers

On the Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks [56.98385132295952]
We evaluate how well chain-of-thought approaches generalize on a simple planning task. We find that reasoning traces which combine multiple text formats yield the best (and non-trivial) OOD generalization. Purely text-based models consistently outperform those utilizing image-based inputs.
arXiv Detail & Related papers (2026-02-17T09:51:40Z)
Multimodal Large Language Models with Adaptive Preference Optimization for Sequential Recommendation [60.33386541343322]
We propose a Multimodal Large Language Models framework that integrates Hardness-aware and Noise-regularized preference optimization for Recommendation (HaNoRec). Specifically, HaNoRec dynamically adjusts optimization weights based on both the estimated hardness of each training sample and the policy model's real-time responsiveness.
arXiv Detail & Related papers (2025-11-24T04:10:46Z)
GCPO: When Contrast Fails, Go Gold [6.596504114809683]
We introduce Group Contrastive Policy Optimization (GCPO), a method that incorporates external standard reference answers. When the model cannot solve a problem, the reference answer supplies the correct response, steering the model toward an unequivocally accurate update direction. GCPO achieves outstanding results across multiple benchmark datasets, yielding substantial improvements over the baseline model.
arXiv Detail & Related papers (2025-10-09T05:09:06Z)
Divergence Minimization Preference Optimization for Diffusion Model Alignment [66.31417479052774]
Divergence Minimization Preference Optimization (DMPO) is a principled method for aligning diffusion models by minimizing reverse KL divergence. DMPO can consistently outperform or match existing techniques across different base models and test sets.
arXiv Detail & Related papers (2025-07-10T07:57:30Z)
Landscape Features in Single-Objective Continuous Optimization: Have We Hit a Wall in Algorithm Selection Generalization? [4.510532471907222]
This study evaluates the generalizability of algorithm selection (AS) models based on different problem representations. It considers the most widely used Exploratory Landscape Analysis features, as well as recently proposed Topological Landscape Analysis features.
arXiv Detail & Related papers (2025-01-29T14:03:27Z)
Diffusion Models as Network Optimizers: Explorations and Analysis [71.69869025878856]
Generative diffusion models (GDMs) have emerged as a promising new approach to network optimization. In this study, we first explore the intrinsic characteristics of generative models. We provide a concise theoretical and intuitive demonstration of the advantages of generative models over discriminative network optimization.
arXiv Detail & Related papers (2024-11-01T09:05:47Z)
Step-level Value Preference Optimization for Mathematical Reasoning [6.318873143509028]
We introduce a novel algorithm called Step-level Value Preference Optimization (SVPO). Our method achieves state-of-the-art performance on both in-domain and out-of-domain mathematical reasoning benchmarks.
arXiv Detail & Related papers (2024-06-16T09:06:17Z)
When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL). Our follow-up derived bounds reveal the relationship between model shifts and performance improvement. A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
Personalizing Performance Regression Models to Black-Box Optimization Problems [0.755972004983746]
In this work, we propose a personalized regression approach for numerical optimization problems. We also investigate the impact of selecting not a single regression model per problem, but personalized ensembles. We test our approach on predicting the performance of numerical optimizations on the BBOB benchmark collection.
arXiv Detail & Related papers (2021-04-22T11:47:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
