PixelThink: Towards Efficient Chain-of-Pixel Reasoning
- URL: http://arxiv.org/abs/2505.23727v1
- Date: Thu, 29 May 2025 17:55:49 GMT
- Title: PixelThink: Towards Efficient Chain-of-Pixel Reasoning
- Authors: Song Wang, Gongfan Fang, Lingdong Kong, Xiangtai Li, Jianyun Xu, Sheng Yang, Qiang Li, Jianke Zhu, Xinchao Wang
- Abstract summary: PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty. It learns to compress reasoning length in accordance with scene complexity and predictive confidence. Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
- Score: 70.32510083790069
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing reasoning segmentation approaches typically fine-tune multimodal large language models (MLLMs) using image-text pairs and corresponding mask labels. However, they exhibit limited generalization to out-of-distribution scenarios without an explicit reasoning process. Although recent efforts leverage reinforcement learning through group-relative policy optimization (GRPO) to enhance reasoning ability, they often suffer from overthinking - producing uniformly verbose reasoning chains irrespective of task complexity. This results in elevated computational costs and limited control over reasoning quality. To address this problem, we propose PixelThink, a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty to regulate reasoning generation within a reinforcement learning paradigm. The model learns to compress reasoning length in accordance with scene complexity and predictive confidence. To support comprehensive evaluation, we introduce ReasonSeg-Diff, an extended benchmark with annotated reasoning references and difficulty scores, along with a suite of metrics designed to assess segmentation accuracy, reasoning quality, and efficiency jointly. Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance. Our work contributes novel perspectives towards efficient and interpretable multimodal understanding. The code and model will be publicly available.
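To make the idea of regulating reasoning length by difficulty and uncertainty more concrete, here is a minimal sketch of a length-aware reward. It is not the formulation from the paper: the function names `token_budget` and `length_aware_reward`, the equal weighting of difficulty and uncertainty, the token bounds, and the penalty weight are all assumptions chosen for illustration, and both difficulty and uncertainty are assumed to be normalized to [0, 1].

```python
def token_budget(difficulty: float, uncertainty: float,
                 min_tokens: int = 32, max_tokens: int = 256) -> int:
    """Hypothetical budget: harder or less certain samples may reason longer.

    Both inputs are assumed to lie in [0, 1]; the 50/50 mix is an
    illustrative choice, not a value taken from the paper.
    """
    need = max(0.0, min(1.0, 0.5 * difficulty + 0.5 * uncertainty))
    return round(min_tokens + need * (max_tokens - min_tokens))


def length_aware_reward(seg_iou: float, num_tokens: int,
                        difficulty: float, uncertainty: float,
                        penalty_weight: float = 0.5) -> float:
    """Segmentation reward minus a capped penalty for exceeding the budget."""
    budget = token_budget(difficulty, uncertainty)
    # Relative overflow beyond the budget; zero when the chain fits.
    overflow = max(0, num_tokens - budget) / budget
    return seg_iou - penalty_weight * min(1.0, overflow)


# An easy, confident sample is penalized for a chain that a hard,
# uncertain sample is allowed to produce without penalty.
easy = length_aware_reward(0.8, num_tokens=64, difficulty=0.0, uncertainty=0.0)
hard = length_aware_reward(0.8, num_tokens=64, difficulty=1.0, uncertainty=1.0)
```

In a GRPO-style setup, a reward of this kind would be computed for each sampled response in a group, so that responses with equal mask quality are ranked by how well their reasoning length matches the estimated need.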
Related papers
- ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation [53.149817480019834]
Recent advancements in large reasoning models (LRMs) have achieved notable performance enhancements on complex reasoning tasks by scaling up the generation length through Chain-of-Thought (CoT). We propose a framework dubbed ConciseHint, which continuously encourages the reasoning model to speak concisely by injecting textual hints during token generation in the reasoning process. Experiments on state-of-the-art LRMs, including the DeepSeek-R1 and Qwen-3 series, demonstrate that our method can effectively produce concise reasoning processes while maintaining performance.
(2025-06-23T16:20:44Z)
- Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute [57.16286134405821]
We propose Fractional Reasoning, a framework that enables continuous control over reasoning intensity at inference time. Our method operates by extracting the latent steering vector associated with deeper reasoning and reapplying it with a tunable scaling factor. Experiments on GSM8K, MATH500, and GPQA demonstrate that Fractional Reasoning consistently improves performance across diverse reasoning tasks and models.
(2025-06-18T21:15:59Z)
- When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs [16.659986373052217]
Chain-of-thought reasoning can significantly degrade instruction-following accuracy. This is the first work to systematically expose reasoning-induced failures in instruction-following.
(2025-05-16T16:36:00Z)
- Efficient Inference for Large Reasoning Models: A Survey [42.61170621552432]
Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models (LLMs) by learning to reason. However, their deliberative reasoning process leads to inefficiencies in token usage, memory consumption, and inference time. This survey provides a review of efficient inference methods designed specifically for LRMs, focusing on mitigating token inefficiency while preserving the reasoning quality.
(2025-03-29T13:27:46Z)
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
(2025-03-20T17:59:38Z)
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching [60.04718679054704]
Chain-of-Thought prompting elicits step-by-step problem solving, but often at the cost of excessive verbosity in intermediate outputs. We propose Sketch-of-Thought (SoT), a prompting framework that integrates cognitively inspired reasoning paradigms with linguistic constraints. SoT achieves token reductions of up to 78% with minimal accuracy loss across 15 reasoning datasets.
(2025-03-07T06:57:17Z)
- Efficient Reasoning with Hidden Thinking [48.96945580741641]
Chain-of-Thought (CoT) reasoning has become a powerful framework for improving complex problem-solving capabilities. We propose Heima (as hidden llama), an efficient reasoning framework that leverages reasoning CoTs in a hidden latent space. The Heima model achieves higher generation efficiency while maintaining or even improving zero-shot task accuracy.
(2025-01-31T15:10:29Z)
- Think Beyond Size: Adaptive Prompting for More Effective Reasoning [0.0]
We introduce Adaptive Prompting, a dynamic and iterative framework designed to enhance reasoning by incorporating real-time adjustments to prompt structures and validation mechanisms. Results demonstrate that Adaptive Prompting significantly improves performance on diverse reasoning benchmarks, including arithmetic reasoning (GSM8K, MultiArith), logical reasoning, and commonsense tasks. Our approach enables smaller models to achieve competitive performance with larger counterparts, such as GPT-4, while maintaining computational efficiency.
(2024-10-10T17:14:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.