FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models
- URL: http://arxiv.org/abs/2510.11190v3
- Date: Thu, 06 Nov 2025 06:08:08 GMT
- Title: FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models
- Authors: Shengming Yuan, Xinyu Lyu, Shuailong Wang, Beitao Chen, Jingkuan Song, Lianli Gao,
- Abstract summary: Multimodal large language models (MLLMs) face an inherent trade-off between faithfulness and creativity. Existing methods lack the flexibility to modulate this reasoning strength. We propose equipping MLLMs with mechanisms that enable flexible control over associative reasoning.
- Score: 80.6268239673988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal large language models (MLLMs) face an inherent trade-off between faithfulness and creativity, as different tasks require varying degrees of associative reasoning. However, existing methods lack the flexibility to modulate this reasoning strength, limiting MLLMs' adaptability across factual and creative scenarios. To bridge this gap, we propose equipping MLLMs with mechanisms that enable flexible control over associative reasoning. We begin by investigating the internal mechanisms underlying associative behavior in MLLMs and find that: (1) middle layers play a pivotal role in shaping a model's associative tendencies, (2) modifying representations in these layers effectively regulates associative reasoning strength, and (3) hallucinations can be exploited to derive steering vectors that guide this modulation. Building on these findings, we introduce Flexible Association Control (FlexAC), a lightweight and training-free framework for modulating associative behavior in MLLMs. FlexAC first induces hallucination-guided intermediate representations to encode associative directions. Then, it selects high-association instances to construct effective associative steering vectors, whose strengths are adaptively calibrated to balance creative guidance with output stability. Finally, recognizing the multi-dimensional nature of associative reasoning, FlexAC incorporates task-specific associative vectors derived from a forward pass on a few target-domain samples, enabling models to follow diverse associative directions and better adapt to creative tasks. Notably, our method achieves up to a 5.8x improvement in creativity on Creation-MMBench and a 29% reduction in hallucination rate on CHAIR, surpassing existing baselines and demonstrating its effectiveness in enabling flexible control over associative reasoning in MLLMs. Our code is available at https://github.com/ylhz/FlexAC.
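The steering-vector idea in the abstract can be sketched in a few lines: take hidden states collected under hallucination-inducing prompts and under faithful prompts, form their mean difference as an associative direction, and add a calibrated multiple of that direction to middle-layer activations. The sketch below is a minimal NumPy illustration of this general recipe; the function names, the norm-proportional scaling heuristic, and the `alpha` parameter are illustrative assumptions, not FlexAC's exact calibration.

```python
import numpy as np

def build_steering_vector(assoc_states, faithful_states):
    """Difference of mean hidden states: points from faithful
    behavior toward high-association behavior.
    Both inputs: arrays of shape (num_samples, hidden_dim)."""
    return assoc_states.mean(axis=0) - faithful_states.mean(axis=0)

def apply_steering(hidden, vector, alpha):
    """Add the steering direction to each hidden state, scaled by
    alpha and by the hidden state's own norm so the perturbation
    stays proportionate (a stability heuristic assumed here).
    hidden: (num_tokens, hidden_dim); alpha: steering strength."""
    norms = np.linalg.norm(hidden, axis=-1, keepdims=True)
    unit = vector / (np.linalg.norm(vector) + 1e-8)
    return hidden + alpha * norms * unit

# Toy usage with random activations standing in for real MLLM states.
rng = np.random.default_rng(0)
assoc = rng.normal(size=(8, 16))     # states from hallucination-inducing runs
faithful = rng.normal(size=(8, 16))  # states from faithful runs
v = build_steering_vector(assoc, faithful)
steered = apply_steering(rng.normal(size=(4, 16)), v, alpha=0.5)
```

In a real model this addition would be applied inside a middle-layer forward hook, with `alpha` raised for creative tasks and lowered (or made negative) to suppress hallucination.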
Related papers
- Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation [50.22481337087162]
Referring Video Object Segmentation (RVOS) aims to segment objects in videos based on textual queries. Refer-Agent is a collaborative multi-agent system with alternating reasoning-reflection mechanisms.
arXiv Detail & Related papers (2026-02-03T14:48:12Z) - GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models [23.159388800893964]
We argue that alignment is most effective when both modalities share a unified geometric basis. We employ a decoder-only quantizer with Gumbel-Softmax for differentiable training and balanced codebook usage. Our framework achieves a 20% performance improvement over current state-of-the-art methods.
arXiv Detail & Related papers (2026-01-12T15:14:29Z) - The Landscape of Agentic Reinforcement Learning for LLMs: A Survey [104.31926740841128]
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL). This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL.
arXiv Detail & Related papers (2025-09-02T17:46:26Z) - MoL-RL: Distilling Multi-Step Environmental Feedback into LLMs for Feedback-Independent Reasoning [3.486190892832845]
MoL-RL is a novel training paradigm that integrates multi-step EF signals into large language models. We show that MoL-RL achieves state-of-the-art performance with the Qwen3-8B model.
arXiv Detail & Related papers (2025-07-27T13:52:15Z) - Reasoning and Behavioral Equilibria in LLM-Nash Games: From Mindsets to Actions [15.764094200832071]
We introduce the LLM-Nash framework, a game-theoretic model where agents select reasoning prompts to guide decision-making via Large Language Models (LLMs). Unlike classical games that assume utility-maximizing agents with full rationality, this framework captures bounded rationality by modeling the reasoning process explicitly.
arXiv Detail & Related papers (2025-07-10T22:43:00Z) - Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment [15.51412479114864]
We introduce MAPLE (Modality-Aligned Preference Learning for Embeddings), a novel framework that guides cross-modal representation learning. MAPLE formulates the learning process as reinforcement learning with two key components: automatic preference data construction using an off-the-shelf MLLM, and a new Relative Preference Alignment (RPA) loss. Experimental results show that our preference-guided alignment achieves substantial gains in fine-grained cross-modal retrieval.
arXiv Detail & Related papers (2025-06-08T02:33:35Z) - Analyzing Finetuning Representation Shift for Multimodal LLMs Steering [56.710375516257876]
We propose to map hidden states to interpretable visual and textual concepts. This enables us to more efficiently compare certain semantic dynamics, such as the shift between an original and a fine-tuned model. We also demonstrate the use of shift vectors to capture these concept changes.
arXiv Detail & Related papers (2025-01-06T13:37:13Z) - LLMs Can Evolve Continually on Modality for X-Modal Reasoning [62.2874638875554]
Existing methods rely heavily on modal-specific pretraining and joint-modal tuning, leading to significant computational burdens when expanding to new modalities.
We propose PathWeave, a flexible and scalable framework with modal-Path sWitching and ExpAnsion abilities.
PathWeave performs comparably to state-of-the-art MLLMs while concurrently reducing parameter training burdens by 98.73%.
arXiv Detail & Related papers (2024-10-26T13:19:57Z) - CoMMIT: Coordinated Multimodal Instruction Tuning [90.1532838391285]
Multimodal large language models (MLLMs) generally involve cooperative learning between a backbone LLM and a feature encoder of non-text input modalities. In this paper, we analyze MLLM instruction tuning from both theoretical and empirical perspectives. We propose a Multimodal Balance Coefficient that enables quantitative measurement of the balance of learning.
arXiv Detail & Related papers (2024-07-29T23:18:55Z) - User-Controlled Knowledge Fusion in Large Language Models: Balancing
Creativity and Hallucination [5.046007553593371]
Large Language Models (LLMs) generate diverse, relevant, and creative responses.
Striking a balance between the LLM's imaginative capabilities and its adherence to factual information is a key challenge.
This paper presents an innovative user-controllable mechanism that modulates the balance between an LLM's imaginative capabilities and its adherence to factual information.
arXiv Detail & Related papers (2023-07-30T06:06:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.