Related papers: Perception in Reflection

Perception in Reflection

URL: http://arxiv.org/abs/2504.07165v1
Date: Wed, 09 Apr 2025 17:59:02 GMT
Title: Perception in Reflection
Authors: Yana Wei, Liang Zhao, Kangheng Lin, En Yu, Yuang Peng, Runpei Dong, Jianjian Sun, Haoran Wei, Zheng Ge, Xiangyu Zhang, Vishal M. Patel
Abstract summary: We present a perception in reflection paradigm designed to transcend the limitations of current large vision-language models. We propose Reflective Perception (RePer), a dual-model reflection mechanism that systematically alternates between policy and critic models.
Score: 39.33505560810175
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a perception in reflection paradigm designed to transcend the limitations of current large vision-language models (LVLMs), which are expected to achieve perfect perception on the first attempt yet often fail to do so. Specifically, we propose Reflective Perception (RePer), a dual-model reflection mechanism that systematically alternates between policy and critic models, enabling iterative refinement of visual perception. This framework is powered by Reflective Perceptual Learning (RPL), which reinforces intrinsic reflective capabilities through a methodically constructed visual reflection dataset and reflective unlikelihood training. Comprehensive experimental evaluation demonstrates RePer's quantifiable improvements in image understanding, captioning precision, and hallucination reduction. Notably, RePer achieves strong alignment between model attention patterns and human visual focus, while RPL optimizes fine-grained and free-form preference alignment. These advancements establish perception in reflection as a robust paradigm for future multimodal agents, particularly in tasks requiring complex reasoning and multi-step manipulation.
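
The alternation between policy and critic described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the function `reflective_perception`, the parameters `max_rounds` and `accept_score`, and the toy `toy_policy` and `toy_critic` stand-ins are all hypothetical names chosen for illustration, and real policy and critic models would be LVLMs rather than string functions.

```python
from typing import Callable, List, Tuple

# One (answer, feedback) pair per reflection round.
History = List[Tuple[str, str]]


def reflective_perception(
    policy: Callable[[str, History], str],
    critic: Callable[[str, str], Tuple[float, str]],
    image: str,
    max_rounds: int = 3,       # hypothetical cap on reflection rounds
    accept_score: float = 0.9,  # hypothetical acceptance threshold
) -> Tuple[str, History]:
    """Alternate between a policy and a critic until the critic is satisfied."""
    history: History = []
    answer = policy(image, history)          # initial perception
    for _ in range(max_rounds):
        score, feedback = critic(image, answer)
        if score >= accept_score:            # critic accepts the answer
            break
        history.append((answer, feedback))   # remember the rejected attempt
        answer = policy(image, history)      # refine using the feedback
    return answer, history


# Toy stand-ins (assumptions for illustration only).
FACTS = ["dog", "red ball", "grass"]


def toy_policy(image: str, history: History) -> str:
    # Mentions one more fact for every round of feedback received.
    return ", ".join(FACTS[: len(history) + 1])


def toy_critic(image: str, answer: str) -> Tuple[float, str]:
    # Scores by the fraction of known facts covered and names the first gap.
    missing = [fact for fact in FACTS if fact not in answer]
    score = 1.0 - len(missing) / len(FACTS)
    feedback = f"missing: {missing[0]}" if missing else ""
    return score, feedback


answer, history = reflective_perception(toy_policy, toy_critic, "image.png")
print(answer)
print(len(history))
```

With the toy stand-ins, the loop stops as soon as the critic's score reaches the threshold, and `history` records each rejected answer together with the feedback that prompted the next attempt.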

Related papers

SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning [25.02860760920562]
Multimodal large language models (MLLMs) have shown promising capabilities in reasoning tasks, but struggle with complex problems requiring explicit self-reflection and self-correction. Existing reflection methods are simplistic and struggle to generate meaningful and instructive feedback. We propose Multimodal Self-Reflection enhanced reasoning with Group Relative Policy Optimization (SRPO), a two-stage reflection-aware reinforcement learning framework.
arXiv Detail & Related papers (2025-06-02T14:21:44Z)
Dereflection Any Image with Diffusion Priors and Diversified Data [86.15504914121226]
We propose a comprehensive solution with an efficient data preparation pipeline and a generalizable model for robust reflection removal. First, we introduce a dataset named Diverse Reflection Removal (DRR) created by randomly rotating reflective mediums in target scenes. Second, we propose a diffusion-based framework with one-step diffusion for deterministic outputs and fast inference.
arXiv Detail & Related papers (2025-03-21T17:48:14Z)
Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction [11.838351314880736]
Instruct-of-Reflection (IoRT) is a novel and general reflection framework that leverages dynamic-meta instruction to enhance the iterative reflection capability of Large Language Models (LLMs). Our experiments demonstrate that IoRT achieves an average improvement of 10.1% over established baselines in mathematical and commonsense reasoning tasks.
arXiv Detail & Related papers (2025-03-02T14:02:03Z)
Meta-Reflection: A Feedback-Free Reflection Learning Framework [57.14485943991588]
We propose Meta-Reflection, a feedback-free reflection mechanism that requires only a single inference pass without external feedback. Motivated by the human ability to remember and retrieve reflections from past experiences, Meta-Reflection integrates reflective insights into a codebook. To thoroughly investigate and evaluate the practicality of Meta-Reflection in real-world scenarios, we introduce an industrial e-commerce benchmark named E-commerce Customer Intent Detection.
arXiv Detail & Related papers (2024-12-18T12:20:04Z)
Planar Reflection-Aware Neural Radiance Fields [32.709468082010126]
We introduce a reflection-aware NeRF that jointly models planar reflectors, such as windows, and explicitly casts reflected rays to capture the source of the high-frequency reflections. Rendering along the primary ray results in a clean, reflection-free view, while explicitly rendering along the reflected ray allows us to reconstruct highly detailed reflections.
arXiv Detail & Related papers (2024-11-07T18:55:08Z)
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models [36.119299938503936]
Large vision-language models (LVLMs) have shown promising performance on a variety of vision-language tasks. They remain susceptible to hallucinations, generating outputs misaligned with visual content or instructions. We propose reflective instruction tuning, which integrates rationale learning into visual instruction tuning.
arXiv Detail & Related papers (2024-07-16T06:32:45Z)
Revisiting Single Image Reflection Removal In the Wild [83.42368937164473]
This research focuses on the issue of single-image reflection removal (SIRR) in real-world conditions. We devise an advanced reflection collection pipeline that is highly adaptable to a wide range of real-world reflection scenarios. We develop a large-scale, high-quality reflection dataset named Reflection Removal in the Wild (RRW).
arXiv Detail & Related papers (2023-11-29T02:31:10Z)
TraM-NeRF: Tracing Mirror and Near-Perfect Specular Reflections through Neural Radiance Fields [3.061835990893184]
Implicit representations like Neural Radiance Fields (NeRF) have shown impressive results for rendering complex scenes with fine details. We present a novel reflection tracing method tailored for the involved volume rendering within NeRF. We derive efficient strategies for importance sampling and the transmittance computation along rays from only a few samples.
arXiv Detail & Related papers (2023-10-16T17:59:56Z)
Two-Stage Single Image Reflection Removal with Reflection-Aware Guidance [78.34235841168031]
We present a novel two-stage network with reflection-aware guidance (RAGNet) for single image reflection removal (SIRR). RAG can be used (i) to mitigate the effect of reflection from the observation, and (ii) to generate a mask in partial convolution for mitigating the effect of deviating from the linear combination hypothesis. Experiments on five commonly used datasets demonstrate the quantitative and qualitative superiority of our RAGNet in comparison to the state-of-the-art SIRR methods.
arXiv Detail & Related papers (2020-12-02T03:14:57Z)
Polarized Reflection Removal with Perfect Alignment in the Wild [66.48211204364142]
We present a novel formulation for removing reflection from polarized images in the wild. We first identify the misalignment issues of existing reflection removal datasets. We build a new dataset with more than 100 types of glass in which the obtained transmission images are perfectly aligned with the input mixed images.
arXiv Detail & Related papers (2020-03-28T13:29:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.