Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
- URL: http://arxiv.org/abs/2412.07801v1
- Date: Sun, 08 Dec 2024 03:59:59 GMT
- Title: Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
- Authors: Jiali Chen, Xusen Hei, Yuqi Xue, Yuancheng Wei, Jiayuan Xie, Yi Cai, Qing Li
- Abstract summary: Large multimodal models (LMMs) have shown remarkable performance in the visual commonsense reasoning (VCR) task. However, the ability of LMMs to correct potential visual commonsense errors in the distractor upon their occurrence is yet under-explored. We present the first study in which LMMs simulate this error correction process.
- Score: 12.829202761125096
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Large multimodal models (LMMs) have shown remarkable performance on the visual commonsense reasoning (VCR) task, which aims to answer a multiple-choice question based on visual commonsense within an image. However, the ability of LMMs to correct potential visual commonsense errors in distractors when such errors occur remains under-explored. Drawing inspiration from how a human teacher crafts challenging distractors to test students' comprehension of concepts or skills and then helps them identify and correct errors toward the answer, we present the first study in which LMMs simulate this error correction process. To this end, we employ GPT-4 as a ``teacher'' to collect the explainable feedback dataset VCR-DF for error correction, which serves as a benchmark to evaluate the ability of LMMs to identify misconceptions and clarify the reasons behind errors in VCR distractors toward final answers. In addition, we propose an LMM-based Pedagogical Expert Instructed Feedback Generation (PEIFG) model that incorporates learnable expert prompts and multimodal instructions as guidance for feedback generation. Experimental results show that PEIFG significantly outperforms existing LMMs. We believe our benchmark provides a new direction for evaluating the capabilities of LMMs.
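The abstract only sketches how the learnable expert prompts guide feedback generation. As a rough illustration, the toy PyTorch module below shows one way a pool of learnable expert-prompt embeddings could be selected per example and prepended, together with projected visual features, to the instruction embeddings that condition a (frozen) LMM decoder. All names, dimensions, and the top-k selection rule are assumptions for illustration only, not the authors' released PEIFG implementation.

```python
# Minimal sketch of expert-prompt-guided conditioning, assuming a pool of
# learnable expert prompts selected by similarity to the current example.
# Hypothetical design, not the paper's actual architecture.
import torch
import torch.nn as nn


class ExpertPromptGuidance(nn.Module):
    def __init__(self, num_experts: int = 16, prompt_len: int = 8,
                 vis_dim: int = 1024, lm_dim: int = 4096, top_k: int = 4):
        super().__init__()
        # Pool of learnable expert prompts: (num_experts, prompt_len, lm_dim).
        self.expert_prompts = nn.Parameter(
            torch.randn(num_experts, prompt_len, lm_dim) * 0.02)
        # Keys used to score which experts match the current image/question.
        self.expert_keys = nn.Parameter(torch.randn(num_experts, lm_dim) * 0.02)
        # Projects visual features into the language model's embedding space.
        self.vis_proj = nn.Linear(vis_dim, lm_dim)
        self.top_k = top_k

    def forward(self, vis_feats: torch.Tensor, instr_embeds: torch.Tensor):
        """vis_feats: (B, N_v, vis_dim); instr_embeds: (B, N_t, lm_dim).
        Returns a (B, top_k*prompt_len + N_v + N_t, lm_dim) prefix sequence."""
        vis_tokens = self.vis_proj(vis_feats)                              # (B, N_v, D)
        # Pool visual and instruction tokens into one query per example.
        query = torch.cat([vis_tokens, instr_embeds], dim=1).mean(dim=1)  # (B, D)
        scores = query @ self.expert_keys.t()                             # (B, E)
        top_idx = scores.topk(self.top_k, dim=-1).indices                 # (B, k)
        selected = self.expert_prompts[top_idx].flatten(1, 2)             # (B, k*P, D)
        # This prefix would be fed to the LMM decoder, which then generates
        # the explanatory feedback text for the distractor.
        return torch.cat([selected, vis_tokens, instr_embeds], dim=1)


if __name__ == "__main__":
    guide = ExpertPromptGuidance()
    prefix = guide(torch.randn(2, 32, 1024), torch.randn(2, 20, 4096))
    print(prefix.shape)  # torch.Size([2, 84, 4096]) = 4*8 + 32 + 20 tokens
```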
Related papers
- Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning [49.07442840323135]
We propose a new paradigm for perception-oriented instruction tuning, i.e., Q-Adapt.
Our proposed Q-Adapt can achieve a lightweight visual quality evaluator, demonstrating comparable performance.
arXiv Detail & Related papers (2025-04-02T12:02:57Z) - Towards Adaptive Feedback with AI: Comparing the Feedback Quality of LLMs and Teachers on Experimentation Protocols [8.71931996488953]
This study evaluates and compares the feedback quality of large language models (LLMs) with that of human teachers and science education experts.
Our results indicate that LLM-generated feedback shows no significant difference to that of teachers and experts in overall quality.
arXiv Detail & Related papers (2025-02-18T13:22:14Z) - Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies [66.30619782227173]
Large language models (LLMs) can produce erroneous responses that sound fluent and convincing.
We identify several features of LLM responses that shape users' reliance.
We find that explanations increase reliance on both correct and incorrect responses.
We observe less reliance on incorrect responses when sources are provided or when explanations exhibit inconsistencies.
arXiv Detail & Related papers (2025-02-12T16:35:41Z) - Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review [11.756344944226495]
We introduce a novel Fault-Aware Distillation via Peer-Review (FAIR) approach.
Instead of merely obtaining gold rationales from teachers, our method asks teachers to identify and explain the student's mistakes.
arXiv Detail & Related papers (2024-10-04T17:59:41Z) - Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models [36.119299938503936]
Large vision-language models (LVLMs) have shown promising performance on a variety of vision-language tasks.
They remain susceptible to hallucinations, generating outputs misaligned with visual content or instructions.
We propose reflective instruction tuning, which integrates rationale learning into visual instruction tuning.
arXiv Detail & Related papers (2024-07-16T06:32:45Z) - MalAlgoQA: Pedagogical Evaluation of Counterfactual Reasoning in Large Language Models and Implications for AI in Education [2.872215065231376]
This paper introduces MalAlgoQA, a dataset designed to evaluate the counterfactual reasoning capabilities of Large Language Models.
At the heart of MalAlgoQA are ``malgorithms'': rationales behind incorrect answer choices that represent flawed yet logically coherent reasoning paths.
arXiv Detail & Related papers (2024-07-01T03:39:13Z) - F-LMM: Grounding Frozen Large Multimodal Models [53.8059045627934]
We present F-LMM -- grounding frozen off-the-shelf LMMs in human-AI conversations.
Using only a few trainable CNN layers, we can translate word-pixel attention weights to mask logits.
Our F-LMM neither learns special segmentation tokens nor utilises high-quality grounded instruction-tuning data.
arXiv Detail & Related papers (2024-06-09T15:14:26Z) - Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models [84.78457918843165]
Unsolvable Problem Detection (UPD) is a novel task to evaluate the robust understanding capability of Large Multimodal Models (LMMs).
UPD assesses the LMM's ability to withhold answers when encountering unsolvable problems of multiple-choice question answering.
Our experiments reveal that most LMMs, even those that demonstrate adequate performance on existing benchmarks, struggle significantly with MM-UPD.
arXiv Detail & Related papers (2024-03-29T17:59:53Z) - Lumen: Unleashing Versatile Vision-Centric Capabilities of Large Multimodal Models [87.47400128150032]
We propose a novel LMM architecture named Lumen, a Large multimodal model with versatile vision-centric capability enhancement.
Lumen first promotes fine-grained vision-language concept alignment.
Then the task-specific decoding is carried out by flexibly routing the shared representation to lightweight task decoders.
arXiv Detail & Related papers (2024-03-12T04:13:45Z) - Myriad: Large Multimodal Model by Applying Vision Experts for Industrial Anomaly Detection [86.24898024621008]
We present a novel large multimodal model applying vision experts for industrial anomaly detection (abbreviated to Myriad).
We utilize the anomaly map generated by the vision experts as guidance for LMMs, such that the vision model is guided to pay more attention to anomalous regions.
Our proposed method not only performs favorably against state-of-the-art methods, but also inherits the flexibility and instruction-following ability of LMMs in the field of IAD.
arXiv Detail & Related papers (2023-10-29T16:49:45Z) - Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG)
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z) - MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities [153.37868034779385]
We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks.
Recent LMMs have shown various intriguing abilities, such as solving math problems written on the blackboard, reasoning about events and celebrities in news images, and explaining visual jokes.
arXiv Detail & Related papers (2023-08-04T17:59:47Z) - Learning from Mistakes via Cooperative Study Assistant for Large Language Models [17.318591492264023]
Large language models (LLMs) have demonstrated their potential to refine their generation based on their own feedback.
We propose Study Assistant for Large LAnguage Model (SALAM), a novel framework with an auxiliary agent to assist the main LLM in learning from mistakes.
arXiv Detail & Related papers (2023-05-23T08:51:08Z)