CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring
- URL: http://arxiv.org/abs/2505.13965v1
- Date: Tue, 20 May 2025 06:05:56 GMT
- Title: CAFES: A Collaborative Multi-Agent Framework for Multi-Granular Multimodal Essay Scoring
- Authors: Jiamin Su, Yibo Yan, Zhuoran Gao, Han Zhang, Xiang Liu, Xuming Hu
- Abstract summary: We introduce CAFES, the first collaborative multi-agent framework specifically designed for AES. It orchestrates three specialized agents: an Initial Scorer for rapid, trait-specific evaluations; a Feedback Pool Manager to aggregate detailed, evidence-grounded strengths; and a Reflective Scorer that iteratively refines scores based on this feedback to enhance human alignment.
- Score: 15.197083495600998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated Essay Scoring (AES) is crucial for modern education, particularly with the increasing prevalence of multimodal assessments. However, traditional AES methods struggle with evaluation generalizability and multimodal perception, while even recent Multimodal Large Language Model (MLLM)-based approaches can produce hallucinated justifications and scores misaligned with human judgment. To address these limitations, we introduce CAFES, the first collaborative multi-agent framework specifically designed for AES. It orchestrates three specialized agents: an Initial Scorer for rapid, trait-specific evaluations; a Feedback Pool Manager to aggregate detailed, evidence-grounded strengths; and a Reflective Scorer that iteratively refines scores based on this feedback to enhance human alignment. Extensive experiments with state-of-the-art MLLMs show an average relative improvement of 21% in Quadratic Weighted Kappa (QWK) against ground truth, especially on grammatical and lexical diversity. Our proposed CAFES framework paves the way for an intelligent multimodal AES system. The code will be available upon acceptance.
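The agent loop the abstract describes can be read as a simple iterative pipeline. Below is a minimal sketch of that loop, assuming a generic `call_mllm` completion function; the prompts, the `max_rounds` cap, and the stabilization stopping rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the CAFES three-agent loop described in the abstract.
# `call_mllm` is a hypothetical stand-in for any MLLM chat-completion call.
from typing import Callable

def cafes_score(essay: str, traits: list[str],
                call_mllm: Callable[[str], str],
                max_rounds: int = 3) -> dict[str, str]:
    # Agent 1 (Initial Scorer): rapid, trait-specific evaluations.
    scores = {t: call_mllm(f"Score this essay on '{t}':\n{essay}")
              for t in traits}

    for _ in range(max_rounds):
        # Agent 2 (Feedback Pool Manager): aggregate detailed,
        # evidence-grounded feedback on the current trait scores.
        feedback = call_mllm(
            f"List evidence-grounded strengths and weaknesses of this "
            f"essay given trait scores {scores}:\n{essay}")

        # Agent 3 (Reflective Scorer): refine each trait score in
        # light of the pooled feedback, to improve human alignment.
        revised = {t: call_mllm(
            f"Feedback:\n{feedback}\nRevise the '{t}' score "
            f"(currently {scores[t]}) for this essay:\n{essay}")
            for t in traits}

        if revised == scores:  # assumed stopping rule: scores stabilized
            break
        scores = revised
    return scores
```

Human alignment of integer-valued trait scores can then be checked with the QWK metric the abstract reports, e.g. scikit-learn's `cohen_kappa_score(human_scores, model_scores, weights="quadratic")`.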
Related papers
- CAF-I: A Collaborative Multi-Agent Framework for Enhanced Irony Detection with Large Language Models [10.551915512812107]
This paper introduces the Collaborative Agent Framework for Irony (CAF-I). CAF-I employs specialized agents for Context, Semantics, and Rhetoric, which perform multidimensional analysis. Experiments on benchmark datasets establish CAF-I's state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2025-06-10T04:05:06Z)
- Advancing Multimodal Reasoning Capabilities of Multimodal Large Language Models via Visual Perception Reward [87.06604760273372]
We propose Perception-R1, which introduces a novel visual perception reward that explicitly encourages MLLMs to perceive the visual content accurately. We show that Perception-R1 achieves state-of-the-art performance on most benchmarks using only 1,442 training examples.
arXiv Detail & Related papers (2025-06-08T16:48:42Z)
- On Path to Multimodal Generalist: General-Level and General-Bench [153.9720740167528]
This project introduces General-Level, an evaluation framework that defines a five-level scale of MLLM performance and generality. At the core of the framework is the concept of Synergy, which measures whether models maintain consistent capabilities across comprehension and generation. Evaluation results covering over 100 existing state-of-the-art MLLMs uncover the capability rankings of generalists.
arXiv Detail & Related papers (2025-05-07T17:59:32Z)
- Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning [76.10639521319382]
We propose Symbolic-MoE, a symbolic, text-based, and gradient-free Mixture-of-Experts framework. We show that Symbolic-MoE's instance-level expert selection improves performance by a large margin but, when implemented naively, can introduce a high computational overhead.
arXiv Detail & Related papers (2025-03-07T18:03:13Z)
- Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing [43.75154489681047]
We propose a novel framework leveraging test-time scaling for Multi-Document Summarization (MDS). Our approach employs prompt ensemble techniques to generate multiple candidate summaries using various prompts, then combines them with an aggregator to produce a refined summary. To evaluate our method effectively, we also introduce two new LLM-based metrics: the Consistency-Aware Preference (CAP) score and the LLM Atom-Content-Unit (LLM-ACU) score.
arXiv Detail & Related papers (2025-02-27T23:34:47Z)
- EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models [19.271790170055375]
EssayJudge offers precise, context-rich evaluations without manual feature engineering, addressing longstanding AES limitations. Our experiments with 18 representative MLLMs reveal gaps in AES performance compared to human evaluation, particularly in discourse-level traits.
arXiv Detail & Related papers (2025-02-17T15:31:59Z)
- Progressive Multimodal Reasoning via Active Retrieval [64.74746997923967]
Multi-step multimodal reasoning tasks pose significant challenges for multimodal large language models (MLLMs). We propose AR-MCTS, a universal framework designed to progressively improve the reasoning capabilities of MLLMs. We show that AR-MCTS can optimize sampling diversity and accuracy, yielding reliable multimodal reasoning.
arXiv Detail & Related papers (2024-12-19T13:25:39Z)
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z)
- Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)