Personalized Multimodal Feedback Generation in Education
- URL: http://arxiv.org/abs/2011.00192v1
- Date: Sat, 31 Oct 2020 05:26:49 GMT
- Title: Personalized Multimodal Feedback Generation in Education
- Authors: Haochen Liu, Zitao Liu, Zhongqin Wu, Jiliang Tang
- Abstract summary: The automatic evaluation of school assignments is an important application of AI in the education field.
We propose a novel Personalized Multimodal Feedback Generation Network (PMFGN) armed with a modality gate mechanism and a personalized bias mechanism.
Our model significantly outperforms several baselines by generating more accurate and diverse feedback.
- Score: 50.95346877192268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The automatic evaluation of school assignments is an important application
of AI in the education field. In this work, we focus on the task of
personalized multimodal feedback generation, which aims to generate
personalized feedback for various teachers to evaluate students' assignments
involving multimodal inputs such as images, audio, and text. This task
involves the representation and fusion of multimodal information and natural
language generation, which presents challenges in three aspects: 1) how
to encode and integrate multimodal inputs; 2) how to generate feedback specific
to each modality; and 3) how to realize personalized feedback generation. In
this paper, we propose a novel Personalized Multimodal Feedback Generation
Network (PMFGN) armed with a modality gate mechanism and a personalized bias
mechanism to address these challenges. The extensive experiments on real-world
K-12 education data show that our model significantly outperforms several
baselines by generating more accurate and diverse feedback. In addition,
detailed ablation experiments are conducted to deepen our understanding of the
proposed framework.
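To make the two mechanisms named in the abstract more concrete, below is a minimal PyTorch-style sketch of a single decoding step that combines a modality gate (softly selecting among text, audio, and image context vectors) with a per-teacher bias added to the output vocabulary logits. All module names, shapes, and the fusion scheme are illustrative assumptions, not the authors' published PMFGN implementation.

```python
# Minimal sketch of a gated, personalized decoding step.
# Assumptions: per-modality context vectors are already encoded upstream,
# and each teacher is identified by an integer id with a learned bias.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedPersonalizedDecoderStep(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int, num_teachers: int):
        super().__init__()
        self.gru = nn.GRUCell(hidden_dim, hidden_dim)
        # Modality gate: scores each modality's context against the decoder state.
        self.gate = nn.Linear(hidden_dim * 2, 1)
        self.out = nn.Linear(hidden_dim * 2, vocab_size)
        # Personalized bias: one learned vocabulary-sized bias per teacher.
        self.teacher_bias = nn.Embedding(num_teachers, vocab_size)

    def forward(self, prev_emb, state, modality_ctx, teacher_id):
        # prev_emb: (B, H) embedding of the previously generated token
        # modality_ctx: (B, M, H) one context vector per modality (text/audio/image)
        state = self.gru(prev_emb, state)
        expanded = state.unsqueeze(1).expand_as(modality_ctx)
        gate_logits = self.gate(torch.cat([expanded, modality_ctx], dim=-1)).squeeze(-1)
        gate = F.softmax(gate_logits, dim=-1)                    # (B, M)
        fused_ctx = (gate.unsqueeze(-1) * modality_ctx).sum(1)   # (B, H)
        # Vocabulary logits plus a teacher-specific bias for personalization.
        logits = self.out(torch.cat([state, fused_ctx], dim=-1))
        logits = logits + self.teacher_bias(teacher_id)
        return logits, state, gate


# Illustrative usage with random tensors standing in for encoded inputs.
step = GatedPersonalizedDecoderStep(hidden_dim=256, vocab_size=8000, num_teachers=50)
logits, state, gate = step(
    prev_emb=torch.randn(4, 256),
    state=torch.zeros(4, 256),
    modality_ctx=torch.randn(4, 3, 256),   # text, audio, image contexts
    teacher_id=torch.randint(0, 50, (4,)),
)
```

In this reading, the gate lets each generated token draw on whichever modality is most relevant, while the teacher embedding shifts word choice toward an individual teacher's feedback style.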
Related papers
- HEMM: Holistic Evaluation of Multimodal Foundation Models [91.60364024897653]
Multimodal foundation models can holistically process text alongside images, video, audio, and other sensory modalities.
It is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains.
arXiv Detail & Related papers (2024-07-03T18:00:48Z)
- Multi-Task Multi-Modal Self-Supervised Learning for Facial Expression Recognition [6.995226697189459]
We employ a multi-modal self-supervised learning method for facial expression recognition from in-the-wild video data.
Our results generally show that multi-modal self-supervision tasks offer large performance gains for challenging tasks.
We release our pre-trained models as well as source code publicly.
arXiv Detail & Related papers (2024-04-16T20:51:36Z)
- Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond [87.1712108247199]
Our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP).
We develop a generic generative framework for personalization that can handle a wide range of personalized needs.
Our methodology enhances the capabilities of foundational language models for personalized tasks.
arXiv Detail & Related papers (2024-03-15T20:21:31Z)
- UniMS-RAG: A Unified Multi-source Retrieval-Augmented Generation for Personalized Dialogue Systems [43.266153244137215]
Large Language Models (LLMs) have shown exceptional capabilities in many natural language understanding and generation tasks.
We decompose the use of multiple sources in generating personalized responses into three sub-tasks: Knowledge Source Selection, Knowledge Retrieval, and Response Generation.
We propose a novel Unified Multi-Source Retrieval-Augmented Generation system (UniMS-RAG).
arXiv Detail & Related papers (2024-01-24T06:50:20Z)
- Generative Multi-Modal Knowledge Retrieval with Large Language Models [75.70313858231833]
We propose an innovative end-to-end generative framework for multi-modal knowledge retrieval.
Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases.
We demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines.
arXiv Detail & Related papers (2024-01-16T08:44:29Z)
- Generative Multimodal Models are In-Context Learners [60.50927925426832]
We introduce Emu2, a generative multimodal model with 37 billion parameters, trained on large-scale multimodal sequences.
Emu2 exhibits strong multimodal in-context learning abilities and can even solve tasks that require on-the-fly reasoning.
arXiv Detail & Related papers (2023-12-20T18:59:58Z)
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
- MIMIC-IT: Multi-Modal In-Context Instruction Tuning [44.879418596312554]
We present a dataset comprising 2.8 million multimodal instruction-response pairs, with 2.2 million unique instructions derived from images and videos.
Trained on the MIMIC-IT dataset, the Otter model demonstrates remarkable proficiency in multi-modal perception, reasoning, and in-context learning.
We release the MIMIC-IT dataset, instruction-response collection pipeline, benchmarks, and the Otter model.
arXiv Detail & Related papers (2023-06-08T17:59:56Z)
- Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation [26.933683814025475]
We introduce two novel multimodal datasets: the synthetic CLEVR-ATVC dataset (620K) and the manually pictured Fruit-ATVC dataset (50K).
These datasets incorporate both visual and text-based inputs and outputs.
To facilitate the accountability of multimodal systems in rejecting human requests, similar to language-based ChatGPT conversations, we introduce specific rules as supervisory signals within the datasets.
arXiv Detail & Related papers (2023-03-10T15:35:11Z)