Related papers: Multimodal Late Fusion Model for Problem-Solving Strategy Classification in a Machine Learning Game

URL: http://arxiv.org/abs/2507.22426v1
Date: Wed, 30 Jul 2025 07:12:06 GMT
Title: Multimodal Late Fusion Model for Problem-Solving Strategy Classification in a Machine Learning Game
Authors: Clemens Witt, Thiemo Leonhardt, Nadine Bergner, Mareen Grillenberger
Abstract summary: This paper proposes a multimodal late fusion model that integrates visual data and structured in-game action sequences to classify students' problem-solving strategies. Results highlight the potential of multimodal ML for strategy-sensitive assessment and adaptive support in interactive learning contexts.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Machine learning models are widely used to support stealth assessment in digital learning environments. Existing approaches typically rely on abstracted gameplay log data, which may overlook subtle behavioral cues linked to learners' cognitive strategies. This paper proposes a multimodal late fusion model that integrates screencast-based visual data and structured in-game action sequences to classify students' problem-solving strategies. In a pilot study with secondary school students (N=149) playing a multitouch educational game, the fusion model outperformed unimodal baseline models, increasing classification accuracy by over 15%. Results highlight the potential of multimodal ML for strategy-sensitive assessment and adaptive support in interactive learning contexts.
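The abstract does not say how the two unimodal predictions are combined. As a minimal sketch, assuming each unimodal model (one for screencast frames, one for in-game action sequences) outputs class scores and that late fusion is a weighted average of their softmax probabilities, the decision step could look like this. The strategy labels, the `weight_visual` parameter, and all function names are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Sequence

# Hypothetical strategy labels; the paper's actual class names are not given here.
STRATEGIES = ["systematic", "trial_and_error", "guided"]


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(
    probs_visual: Sequence[float],
    probs_actions: Sequence[float],
    weight_visual: float = 0.5,
) -> list[float]:
    """Weighted average of per-class probabilities from two unimodal models.

    Late fusion merges model outputs: each modality keeps its own encoder and
    classifier, and only the final class probabilities are combined.
    """
    if len(probs_visual) != len(probs_actions):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= weight_visual <= 1.0:
        raise ValueError("weight_visual must lie in [0, 1]")
    w = weight_visual
    return [w * pv + (1.0 - w) * pa for pv, pa in zip(probs_visual, probs_actions)]


def predict(fused: Sequence[float], labels: Sequence[str] = STRATEGIES) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(fused)), key=lambda i: fused[i])
    return labels[best]


if __name__ == "__main__":
    # Made-up logits standing in for the outputs of the two unimodal models.
    visual = softmax([2.0, 0.5, 0.1])   # hypothetical screencast-frame model
    actions = softmax([0.2, 1.5, 0.3])  # hypothetical action-sequence model
    print(predict(late_fusion(visual, actions, weight_visual=0.5)))
    print(predict(late_fusion(visual, actions, weight_visual=0.3)))
```

With these made-up scores, equal weighting selects `systematic`, while lowering the visual weight to 0.3 flips the prediction to `trial_and_error`. This sensitivity is why a fusion weight of this kind would usually be tuned on held-out data rather than fixed by hand.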

Related papers

CLARGA: Multimodal Graph Representation Learning over Arbitrary Sets of Modalities [0.0]
We introduce CLARGA, a general-purpose multimodal fusion architecture for representation learning. Given a supervised dataset, CLARGA can be applied to virtually any machine learning task. We demonstrate CLARGA's effectiveness in diverse multimodal representation learning tasks across 7 datasets.
arXiv Detail & Related papers (2025-12-10T14:06:48Z)
Partially Supervised Unpaired Multi-Modal Learning for Label-Efficient Medical Image Segmentation [53.723234136550055]
We term the new learning paradigm Partially Supervised Unpaired Multi-Modal Learning (PSUMML). We propose a novel Decomposed partial class adaptation with snapshot Ensembled Self-Training (DEST) framework for it. Our framework consists of a compact segmentation network with modality-specific normalization layers for learning with partially labeled unpaired multi-modal data.
arXiv Detail & Related papers (2025-03-07T07:22:42Z)
Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning [7.412307614007383]
Multimodal learning models are designed to bridge different modalities, such as images and text, by learning a shared representation space. These models often exhibit a modality gap, where different modalities occupy distinct regions within the shared representation space. We identify the critical roles of mismatched data pairs and a learnable temperature parameter in causing and perpetuating the modality gap during training.
arXiv Detail & Related papers (2024-12-10T20:36:49Z)
RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, reflecting their emerging potential as general-purpose models for various vision-language tasks. Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs. In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities [89.40778301238642]
Model merging is an efficient empowerment technique in the machine learning community. There is a significant gap in the literature regarding a systematic and thorough review of these techniques.
arXiv Detail & Related papers (2024-08-14T16:58:48Z)
Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing? [37.73329106465031]
We propose a text-to-image framework, GTI-MM, to enhance data efficiency and model robustness against a missing visual modality. Our findings reveal that synthetic images improve training data efficiency when visual data is missing during training, and improve model robustness when visual data is missing during both training and testing.
arXiv Detail & Related papers (2024-02-14T09:21:00Z)
Improving Discriminative Multi-Modal Learning with Large-Scale Pre-Trained Models [51.5543321122664]
This paper investigates how to better leverage large-scale pre-trained uni-modal models to enhance discriminative multi-modal learning. We introduce Multi-Modal Low-Rank Adaptation learning (MMLoRA).
arXiv Detail & Related papers (2023-10-08T15:01:54Z)
Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences. We pose the problem of unseen modality interaction and introduce a first solution. It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
The Prototype-centered Attentive Learning (PAL) model is composed of two novel components. First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective. Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z)
Cross-modal Learning for Multi-modal Video Categorization [24.61762520189921]
Multi-modal machine learning (ML) models can process data in multiple modalities. In this paper, we focus on the problem of video categorization using a multi-modal ML technique. We show how our proposed multi-modal video categorization models with cross-modal learning outperform strong state-of-the-art baseline models.
arXiv Detail & Related papers (2020-03-07T03:21:15Z)
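For the entry "Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning" above, a measure commonly used in the literature to quantify the modality gap is the Euclidean distance between the centroids of the L2-normalized embeddings of each modality. The sketch below assumes that definition; it is not necessarily the measure used in that paper, and the function names and toy vectors are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> list[float]:
    """Project a vector onto the unit sphere, as contrastive models typically do."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def centroid(vectors: Sequence[Vector]) -> list[float]:
    """Component-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def modality_gap(image_embs: Sequence[Vector], text_embs: Sequence[Vector]) -> float:
    """Euclidean distance between the centroids of the normalized embeddings."""
    c_img = centroid([l2_normalize(v) for v in image_embs])
    c_txt = centroid([l2_normalize(v) for v in text_embs])
    return math.dist(c_img, c_txt)


if __name__ == "__main__":
    # Toy 2-D embeddings: each modality clusters in its own region of the space.
    images = [[1.0, 0.1], [0.9, 0.2]]
    texts = [[0.1, 1.0], [0.2, 0.9]]
    print(modality_gap(images, texts))   # clearly separated clusters
    print(modality_gap(images, images))  # identical sets give no gap
```

Because embeddings are normalized first, the measure ignores vector length and reflects only where each modality sits on the unit sphere.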

This list is automatically generated from the titles and abstracts of the papers on this site.