REVELIO -- Universal Multimodal Task Load Estimation for Cross-Domain Generalization
- URL: http://arxiv.org/abs/2509.01642v1
- Date: Mon, 01 Sep 2025 17:36:09 GMT
- Title: REVELIO -- Universal Multimodal Task Load Estimation for Cross-Domain Generalization
- Authors: Maximilian P. Oppelt, Andreas Foltyn, Nadine R. Lang-Richter, Bjoern M. Eskofier,
- Abstract summary: This paper introduces a new multimodal dataset that extends established cognitive load detection benchmarks with a real-world gaming application. Task load annotations are derived from objective performance, subjective NASA-TLX ratings, and task-level design. State-of-the-art end-to-end models, including xLSTM, ConvNeXt, and Transformer architectures, are systematically trained and evaluated.
- Score: 2.689067085628911
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Task load detection is essential for optimizing human performance across diverse applications, yet current models often lack generalizability beyond narrow experimental domains. While prior research has focused on individual tasks and limited modalities, there remains a gap in evaluating model robustness and transferability in real-world scenarios. This paper addresses these limitations by introducing a new multimodal dataset that extends established cognitive load detection benchmarks with a real-world gaming application, using the $n$-back test as a scientific foundation. Task load annotations are derived from objective performance, subjective NASA-TLX ratings, and task-level design, enabling a comprehensive evaluation framework. State-of-the-art end-to-end models, including xLSTM, ConvNeXt, and Transformer architectures, are systematically trained and evaluated on multiple modalities and application domains to assess their predictive performance and cross-domain generalization. Results demonstrate that multimodal approaches consistently outperform unimodal baselines, with specific modalities and model architectures showing varying impact depending on the application subset. Importantly, models trained on one domain exhibit reduced performance when transferred to novel applications, underscoring remaining challenges for universal cognitive load estimation. These findings provide robust baselines and actionable insights for developing more generalizable cognitive load detection systems, advancing both research and practical implementation in human-computer interaction and adaptive systems.
Related papers
- VLNVerse: A Benchmark for Vision-Language Navigation with Versatile, Embodied, Realistic Simulation and Evaluation [61.82502719679122]
We introduce VLNVerse, a benchmark for Versatile, Embodied, Realistic Simulation, and Evaluation. VLNVerse redefines VLN as a scalable, full-stack embodied AI problem. We propose a novel unified multi-task model capable of addressing all tasks within the benchmark.
arXiv Detail & Related papers (2025-12-22T04:27:26Z) - SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent for translating high-level research objectives into standardized experimental configurations. An Experiment Manager orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z) - MULTIBENCH++: A Unified and Comprehensive Multimodal Fusion Benchmarking Across Specialized Domains [35.511656323075506]
We have developed a large-scale, domain-adaptive benchmark for multimodal evaluation. This benchmark integrates over 30 datasets, encompassing 15 modalities and 20 predictive tasks. We have also developed an open-source, unified, and automated evaluation pipeline.
arXiv Detail & Related papers (2025-11-09T16:37:09Z) - EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence [17.644658293987955]
Embodied AI agents are capable of robust spatial perception, effective task planning, and adaptive execution in physical environments. Current large language models (LLMs) and multimodal LLMs (MLLMs) for embodied tasks suffer from key limitations. We propose EmbodiedBrain, a novel vision-language foundation model available in both 7B and 32B parameter sizes.
arXiv Detail & Related papers (2025-10-23T14:05:55Z) - NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms. Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks. To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z) - FoMEMO: Towards Foundation Models for Expensive Multi-objective Optimization [19.69959362934787]
We propose a new paradigm named FoMEMO, which enables the establishment of a foundation model conditioned on any domain trajectory and user preference. Rather than accessing extensive domain experiments in the real world, we demonstrate that pre-training the foundation model with a diverse set of hundreds of millions of synthetic data can lead to superior adaptability to unknown problems.
arXiv Detail & Related papers (2025-09-03T12:00:24Z) - Deploying Geospatial Foundation Models in the Real World: Lessons from WorldCereal [25.756741188074862]
This paper presents a structured approach to integrate geospatial foundation models into operational mapping systems. Our protocol has three key steps: defining application requirements, adapting the model to domain-specific data, and conducting rigorous empirical testing. Results highlight the model's strong spatial and temporal generalization capabilities.
arXiv Detail & Related papers (2025-07-16T15:10:32Z) - PALATE: Peculiar Application of the Law of Total Expectation to Enhance the Evaluation of Deep Generative Models [0.5499796332553708]
Deep generative models (DGMs) have caused a paradigm shift in the field of machine learning. A comprehensive evaluation of these models that accounts for the trichotomy between fidelity, diversity, and novelty in generated samples remains a formidable challenge. We propose PALATE, a novel enhancement to the evaluation of DGMs that addresses limitations of existing metrics.
arXiv Detail & Related papers (2025-03-24T09:06:45Z) - Testing Generalizability in Causal Inference [3.547529079746247]
No formal procedure exists for statistically evaluating generalizability in machine learning algorithms. We propose a systematic framework for statistically evaluating the generalizability of high-dimensional causal inference models.
arXiv Detail & Related papers (2024-11-05T11:44:00Z) - IT$^3$: Idempotent Test-Time Training [95.78053599609044]
Deep learning models often struggle when deployed in real-world settings due to distribution shifts between training and test data. We present Idempotent Test-Time Training (IT$^3$), a novel approach that enables on-the-fly adaptation to distribution shifts using only the current test instance. Our results suggest that idempotence provides a universal principle for test-time adaptation that generalizes across domains and architectures.
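The idempotence principle behind IT$^3$ can be illustrated with a toy one-parameter model: at test time, a linear model $f(x) = wx$ is adapted on a single instance by minimizing the idempotence deviation $(f(f(x)) - f(x))^2$. This is a hedged illustration of the objective only, with hand-derived gradients; it is not the paper's architecture or training recipe.

```python
# Toy illustration (not the paper's method): test-time adaptation of a linear
# model f(x) = w*x by minimizing the idempotence loss L = (f(f(x)) - f(x))^2.
def idempotence_adapt(w, x, lr=0.1, steps=200):
    """Gradient-descend w on ((w^2 - w) * x)^2 using the analytic gradient."""
    for _ in range(steps):
        residual = (w * w - w) * x                    # f(f(x)) - f(x)
        grad = 2.0 * residual * (2.0 * w - 1.0) * x   # dL/dw by chain rule
        w -= lr * grad
    return w

# Adapting from a perturbed weight pulls it back toward the idempotent point w = 1.
w_adapted = idempotence_adapt(w=1.2, x=1.0)
loss_before = ((1.2 ** 2 - 1.2) * 1.0) ** 2
loss_after = ((w_adapted ** 2 - w_adapted) * 1.0) ** 2
```

The fixed points of the loss ($w = 0$ and $w = 1$) are exactly the idempotent models; the adaptation step uses only the single test input `x`, mirroring the per-instance setting the paper targets.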
arXiv Detail & Related papers (2024-10-05T15:39:51Z) - Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks [50.75902473813379]
This work introduces a comprehensive evaluation framework that systematically examines the role of instructions and inputs in the generalisation abilities of such models.
The proposed framework uncovers the resilience of multimodal models to extreme instruction perturbations and their vulnerability to observational changes.
arXiv Detail & Related papers (2024-07-04T14:36:49Z) - GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models.
GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies.
We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z) - An Information-Theoretic Approach for Estimating Scenario Generalization in Crowd Motion Prediction [27.10815774845461]
We propose a novel scoring method, which characterizes generalization of models trained on source crowd scenarios and applied to target crowd scenarios.
The Interaction component characterizes the difficulty of scenario domains, while the Diversity component captures the diversity of a scenario domain.
Our experimental results validate the efficacy of the proposed method on several simulated and real-world (source,target) generalization tasks.
arXiv Detail & Related papers (2022-11-02T01:39:30Z) - Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks [73.63892022944198]
We present a generic perception architecture named Uni-Perceiver.
It processes a variety of modalities and tasks with unified modeling and shared parameters.
Results show that our pre-trained model without any tuning can achieve reasonable performance even on novel tasks.
arXiv Detail & Related papers (2021-12-02T18:59:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.