MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering
- URL: http://arxiv.org/abs/2506.15298v1
- Date: Wed, 18 Jun 2025 09:29:51 GMT
- Title: MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering
- Authors: Xinqi Fan, Jingting Li, John See, Moi Hoon Yap, Wen-Huang Cheng, Xiaobai Li, Xiaopeng Hong, Su-Jing Wang, Adrian K. Davison
- Abstract summary: Facial micro-expressions (MEs) are involuntary movements of the face that occur spontaneously when a person experiences an emotion. In recent years, substantial advancements have been made in the areas of ME recognition, spotting, and generation. The ME grand challenge (MEGC) 2025 introduces two tasks that reflect these evolving research directions.
- Score: 55.30507585676142
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial micro-expressions (MEs) are involuntary movements of the face that occur spontaneously when a person experiences an emotion but attempts to suppress or repress the facial expression, typically in high-stakes environments. In recent years, substantial advancements have been made in the areas of ME recognition, spotting, and generation. However, conventional approaches that treat spotting and recognition as separate tasks are suboptimal, particularly for analyzing long-duration videos in realistic settings. Concurrently, the emergence of multimodal large language models (MLLMs) and large vision-language models (LVLMs) offers promising new avenues for enhancing ME analysis through their powerful multimodal reasoning capabilities. The ME grand challenge (MEGC) 2025 introduces two tasks that reflect these evolving research directions: (1) ME spot-then-recognize (ME-STR), which integrates ME spotting and subsequent recognition in a unified sequential pipeline; and (2) ME visual question answering (ME-VQA), which explores ME understanding through visual question answering, leveraging MLLMs or LVLMs to address diverse question types related to MEs. All participating algorithms are required to run on the challenge test set and submit their results to a leaderboard. More details are available at https://megc2025.github.io.
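For readers unfamiliar with the ME-STR setup, the sketch below illustrates the general shape of a spot-then-recognize pipeline: a spotter proposes candidate intervals in a long video, and each interval is then passed to an emotion classifier. This is a minimal illustration under assumed interfaces (`spot_intervals`, `classify_emotion`, and the `SpottedME` record are hypothetical placeholders), not the challenge's reference implementation.

```python
# Minimal sketch of a spot-then-recognize (ME-STR) pipeline.
# All model calls are hypothetical stubs; MEGC2025 does not
# prescribe a specific spotter or recognizer.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SpottedME:
    onset: int                     # frame index where the ME starts
    offset: int                    # frame index where the ME ends
    emotion: Optional[str] = None  # filled in by the recognition stage

def spot_intervals(frames) -> List[SpottedME]:
    """Stage 1: propose candidate ME intervals in a long video.
    A real spotter might threshold optical-flow magnitude or use a
    temporal detection network; here it is just a stub."""
    raise NotImplementedError

def classify_emotion(clip) -> str:
    """Stage 2: classify the emotion of one spotted clip.
    Stub for any ME recognizer (e.g. a 3D-CNN or a transformer)."""
    raise NotImplementedError

def spot_then_recognize(frames) -> List[SpottedME]:
    """Run both stages sequentially: crop each spotted interval
    from the video and pass it to the recognizer."""
    results = []
    for me in spot_intervals(frames):
        me.emotion = classify_emotion(frames[me.onset : me.offset + 1])
        results.append(me)
    return results
```

Because the recognizer only sees intervals the spotter proposes, spotting errors propagate directly into the final recognition results, which is one reason the challenge evaluates the two stages as a single sequential pipeline.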
Related papers
- MELLM: Exploring LLM-Powered Micro-Expression Understanding Enhanced by Subtle Motion Perception [47.80768014770871]
We propose a novel Micro-Expression Large Language Model (MELLM). It incorporates a subtle facial motion perception strategy with the strong inference capabilities of MLLMs. Our model exhibits superior robustness and generalization capabilities in micro-expression understanding (MEU).
arXiv Detail & Related papers (2025-05-11T15:08:23Z)
- MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research [57.61445960384384]
MicroVQA consists of 1,042 multiple-choice questions (MCQs) curated by biology experts across diverse microscopy modalities. Benchmarking state-of-the-art MLLMs reveals a peak performance of 53%. Expert analysis of chain-of-thought responses shows perception errors are the most frequent, followed by knowledge errors and then overgeneralization errors.
arXiv Detail & Related papers (2025-03-17T17:33:10Z)
- EmoVerse: Exploring Multimodal Large Language Models for Sentiment and Emotion Understanding [5.3848462080869215]
We introduce Emotion Universe (EmoVerse), an MLLM designed to handle a broad spectrum of sentiment and emotion-related tasks. EmoVerse is capable of deeply analyzing the underlying causes of emotional states. We also introduce the Affective Multitask (AMT) dataset.
arXiv Detail & Related papers (2024-12-11T02:55:00Z)
- EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning [27.790079451103065]
We propose a novel MLLM, named EMO-LLaMA, which incorporates facial priors from a pretrained facial analysis network to enhance human facial information.
EMO-LLaMA achieves SOTA-comparable or competitive results across both static and dynamic FER datasets.
arXiv Detail & Related papers (2024-08-21T08:28:40Z)
- MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues [0.0]
We propose a time-sensitive Multimodal Large Language Model (MLLM) aimed at directing attention to local facial micro-expression dynamics.
Our model incorporates two key architectural contributions: (1) a global-local attention visual encoder that integrates global frame-level, timestamp-bound image features with local facial features capturing the temporal dynamics of micro-expressions; and (2) an utterance-aware video Q-Former that captures multi-scale and contextual dependencies by generating visual token sequences for each utterance segment and for the entire video, then combining them.
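As a rough illustration of the global-local fusion idea mentioned above, the sketch below lets local facial features attend to global frame-level features via cross-attention. The module layout, dimensions, and the choice of cross-attention are assumptions for illustration, not MicroEmo's actual architecture.

```python
# Hypothetical sketch of a global-local attention fusion, loosely
# following the idea in the MicroEmo summary above. Shapes and the
# cross-attention design are assumptions, not the paper's code.
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        # Local facial features attend to global frame features.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, global_feats: torch.Tensor,
                local_feats: torch.Tensor) -> torch.Tensor:
        # global_feats: (B, T, D) frame-level features per timestamp
        # local_feats:  (B, T, D) features of cropped face regions
        fused, _ = self.cross_attn(query=local_feats,
                                   key=global_feats,
                                   value=global_feats)
        return self.norm(local_feats + fused)  # residual connection

# Example: fuse 16 frames of 768-d global and local features.
if __name__ == "__main__":
    g = torch.randn(2, 16, 768)
    l = torch.randn(2, 16, 768)
    print(GlobalLocalFusion()(g, l).shape)  # torch.Size([2, 16, 768])
```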
arXiv Detail & Related papers (2024-07-23T15:05:55Z)
- Tell Me Where You Are: Multimodal LLMs Meet Place Recognition [11.421492098416538]
We introduce multimodal large language models (MLLMs) to visual place recognition (VPR).
Our key design is to use vision-based retrieval to propose several candidates and then leverage language-based reasoning to carefully inspect each candidate for a final decision (a rough sketch of this pattern follows below).
Our results on three datasets demonstrate that integrating the general-purpose visual features from VFMs with the reasoning capabilities of MLLMs already provides an effective place recognition solution.
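A minimal sketch of that retrieve-then-reason pattern, assuming placeholder interfaces (`embed_image`, `ask_mllm`, and the database layout are hypothetical, not the paper's API):

```python
# Hypothetical sketch: visual retrieval proposes candidate places,
# then a multimodal LLM inspects them and decides.
import numpy as np

def embed_image(image) -> np.ndarray:
    """Stub for a vision-foundation-model encoder (e.g. a DINO-style model)."""
    raise NotImplementedError

def ask_mllm(prompt: str, images: list) -> str:
    """Stub for any multimodal LLM chat call."""
    raise NotImplementedError

def recognize_place(query_img, db_imgs, db_names, k: int = 3) -> str:
    # Stage 1: retrieve the k nearest database images by cosine similarity.
    q = embed_image(query_img)
    feats = np.stack([embed_image(img) for img in db_imgs])
    sims = feats @ q / (np.linalg.norm(feats, axis=1) * np.linalg.norm(q))
    top = np.argsort(-sims)[:k]
    # Stage 2: let the MLLM inspect the candidates and make the final call.
    prompt = ("The first image is a query photo. The remaining images are "
              "candidate places: " + ", ".join(db_names[i] for i in top) +
              ". Which candidate shows the same place as the query?")
    return ask_mllm(prompt, [query_img] + [db_imgs[i] for i in top])
```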
arXiv Detail & Related papers (2024-06-25T12:59:46Z)
- Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs [50.77984109941538]
Our research reveals that the visual capabilities in recent multimodal LLMs still exhibit systematic shortcomings.
We identify "CLIP-blind pairs": images that CLIP perceives as similar despite their clear visual differences.
We evaluated various CLIP-based vision-and-language models and found a notable correlation between visual patterns that challenge CLIP models and those problematic for multimodal LLMs.
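A toy sketch of how such pairs could be mined: score every image pair with both a CLIP encoder and a vision-only encoder, and keep pairs where the two disagree. The encoders and thresholds below are illustrative assumptions, not the paper's exact recipe.

```python
# Toy sketch of mining "CLIP-blind pairs": pairs a CLIP encoder scores
# as near-identical while a vision-only encoder scores as clearly
# different. Thresholds here are illustrative assumptions.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_clip_blind_pairs(clip_feats, vision_feats,
                          clip_min=0.95, vision_max=0.6):
    """clip_feats / vision_feats: lists of per-image embeddings from
    the two encoders, index-aligned over the same image set."""
    pairs = []
    n = len(clip_feats)
    for i in range(n):
        for j in range(i + 1, n):
            if (cosine(clip_feats[i], clip_feats[j]) > clip_min and
                    cosine(vision_feats[i], vision_feats[j]) < vision_max):
                pairs.append((i, j))  # CLIP: "same"; vision-only: "different"
    return pairs
```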
arXiv Detail & Related papers (2024-01-11T18:58:36Z)
- Video-based Facial Micro-Expression Analysis: A Survey of Datasets, Features and Algorithms [52.58031087639394]
Micro-expressions are involuntary and transient facial expressions.
They can provide important information in a broad range of applications, such as lie detection and criminal detection.
Since micro-expressions are transient and of low intensity, their detection and recognition are difficult and rely heavily on expert experience.
arXiv Detail & Related papers (2022-01-30T05:14:13Z)
- Micro-expression spotting: A new benchmark [74.69928316848866]
Micro-expressions (MEs) are brief and involuntary facial expressions that occur when people are trying to hide their true feelings or conceal their emotions.
In the computer vision field, the study of MEs can be divided into two main tasks, spotting and recognition.
This paper introduces an extension of the SMIC-E database, namely the SMIC-E-Long database, a new and challenging benchmark for ME spotting.
arXiv Detail & Related papers (2020-07-24T09:18:41Z)