MatQnA: A Benchmark Dataset for Multi-modal Large Language Models in Materials Characterization and Analysis
- URL: http://arxiv.org/abs/2509.11335v1
- Date: Sun, 14 Sep 2025 16:23:48 GMT
- Title: MatQnA: A Benchmark Dataset for Multi-modal Large Language Models in Materials Characterization and Analysis
- Authors: Yonghao Weng, Liqiang Gao, Linwu Zhu, Jian Huang
- Abstract summary: MatQnA is the first multi-modal benchmark dataset specifically designed for material characterization techniques. We employ a hybrid approach combining LLMs with human-in-the-loop validation to construct high-quality question-answer pairs. Preliminary evaluation results show that the most advanced multi-modal AI models have already achieved nearly 90% accuracy on objective questions.
- Score: 2.184404734602291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, large language models (LLMs) have achieved remarkable breakthroughs in general domains such as programming and writing, and have demonstrated strong potential in various scientific research scenarios. However, the capabilities of AI models in the highly specialized field of materials characterization and analysis have not yet been systematically or sufficiently validated. To address this gap, we present MatQnA, the first multi-modal benchmark dataset specifically designed for material characterization techniques. MatQnA includes ten mainstream characterization methods, such as X-ray Photoelectron Spectroscopy (XPS), X-ray Diffraction (XRD), Scanning Electron Microscopy (SEM), Transmission Electron Microscopy (TEM), etc. We employ a hybrid approach combining LLMs with human-in-the-loop validation to construct high-quality question-answer pairs, integrating both multiple-choice and subjective questions. Our preliminary evaluation results show that the most advanced multi-modal AI models (e.g., GPT-4.1, Claude 4, Gemini 2.5, and Doubao Vision Pro 32K) have already achieved nearly 90% accuracy on objective questions in materials data interpretation and analysis tasks, demonstrating strong potential for applications in materials characterization and analysis. The MatQnA dataset is publicly available at https://huggingface.co/datasets/richardhzgg/matQnA.
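For readers who want to try the benchmark, below is a minimal sketch of loading MatQnA from the Hugging Face Hub and scoring a predictor on the multiple-choice subset. The split name and the field names ("question", "choices", "answer") are assumptions, not confirmed by the abstract; check the dataset card at the URL above for the actual schema.

```python
# Minimal sketch: load MatQnA and score multiple-choice predictions.
# Assumptions (not confirmed by the abstract): the dataset loads via the
# `datasets` library with a "train" split, and each example carries
# "question", "choices", and "answer" fields -- verify on the dataset card.
from datasets import load_dataset

ds = load_dataset("richardhzgg/matQnA", split="train")

def multiple_choice_accuracy(examples, predict):
    """Fraction of examples where predict(example) matches the gold answer."""
    correct = sum(predict(ex) == ex["answer"] for ex in examples)
    return correct / len(examples)

# Usage: a stand-in predictor that always answers "A" gives a simple
# baseline against which model accuracies (e.g., the ~90% reported above)
# can be compared.
print(f"Baseline accuracy: {multiple_choice_accuracy(ds, lambda ex: 'A'):.1%}")
```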
Related papers
- TSAQA: Time Series Analysis Question And Answering Benchmark [85.35545785252309]
Time series data are integral to critical applications across domains such as finance, healthcare, transportation, and environmental science. We introduce TSAQA, a novel unified benchmark designed to broaden task coverage and evaluate diverse temporal analysis capabilities.
arXiv Detail & Related papers (2026-01-30T17:28:56Z)
- MicroVQA++: High-Quality Microscopy Reasoning Dataset with Weakly Supervised Graphs for Multimodal Large Language Model [28.472848113791162]
MicroVQA++ is a three-stage, large-scale and high-quality microscopy VQA corpus. It bootstraps supervision from expert-validated figure-caption pairs sourced from peer-reviewed articles. HiCQA-Graph is a novel heterogeneous graph over images, captions, and QAs that fuses NLI-based textual entailment, CLIP-based vision-language alignment, and agent signals.
arXiv Detail & Related papers (2025-11-14T15:35:43Z)
- OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive [50.468138755368805]
The opioid crisis represents a significant moment in public health. Extensive data and documents were disclosed in the UCSF-JHU Opioid Industry Documents Archive (OIDA). In this paper, we tackle the challenge of analyzing this archive by organizing the original dataset according to document attributes.
arXiv Detail & Related papers (2025-11-13T03:27:32Z)
- UniEM-3M: A Universal Electron Micrograph Dataset for Microstructural Segmentation and Generation [19.67541048907923]
We introduce UniEM-3M, the first large-scale and multimodal EM dataset for instance-level understanding. It comprises 5,091 high-resolution EMs, about 3 million instance segmentation labels, and image-level attribute-disentangled textual descriptions. A text-to-image diffusion model trained on the entire collection serves as both a powerful data augmentation tool and a proxy for the complete data distribution.
arXiv Detail & Related papers (2025-08-22T09:20:00Z)
- Zero-shot Autonomous Microscopy for Scalable and Intelligent Characterization of 2D Materials [41.856704526703595]
Characterization of atomic-scale materials traditionally requires human experts with months to years of specialized training. This bottleneck drives demand for fully autonomous experimentation systems capable of comprehending research objectives without requiring large training datasets. We present ATOMIC, an end-to-end framework that integrates foundation models to enable fully autonomous, zero-shot characterization of 2D materials.
arXiv Detail & Related papers (2025-04-14T14:49:45Z)
- M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment [65.3860007085689]
M3-AGIQA is a comprehensive framework that enables more human-aligned, holistic evaluation of AI-generated images. By aligning model outputs more closely with human judgment, M3-AGIQA delivers robust and interpretable quality scores.
arXiv Detail & Related papers (2025-02-21T03:05:45Z)
- mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data [71.352883755806]
Multimodal embedding models have gained significant attention for their ability to map data from different modalities, such as text and images, into a unified representation space. However, limited labeled multimodal data often hinders embedding performance. Recent approaches have leveraged data synthesis to address this problem, yet the quality of synthetic data remains a critical bottleneck.
arXiv Detail & Related papers (2025-02-12T15:03:33Z)
- Personalized Multimodal Large Language Models: A Survey [127.9521218125761]
Multimodal Large Language Models (MLLMs) have become increasingly important due to their state-of-the-art performance and ability to integrate multiple data modalities. This paper presents a comprehensive survey on personalized multimodal large language models, focusing on their architecture, training methods, and applications.
arXiv Detail & Related papers (2024-12-03T03:59:03Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields. We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice question answering, and conducted human expert annotation. Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- Can Large Language Models Serve as Data Analysts? A Multi-Agent Assisted Approach for Qualitative Data Analysis [4.539569292151314]
Large Language Models (LLMs) enable human-bot collaboration in Software Engineering (SE). This study designs and develops an LLM-based multi-agent system that synergizes human decision support with AI to automate various qualitative data analysis approaches.
arXiv Detail & Related papers (2024-02-02T13:10:46Z)
- RethinkingTMSC: An Empirical Study for Target-Oriented Multimodal Sentiment Classification [70.9087014537896]
Target-oriented Multimodal Sentiment Classification (TMSC) has gained significant attention among scholars.
To investigate the causes of this problem, we perform extensive empirical evaluation and in-depth analysis of the datasets.
arXiv Detail & Related papers (2023-10-14T14:52:37Z)
- Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset [29.866478682797513]
We provide an in-depth analysis of emrQA, the first large-scale dataset for question answering (QA) based on clinical notes.
We find that (i) emrQA answers are often incomplete, and (ii) emrQA questions are often answerable without using domain knowledge.
arXiv Detail & Related papers (2020-05-01T19:07:33Z)