Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework
- URL: http://arxiv.org/abs/2506.00302v2
- Date: Sat, 19 Jul 2025 19:25:44 GMT
- Title: Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework
- Authors: Can Polat, Erchin Serpedin, Mustafa Kurban, Hasan Kurban
- Abstract summary: MultiCrystalSpectrumSet (MCS-Set) is a curated framework that expands materials datasets by integrating atomic structures with 2D projections and structured textual annotations. MCS-Set enables two key tasks: (1) multimodal property and summary prediction, and (2) constrained crystal generation with partial cluster supervision.
- Score: 2.172419551358714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most materials science datasets are limited to atomic geometries (e.g., XYZ files), restricting their utility for multimodal learning and comprehensive data-centric analysis. These constraints have historically impeded the adoption of advanced machine learning techniques in the field. This work introduces MultiCrystalSpectrumSet (MCS-Set), a curated framework that expands materials datasets by integrating atomic structures with 2D projections and structured textual annotations, including lattice parameters and coordination metrics. MCS-Set enables two key tasks: (1) multimodal property and summary prediction, and (2) constrained crystal generation with partial cluster supervision. Leveraging a human-in-the-loop pipeline, MCS-Set combines domain expertise with standardized descriptors for high-quality annotation. Evaluations using state-of-the-art language and vision-language models reveal substantial modality-specific performance gaps and highlight the importance of annotation quality for generalization. MCS-Set offers a foundation for benchmarking multimodal models, advancing annotation practices, and promoting accessible, versatile materials science datasets. The dataset and implementations are available at https://github.com/KurbanIntelligenceLab/MultiCrystalSpectrumSet.
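To make the data layout concrete, the sketch below shows one way a single MCS-Set-style sample could be organized in Python: an XYZ geometry, its 2D projection image, and a structured annotation carrying lattice and coordination fields plus a text summary. The class name, field names, parser, and example values are hypothetical illustrations, not the actual MCS-Set schema; see the linked repository for the real implementation.

```python
from dataclasses import dataclass, field

@dataclass
class MultimodalCrystalSample:
    """Hypothetical container pairing the three modalities described in the abstract."""
    xyz_path: str                                    # atomic geometry (XYZ file)
    projection_path: str                             # 2D projection of the structure (image)
    annotation: dict = field(default_factory=dict)   # lattice parameters, coordination metrics, summary text

def parse_xyz(path: str):
    """Minimal XYZ reader: returns a list of (element, x, y, z) tuples."""
    with open(path) as fh:
        lines = fh.read().splitlines()
    n_atoms = int(lines[0])                 # first line: atom count
    atoms = []
    for line in lines[2:2 + n_atoms]:       # skip the comment line, read one atom per line
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms

# Usage: assemble a sample for downstream property/summary prediction (values are illustrative only).
sample = MultimodalCrystalSample(
    xyz_path="cluster_001.xyz",
    projection_path="cluster_001_xy.png",
    annotation={"lattice_a": 4.59, "coordination_number": 6, "summary": "small oxide cluster"},
)
# atoms = parse_xyz(sample.xyz_path)  # would load the geometry if the file exists
```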
Related papers
- MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks.
Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales.
We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z)
- UniMat: Unifying Materials Embeddings through Multi-modal Learning [0.0]
We evaluate techniques in multi-modal learning (alignment and fusion) in unifying some of the most important modalities in materials science.
We show that structure graph modality can be enhanced by aligning with XRD patterns.
We also show that aligning and fusing more experimentally accessible data formats, such as XRD patterns and compositions, can create more robust joint embeddings.
arXiv Detail & Related papers (2024-11-13T14:55:08Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields.
We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation.
Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z)
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [103.72844619581811]
We build performant Multimodal Large Language Models (MLLMs)
In particular, we study the importance of various architecture components and data choices.
We demonstrate that large-scale multimodal pre-training benefits from a careful mix of image-caption, interleaved image-text, and text-only data.
arXiv Detail & Related papers (2024-03-14T17:51:32Z)
- Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs [49.88461345825586]
This paper proposes a new framework to enhance the fine-grained image understanding abilities of MLLMs.
We present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.
We show that our model exhibits a 5.2% accuracy improvement over Qwen-VL and surpasses the accuracy of Kosmos-2 by 24.7%.
arXiv Detail & Related papers (2023-10-01T05:53:15Z)
- MatSciML: A Broad, Multi-Task Benchmark for Solid-State Materials Modeling [7.142619575624596]
MatSci ML is a benchmark for modeling MATerials SCIence using Machine Learning (MatSci ML) methods.
MatSci ML provides a diverse set of materials systems and properties data for model training and evaluation.
In the multi-dataset learning setting, MatSci ML enables researchers to combine observations from multiple datasets to perform joint prediction of common properties.
arXiv Detail & Related papers (2023-09-12T03:08:37Z)
- MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities [25.059177235004952]
We propose Multi-modal Entity Set Expansion (MESE), where models integrate information from multiple modalities to represent entities.
A powerful multi-modal model, MultiExpan, is proposed, pre-trained on four multimodal pre-training tasks.
The MESED dataset is the first multi-modal dataset for ESE with large-scale and elaborate manual calibration.
arXiv Detail & Related papers (2023-07-27T14:09:59Z)
- Enhancing Human-like Multi-Modal Reasoning: A New Challenging Dataset and Comprehensive Framework [51.44863255495668]
Multimodal reasoning is a critical component in the pursuit of artificial intelligence systems that exhibit human-like intelligence.
We present the Multi-Modal Reasoning (COCO-MMR) dataset, a novel dataset that encompasses an extensive collection of open-ended questions.
We propose innovative techniques, including multi-hop cross-modal attention and sentence-level contrastive learning, to enhance the image and text encoders.
arXiv Detail & Related papers (2023-07-24T08:58:25Z)
- Self-Supervised Multimodal Learning: A Survey [23.526389924804207]
Multimodal learning aims to understand and analyze information from multiple modalities.
The heavy dependence on data paired with expensive human annotations impedes scaling up models.
Given the availability of large-scale unannotated data in the wild, self-supervised learning has become an attractive strategy to alleviate the annotation bottleneck.
arXiv Detail & Related papers (2023-03-31T16:11:56Z)