UniEM-3M: A Universal Electron Micrograph Dataset for Microstructural Segmentation and Generation
- URL: http://arxiv.org/abs/2508.16239v1
- Date: Fri, 22 Aug 2025 09:20:00 GMT
- Title: UniEM-3M: A Universal Electron Micrograph Dataset for Microstructural Segmentation and Generation
- Authors: Nan Wang, Zhiyi Xia, Yiming Li, Shi Tang, Zuxin Fan, Xi Fang, Haoyi Tao, Xiaochen Cai, Guolin Ke, Linfeng Zhang, Yanhui Hong
- Abstract summary: We introduce UniEM-3M, the first large-scale and multimodal EM dataset for instance-level understanding. It comprises 5,091 high-resolution EMs, about 3 million instance segmentation labels, and image-level attribute-disentangled textual descriptions. A text-to-image diffusion model trained on the entire collection serves as both a powerful data augmentation tool and a proxy for the complete data distribution.
- Score: 19.67541048907923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Quantitative microstructural characterization is fundamental to materials science, where electron micrographs (EMs) provide indispensable high-resolution insights. However, progress in deep learning-based EM characterization has been hampered by the scarcity of large-scale, diverse, and expert-annotated datasets, due to acquisition costs, privacy concerns, and annotation complexity. To address this issue, we introduce UniEM-3M, the first large-scale and multimodal EM dataset for instance-level understanding. It comprises 5,091 high-resolution EMs, about 3 million instance segmentation labels, and image-level attribute-disentangled textual descriptions, a subset of which will be made publicly available. Furthermore, we are also releasing a text-to-image diffusion model trained on the entire collection to serve as both a powerful data augmentation tool and a proxy for the complete data distribution. To establish a rigorous benchmark, we evaluate various representative instance segmentation methods on the complete UniEM-3M and present UniEM-Net as a strong baseline model. Quantitative experiments demonstrate that this flow-based model outperforms other advanced methods on this challenging benchmark. Our multifaceted release of a partial dataset, a generative model, and a comprehensive benchmark -- available on Hugging Face -- will significantly accelerate progress in automated materials analysis.
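The released diffusion model is positioned as a drop-in augmentation tool. As a rough illustration of that workflow, the sketch below samples synthetic micrographs through the Hugging Face diffusers API; the repository id and the prompt are hypothetical placeholders, not the authors' published checkpoint or prompt format.

```python
# Minimal sketch: sampling synthetic EMs from a diffusers-compatible
# text-to-image checkpoint for data augmentation. The repo id below is
# a placeholder; substitute the actual UniEM-3M release once published.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "uniem-3m/em-diffusion",  # hypothetical repo id
    torch_dtype=torch.float16,
).to("cuda")

# Attribute-disentangled prompts mirror the dataset's textual descriptions.
prompt = "SEM micrograph, equiaxed grains, dense packing, high magnification"
images = pipe(prompt, num_inference_steps=30, num_images_per_prompt=4).images
for i, img in enumerate(images):
    img.save(f"synthetic_em_{i}.png")
```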
Related papers
- SAM 3D Body: Robust Full-Body Human Mesh Recovery [65.0108906331903]
We introduce SAM 3D Body (3DB), a promptable model for single-image full-body 3D human mesh recovery (HMR). 3DB estimates the human pose of the body, feet, and hands. It is the first model to use a new parametric mesh representation, Momentum Human Rig (MHR), which decouples skeletal structure and surface shape.
arXiv Detail & Related papers (2026-02-17T20:26:37Z) - Large-scale EM Benchmark for Multi-Organelle Instance Segmentation in the Wild [8.670858548670742]
We develop a benchmark for multi-organelle instance segmentation, comprising over 100,000 2D EM images across a variety of cell types and five organelle classes that capture real-world variability. Our results show several limitations: current models struggle to generalize across heterogeneous EM data and perform poorly on organelles with global, distributed morphologies. These findings underscore the fundamental mismatch between local-context models and the challenge of modeling long-range structural continuity in the presence of real-world variability.
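The abstract does not spell out the scoring protocol, but instance segmentation benchmarks of this kind are commonly evaluated with COCO-style mask AP. A minimal sketch with pycocotools, assuming COCO-format annotation and prediction files (the file names are placeholders):

```python
# Sketch: COCO-style mask AP evaluation, a standard protocol for
# instance segmentation benchmarks. File paths are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO("organelle_val_annotations.json")  # ground-truth instances
dt = gt.loadRes("model_predictions.json")    # model's predicted masks

evaluator = COCOeval(gt, dt, iouType="segm") # score masks, not boxes
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()                        # prints AP, AP50, AP75, ...
```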
arXiv Detail & Related papers (2026-01-18T16:09:27Z) - Reconstruction-Driven Multimodal Representation Learning for Automated Media Understanding [0.1411701037241356]
We propose a Multimodal Autoencoder that learns unified representations across text, audio, and visual data. We demonstrate significant improvements in clustering and alignment metrics compared to linear baselines. The results highlight the potential of reconstruction-driven multimodal learning to enhance automation, searchability, and content-management efficiency in modern broadcasting.
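A minimal PyTorch sketch of such a reconstruction-driven multimodal autoencoder follows; the modality dimensions, fusion by averaging, and MSE objective are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch: per-modality encoders share one latent; per-modality decoders
# reconstruct each input from it. Dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultimodalAutoencoder(nn.Module):
    def __init__(self, dims, latent=256):
        super().__init__()
        self.encoders = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, latent), nn.ReLU()) for m, d in dims.items()}
        )
        self.decoders = nn.ModuleDict({m: nn.Linear(latent, d) for m, d in dims.items()})

    def forward(self, inputs):
        # Fuse by averaging per-modality latents into one shared code.
        z = torch.stack([self.encoders[m](x) for m, x in inputs.items()]).mean(dim=0)
        recons = {m: self.decoders[m](z) for m in inputs}
        loss = sum(F.mse_loss(recons[m], inputs[m]) for m in inputs)
        return z, recons, loss

dims = {"text": 768, "audio": 128, "visual": 2048}
model = MultimodalAutoencoder(dims)
batch = {m: torch.randn(4, d) for m, d in dims.items()}
z, recons, loss = model(batch)  # z: (4, 256); loss drives reconstruction
```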
arXiv Detail & Related papers (2025-11-17T19:13:51Z) - Beyond Atomic Geometry Representations in Materials Science: A Human-in-the-Loop Multimodal Framework [2.172419551358714]
MultiCrystalSpectrumSet (MCS-Set) is a curated framework that expands materials datasets by integrating atomic structures with 2D projections and structured textual annotations. MCS-Set enables two key tasks: (1) multimodal property and summary prediction, and (2) constrained crystal generation with partial cluster supervision.
arXiv Detail & Related papers (2025-05-30T23:18:42Z) - M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment [65.3860007085689]
M3-AGIQA is a comprehensive framework that enables more human-aligned, holistic evaluation of AI-generated images. By aligning model outputs more closely with human judgment, M3-AGIQA delivers robust and interpretable quality scores.
arXiv Detail & Related papers (2025-02-21T03:05:45Z) - MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data. This paper investigates leveraging generative models to synthesize data for training segmentation models for underrepresented modalities. We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
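For intuition, the sketch below approximates mask- and text-conditioned synthesis with a ControlNet-style diffusers pipeline; both checkpoint ids, the prompt, and the file names are placeholders rather than MRGen's released components.

```python
# Sketch of text- and mask-conditioned image synthesis in the spirit of
# MRGen, using diffusers. Checkpoint ids and paths are placeholders.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained(
    "org/mask-controlnet", torch_dtype=torch.float16  # hypothetical id
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # generic base model, not MRGen's
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The segmentation mask acts as the spatial conditioning image.
mask = Image.open("target_segmentation_mask.png").convert("RGB")
image = pipe("T2-weighted abdominal MRI, liver highlighted", image=mask).images[0]
image.save("synthetic_mri.png")
```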
arXiv Detail & Related papers (2024-12-04T16:34:22Z) - Revealing the Evolution of Order in Materials Microstructures Using Multi-Modal Computer Vision [4.6481041987538365]
Development of high-performance materials for microelectronics depends on our ability to describe and direct property-defining microstructural order.
Here, we demonstrate a multi-modal machine learning (ML) approach to describe order from electron microscopy analysis of the complex oxide La$_{1-x}$Sr$_x$FeO$_3$.
We observe distinct differences in the performance of uni- and multi-modal models, from which we draw general lessons in describing crystal order using computer vision.
arXiv Detail & Related papers (2024-11-15T02:44:32Z) - EMMA: Efficient Visual Alignment in Multi-Modal LLMs [56.03417732498859]
EMMA is a lightweight cross-modality module designed to efficiently fuse visual and textual encodings. EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations.
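As a rough analogue of such a lightweight fusion module, the PyTorch sketch below lets visual tokens cross-attend to text tokens through a single attention layer; the dimensions and residual design are illustrative assumptions, not EMMA's published architecture.

```python
# Sketch: visual tokens query text tokens via one cross-attention layer.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, text_tokens):
        # Query = visual, key/value = text; residual keeps visual content.
        fused, _ = self.attn(visual_tokens, text_tokens, text_tokens)
        return self.norm(visual_tokens + fused)

fusion = CrossModalFusion()
v = torch.randn(2, 196, 512)  # e.g. ViT patch tokens
t = torch.randn(2, 32, 512)   # e.g. instruction embeddings
out = fusion(v, t)            # fused visual tokens, shape (2, 196, 512)
```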
arXiv Detail & Related papers (2024-10-02T23:00:31Z) - Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models [49.439311430360284]
We introduce a novel data synthesis method inspired by contrastive learning and image difference captioning. Our key idea involves challenging the model to discern both matching and distinct elements. We leverage this generated dataset to fine-tune state-of-the-art (SOTA) MLLMs.
arXiv Detail & Related papers (2024-08-08T17:10:16Z) - MatSAM: Efficient Extraction of Microstructures of Materials via Visual Large Model [11.130574172301365]
Segment Anything Model (SAM) is a large visual model with powerful deep feature representation and zero-shot generalization capabilities.
In this paper, we propose MatSAM, a general and efficient microstructure extraction solution based on SAM.
A simple yet effective point-based prompt generation strategy is designed, grounded on the distribution and shape of microstructures.
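A hedged sketch of that point-prompt idea with the public segment-anything API: crude connected-component centroids stand in for the paper's distribution- and shape-aware strategy, and the checkpoint and image paths are placeholders.

```python
# Sketch: centroid-based point prompts fed to SAM's predictor.
import cv2
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)

img = cv2.cvtColor(cv2.imread("micrograph.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(img)

# Crude centroid proposals from Otsu-thresholded connected components.
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
_, _, _, centroids = cv2.connectedComponentsWithStats(binary)

masks = []
for cx, cy in centroids[1:]:  # index 0 is the background component
    m, scores, _ = predictor.predict(
        point_coords=np.array([[cx, cy]]),
        point_labels=np.array([1]),  # 1 marks a foreground point
        multimask_output=False,
    )
    masks.append(m[0])
```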
arXiv Detail & Related papers (2024-01-11T03:18:18Z) - Learning Multiscale Consistency for Self-supervised Electron Microscopy Instance Segmentation [48.267001230607306]
We propose a pretraining framework that enhances multiscale consistency in EM volumes.
Our approach leverages a Siamese network architecture, integrating strong and weak data augmentations.
It effectively captures voxel and feature consistency, showing promise for learning transferable representations for EM analysis.
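A toy PyTorch sketch of the strong/weak Siamese consistency idea follows; the encoder, the noise-based "strong" augmentation, and the cosine objective are illustrative stand-ins for the paper's multiscale design.

```python
# Sketch: weak view provides a stop-gradient target for the strong view.
import torch
import torch.nn as nn
import torch.nn.functional as F

def consistency_loss(encoder, weak_view, strong_view):
    with torch.no_grad():          # weak branch acts as the target
        target = encoder(weak_view)
    pred = encoder(strong_view)    # strong branch receives gradients
    # Cosine feature consistency between the two views.
    return 1 - F.cosine_similarity(pred, target, dim=1).mean()

encoder = nn.Sequential(           # toy stand-in for a 3D EM encoder
    nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool3d(1), nn.Flatten(),
)
vol = torch.randn(2, 1, 32, 64, 64)         # batch of EM sub-volumes
strong = vol + 0.1 * torch.randn_like(vol)  # "strong" augmented view
loss = consistency_loss(encoder, vol, strong)
```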
arXiv Detail & Related papers (2023-08-19T05:49:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.