Related papers: Linking heterogeneous microstructure informatics with expert characterization knowledge through customized and hybrid vision-language representations for industrial qualification

Linking heterogeneous microstructure informatics with expert characterization knowledge through customized and hybrid vision-language representations for industrial qualification

URL: http://arxiv.org/abs/2508.20243v1
Date: Wed, 27 Aug 2025 19:59:12 GMT
Title: Linking heterogeneous microstructure informatics with expert characterization knowledge through customized and hybrid vision-language representations for industrial qualification
Authors: Mutahar Safdar, Gentry Wood, Max Zimmermann, Guy Lamouche, Priti Wanjara, Yaoyao Fiona Zhao,
Abstract summary: This study introduces a novel framework that links microstructure informatics with a range of expert characterization knowledge.<n>By integrating deep semantic segmentation with pre-trained multi-modal models, we encode both visual microstructural data and textual expert assessments into shared representations.
Score: 3.2038915276197932
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Rapid and reliable qualification of advanced materials remains a bottleneck in industrial manufacturing, particularly for heterogeneous structures produced via non-conventional additive manufacturing processes. This study introduces a novel framework that links microstructure informatics with a range of expert characterization knowledge using customized and hybrid vision-language representations (VLRs). By integrating deep semantic segmentation with pre-trained multi-modal models (CLIP and FLAVA), we encode both visual microstructural data and textual expert assessments into shared representations. To overcome limitations in general-purpose embeddings, we develop a customized similarity-based representation that incorporates both positive and negative references from expert-annotated images and their associated textual descriptions. This allows zero-shot classification of previously unseen microstructures through a net similarity scoring approach. Validation on an additively manufactured metal matrix composite dataset demonstrates the framework's ability to distinguish between acceptable and defective samples across a range of characterization criteria. Comparative analysis reveals that FLAVA model offers higher visual sensitivity, while the CLIP model provides consistent alignment with the textual criteria. Z-score normalization adjusts raw unimodal and cross-modal similarity scores based on their local dataset-driven distributions, enabling more effective alignment and classification in the hybrid vision-language framework. The proposed method enhances traceability and interpretability in qualification pipelines by enabling human-in-the-loop decision-making without task-specific model retraining. By advancing semantic interoperability between raw data and expert knowledge, this work contributes toward scalable and domain-adaptable qualification strategies in engineering informatics.

Related papers

Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models [15.709482146201283]
A simple linear classifier trained on the frozen features of modern Vision Foundation Models establishes a new state-of-the-art.<n>We show that this baseline matches specialized detectors on standard benchmarks but also decisively outperforms them on in-the-wild datasets.<n>We conclude by advocating for a paradigm shift in AI forensics, moving from overfitting on static benchmarks to harnessing the evolving world knowledge of foundation models for real-world reliability.
arXiv Detail & Related papers (2026-02-02T07:20:02Z)
Clinical Knowledge Graph Construction and Evaluation with Multi-LLMs via Retrieval-Augmented Generation [1.155620308725562]
Large language models (LLMs) offer new opportunities for constructing knowledge graphs from unstructured clinical narratives.<n>Existing approaches often rely on structured inputs and lack robust validation of factual accuracy and semantic consistency.<n>We introduce an end-to-end framework for clinical KG construction using multi-agent prompting and a schema-constrained Retrieval-Augmented Generation strategy.
arXiv Detail & Related papers (2026-01-05T07:16:29Z)
Towards Explainable Job Title Matching: Leveraging Semantic Textual Relatedness and Knowledge Graphs [0.19116784879310025]
This study investigates semantic textual relatedness (STR) in the context of job title matching.<n>We introduce a self-supervised hybrid architecture that combines dense sentence embeddings with domain-specific Knowledge Graphs.<n>We show that fine-tuned SBERT models augmented with KGs produce consistent improvements in the high-STR region.
arXiv Detail & Related papers (2025-09-11T15:02:54Z)
Structural and Statistical Texture Knowledge Distillation and Learning for Segmentation [70.15341084443236]
We re-emphasize the low-level texture information in deep networks for semantic segmentation and related knowledge distillation tasks.<n>We propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation.<n>Specifically, Contourlet Decomposition Module (CDM) is introduced to decompose the low-level features.<n> Texture Intensity Equalization Module (TIEM) is designed to extract and enhance the statistical texture knowledge.
arXiv Detail & Related papers (2025-03-11T04:49:25Z)
Shortcut Learning Susceptibility in Vision Classifiers [3.004632712148892]
Shortcut learning is where machine learning models exploit spurious correlations in data instead of capturing meaningful features.<n>This phenomenon is prevalent across various machine learning applications, including vision, natural language processing, and speech recognition.<n>We systematically evaluate these architectures by introducing deliberate shortcuts into the dataset that are positionally correlated with class labels.
arXiv Detail & Related papers (2025-02-13T10:25:52Z)
Interpreting token compositionality in LLMs: A robustness analysis [10.777646083061395]
Constituent-Aware Pooling (CAP) is a methodology designed to analyse how large language models process linguistic structures.<n>CAP intervenes in model activations through constituent-based pooling at various model levels.<n>Our findings highlight fundamental limitations in current transformer architectures regarding compositional semantics processing and model interpretability.
arXiv Detail & Related papers (2024-10-16T18:10:50Z)
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
Leveraging vision-language models for fair facial attribute classification [19.93324644519412]
General-purpose vision-language model (VLM) is a rich knowledge source for common sensitive attributes. We analyze the correspondence between VLM predicted and human defined sensitive attribute distribution. Experiments on multiple benchmark facial attribute classification datasets show fairness gains of the model over existing unsupervised baselines.
arXiv Detail & Related papers (2024-03-15T18:37:15Z)
Exploring Precision and Recall to assess the quality and diversity of LLMs [82.21278402856079]
We introduce a novel evaluation framework for Large Language Models (LLMs) such as textscLlama-2 and textscMistral. This approach allows for a nuanced assessment of the quality and diversity of generated text without the need for aligned corpora.
arXiv Detail & Related papers (2024-02-16T13:53:26Z)
Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-in-and-play approach compatible with both discriminative and generative KGC frameworks. Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments. Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z)
UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC) UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z)
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations [70.41385310930846]
We present an end-to-end framework Structure-CLIP to enhance multi-modal structured representations. We use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations. A Knowledge-Enhance (KEE) is proposed to leverage SGK as input to further enhance structured representations.
arXiv Detail & Related papers (2023-05-06T03:57:05Z)
Feature construction using explanations of individual predictions [0.0]
We propose a novel approach for reducing the search space based on aggregation of instance-based explanations of predictive models. We empirically show that reducing the search to these groups significantly reduces the time of feature construction. We show significant improvements in classification accuracy for several classifiers and demonstrate the feasibility of the proposed feature construction even for large datasets.
arXiv Detail & Related papers (2023-01-23T18:59:01Z)
Interpretable Mixture of Experts [71.55701784196253]
Interpretable Mixture of Experts (IME) is an inherently-interpretable modeling framework. IME is demonstrated to be more accurate than single interpretable models and perform comparably with existing state-of-the-art Deep Neural Networks (DNNs) in accuracy. IME's explanations are compared to commonly-used post-hoc explanations methods through a user study.
arXiv Detail & Related papers (2022-06-05T06:40:15Z)
ExpertNet: A Symbiosis of Classification and Clustering [22.324813752423044]
ExpertNet uses novel training strategies to learn clustered latent representations and leverage them by effectively combining cluster-specific classifiers. We demonstrate the superiority of ExpertNet over state-of-the-art methods on 6 large clinical datasets.
arXiv Detail & Related papers (2022-01-17T11:00:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.