Human-in-the-Loop and AI: Crowdsourcing Metadata Vocabulary for Materials Science
- URL: http://arxiv.org/abs/2512.09895v1
- Date: Wed, 10 Dec 2025 18:22:57 GMT
- Title: Human-in-the-Loop and AI: Crowdsourcing Metadata Vocabulary for Materials Science
- Authors: Jane Greenberg, Scott McClellan, Addy Ireland, Robert Sammarco, Colton Gerber, Christopher B. Rauch, Mat Kelly, John Kunze, Yuan An, Eric Toberer,
- Abstract summary: MatSci-YAMZ is a platform that integrates artificial intelligence (AI) and human-in-the-loop (HILT) to support metadata vocabulary development.<n>The paper reports on a proof-of-concept use case evaluating the AI-HILT model in materials science.
- Score: 0.054137763452448064
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Metadata vocabularies are essential for advancing FAIR and FARR data principles, but their development constrained by limited human resources and inconsistent standardization practices. This paper introduces MatSci-YAMZ, a platform that integrates artificial intelligence (AI) and human-in-the-loop (HILT), including crowdsourcing, to support metadata vocabulary development. The paper reports on a proof-of-concept use case evaluating the AI-HILT model in materials science, a highly interdisciplinary domain Six (6) participants affiliated with the NSF Institute for Data-Driven Dynamical Design (ID4) engaged with the MatSci-YAMZ plaform over several weeks, contributing term definitions and providing examples to prompt the AI-definitions refinement. Nineteen (19) AI-generated definitions were successfully created, with iterative feedback loops demonstrating the feasibility of AI-HILT refinement. Findings confirm the feasibility AI-HILT model highlighting 1) a successful proof of concept, 2) alignment with FAIR and open-science principles, 3) a research protocol to guide future studies, and 4) the potential for scalability across domains. Overall, MatSci-YAMZ's underlying model has the capacity to enhance semantic transparency and reduce time required for consensus building and metadata vocabulary development.
Related papers
- FITS: Towards an AI-Driven Fashion Information Tool for Sustainability [1.1246926724539141]
This work explores how Natural Language Processing (NLP) techniques can be applied to classify sustainability data for fashion brands.<n>We present a prototype Fashion Information Tool for Sustainability (FITS), a transformer-based system that extracts and classifies sustainability information.<n>FITS allows users to search for relevant data, analyze their own data, and explore the information via an interactive interface.
arXiv Detail & Related papers (2025-09-30T09:47:42Z) - Data Readiness for Scientific AI at Scale [0.0]
This paper examines how Data Readiness for AI (DRAI) principles apply to leadership-scale scientific datasets used to train foundation models.<n>We analyze archetypal across four representative domains - climate, nuclear fusion, bio/health, and materials.<n>We introduce a two-dimensional readiness framework composed of Data Readiness Levels (raw to AI-ready) and Data Processing Stages (ingest to shard)
arXiv Detail & Related papers (2025-07-30T18:30:37Z) - Teaching Large Language Models to Maintain Contextual Faithfulness via Synthetic Tasks and Reinforcement Learning [80.27561080938747]
CANOE is a framework to reduce hallucinations of faithfulness of large language models across different downstream tasks without human annotations.<n>Dual-GRPO is a rule-based reinforcement learning method that includes three tailored rule-based rewards derived from synthesized short-form QA data.<n> Experimental results show that CANOE greatly improves the faithfulness of LLMs across 11 different tasks, even outperforming the most advanced LLMs.
arXiv Detail & Related papers (2025-05-22T10:10:07Z) - Advancing AI Research Assistants with Expert-Involved Learning [84.30323604785646]
Large language models (LLMs) and large multimodal models (LMMs) promise to accelerate biomedical discovery, yet their reliability remains unclear.<n>We introduce ARIEL (AI Research Assistant for Expert-in-the-Loop Learning), an open-source evaluation and optimization framework.<n>We find that state-of-the-art models generate fluent but incomplete summaries, whereas LMMs struggle with detailed visual reasoning.
arXiv Detail & Related papers (2025-05-03T14:21:48Z) - A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration.<n>These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance.<n>This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Integration and Adaptation, which
arXiv Detail & Related papers (2025-03-08T05:41:42Z) - Survey on Vision-Language-Action Models [0.2636873872510828]
This work does not represent original research, but highlights how AI can help automate literature reviews.<n>Future research will focus on developing a structured framework for AI-assisted literature reviews.
arXiv Detail & Related papers (2025-02-07T11:56:46Z) - The Future of AI: Exploring the Potential of Large Concept Models [0.5755004576310334]
Generative AI has marked a pivotal era, with the term Large Language Models (LLMs) becoming a ubiquitous part of daily life.<n>LLMs have demonstrated exceptional capabilities in tasks such as text summarization, code generation, and creative writing.<n>To address these limitations, Meta has introduced Large Concept Models (LCMs)<n> LCMs use concepts as foundational units of understanding, enabling more sophisticated semantic reasoning and context-aware decision-making.
arXiv Detail & Related papers (2025-01-08T18:18:37Z) - Multi-Stage Knowledge Integration of Vision-Language Models for Continual Learning [79.46570165281084]
We propose a Multi-Stage Knowledge Integration network (MulKI) to emulate the human learning process in distillation methods.
MulKI achieves this through four stages, including Eliciting Ideas, Adding New Ideas, Distinguishing Ideas, and Making Connections.
Our method demonstrates significant improvements in maintaining zero-shot capabilities while supporting continual learning across diverse downstream tasks.
arXiv Detail & Related papers (2024-11-11T07:36:19Z) - Roadmap towards Superhuman Speech Understanding using Large Language Models [60.57947401837938]
Large language models (LLMs) integrate speech and audio data.
Recent advances, such as GPT-4o, highlight the potential for end-to-end speech LLMs.
We propose a five-level roadmap, ranging from basic automatic speech recognition (ASR) to advanced superhuman models.
arXiv Detail & Related papers (2024-10-17T06:44:06Z) - Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents [55.63497537202751]
Article explores the convergence of connectionist and symbolic artificial intelligence (AI)
Traditionally, connectionist AI focuses on neural networks, while symbolic AI emphasizes symbolic representation and logic.
Recent advancements in large language models (LLMs) highlight the potential of connectionist architectures in handling human language as a form of symbols.
arXiv Detail & Related papers (2024-07-11T14:00:53Z) - The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific
Progress in NLP [10.013604276642218]
I propose a paradigm for scientific progress in NLP centered around developing scalable, data-driven theories of linguistic structure.
I outline principles for data collection and theoretical modeling which can inform future scientific progress.
arXiv Detail & Related papers (2023-12-01T04:55:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.