Fugu-MT Paper Translation (Summary): PolyReal: A Benchmark for Real-World Polymer Science Workflows

Paper summary: PolyReal: A Benchmark for Real-World Polymer Science Workflows

arxiv url: http://arxiv.org/abs/2604.02934v1
Date: Fri, 03 Apr 2026 10:05:30 GMT
Status: Translation complete
Last updated in system: 2026-04-06 17:20:24.440913
Title: PolyReal: A Benchmark for Real-World Polymer Science Workflows
Title (reference translation): PolyReal: A Benchmark for Real-World Polymer Science Workflows
Authors: Wanhao Liu, Weida Wang, Jiaqing Xie, Suorong Yang, Jue Wang, Benteng Chen, Guangtao Mei, Zonglin Yang, Shufei Zhang, Yuchun Mo, Lang Cheng, Jin Zeng, Houqiang Li, Wanli Ouyang, Yuqiang Li
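Abstract summary: PolyReal is a new benchmark grounded in real-world scientific practice. It covers five critical capabilities: (1) foundational knowledge application; (2) lab safety analysis; (3) experiment mechanism reasoning; (4) raw data extraction; and (5) performance and application exploration.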
Reference score (independently calculated attention measure): 76.8670713411835
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multimodal Large Language Models (MLLMs) excel in general domains but struggle with complex, real-world science. We posit that polymer science, an interdisciplinary field spanning chemistry, physics, biology, and engineering, is an ideal high-stakes testbed due to its diverse multimodal data. Yet, existing benchmarks related to polymer science largely overlook real-world workflows, limiting their practical utility and failing to systematically evaluate MLLMs across the full, practice-grounded lifecycle of experimentation. We introduce PolyReal, a novel multimodal benchmark grounded in real-world scientific practices to evaluate MLLMs on the full lifecycle of polymer experimentation. It covers five critical capabilities: (1) foundational knowledge application; (2) lab safety analysis; (3) experiment mechanism reasoning; (4) raw data extraction; and (5) performance & application exploration. Our evaluation of leading MLLMs on PolyReal reveals a capability imbalance. While models perform well on knowledge-intensive reasoning (e.g., Experiment Mechanism Reasoning), they drop sharply on practice-based tasks (e.g., Lab Safety Analysis and Raw Data Extraction). This exposes a severe gap between abstract scientific knowledge and its practical, context-dependent application, showing that these real-world tasks remain challenging for MLLMs. Thus, PolyReal helps address this evaluation gap and provides a practical benchmark for assessing AI systems in real-world scientific workflows.
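Abstract (reference translation): Multimodal Large Language Models (MLLMs) excel in general domains but struggle with complex, real-world science. We posit that polymer science, an interdisciplinary field spanning chemistry, physics, biology, and engineering, is an ideal high-stakes testbed because of its diverse multimodal data. However, existing benchmarks related to polymer science largely overlook real-world workflows, which limits their practical utility and leaves them unable to systematically evaluate MLLMs across the full, practice-grounded lifecycle of experimentation. PolyReal is a novel multimodal benchmark grounded in real-world scientific practices that evaluates MLLMs on the full lifecycle of polymer experimentation. It covers five critical capabilities: (1) foundational knowledge application; (2) lab safety analysis; (3) experiment mechanism reasoning; (4) raw data extraction; and (5) performance and application exploration. Evaluating leading MLLMs on PolyReal reveals a capability imbalance. While models perform well on knowledge-intensive reasoning (e.g., experiment mechanism reasoning), their performance drops sharply on practice-based tasks (e.g., lab safety analysis and raw data extraction). This exposes a severe gap between abstract scientific knowledge and its practical, context-dependent application, and shows that these real-world tasks remain challenging for MLLMs. PolyReal therefore helps address this evaluation gap and provides a practical benchmark for assessing AI systems in real-world scientific workflows.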

Related papers