Fugu-MT Paper Translation (Abstract): ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding

Paper overview: ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding

arxiv url: http://arxiv.org/abs/2603.27064v1
Date: Sat, 28 Mar 2026 00:45:05 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:44.763679
Title: ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding
Authors: Jovana Kondic, Pengyuan Li, Dhiraj Joshi, Isaac Sanchez, Ben Wiesel, Shafiq Abedin, Amit Alfassy, Eli Schwartz, Daniel Caraballo, Yagmur Gizem Cinar, Florian Scheidegger, Steven I. Ross, Daniel Karl I. Weidele, Hang Hua, Ekaterina Arutyunova, Roei Herzig, Zexue He, Zihan Wang, Xinyue Yu, Yunfei Zhao, Sicong Jiang, Minghao Liu, Qunshu Lin, Peter Staar, Luis Lastras, Aude Oliva, Rogerio Feris
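Abstract summary: Understanding charts requires jointly reasoning over geometric visual patterns, structured numerical data, and natural language. ChartNet is a high-quality, million-scale multimodal dataset designed to advance chart interpretation and reasoning.
Reference score (independently computed measure of attention): 26.60504788691021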
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding charts requires models to jointly reason over geometric visual patterns, structured numerical data, and natural language -- a capability where current vision-language models (VLMs) remain limited. We introduce ChartNet, a high-quality, million-scale multimodal dataset designed to advance chart interpretation and reasoning. ChartNet leverages a novel code-guided synthesis pipeline to generate 1.5 million diverse chart samples spanning 24 chart types and 6 plotting libraries. Each sample consists of five aligned components: plotting code, rendered chart image, data table, natural language summary, and question-answering with reasoning, providing fine-grained cross-modal alignment. To capture the full spectrum of chart comprehension, ChartNet additionally includes specialized subsets encompassing human annotated data, real-world data, safety, and grounding. Moreover, a rigorous quality-filtering pipeline ensures visual fidelity, semantic accuracy, and diversity across chart representations. Fine-tuning on ChartNet consistently improves results across benchmarks, demonstrating its utility as large-scale supervision for multimodal models. As the largest open-source dataset of its kind, ChartNet aims to support the development of foundation models with robust and generalizable capabilities for data visualization understanding. The dataset is publicly available at https://huggingface.co/datasets/ibm-granite/ChartNet
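The abstract states that each ChartNet sample consists of five aligned components: plotting code, rendered chart image, data table, natural language summary, and question-answering with reasoning. As a rough illustration only, one such record might be modeled as below. The class names, field names, and the `missing_components` helper are hypothetical and are not taken from the dataset's actual schema, which should be checked on the dataset page.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QAItem:
    """One question with its reasoning and answer (hypothetical layout)."""
    question: str
    reasoning: str
    answer: str


@dataclass
class ChartSample:
    """Hypothetical record holding the five components named in the abstract."""
    plotting_code: str                    # source code that draws the chart
    image_path: str                       # location of the rendered chart image
    data_table: List[Dict[str, object]]   # underlying data, one dict per row
    summary: str                          # natural language summary
    qa: List[QAItem] = field(default_factory=list)

    def missing_components(self) -> List[str]:
        """Return the names of components that are empty, in field order."""
        checks = {
            "plotting_code": bool(self.plotting_code.strip()),
            "image_path": bool(self.image_path.strip()),
            "data_table": len(self.data_table) > 0,
            "summary": bool(self.summary.strip()),
            "qa": len(self.qa) > 0,
        }
        return [name for name, present in checks.items() if not present]


# Toy example; the values are made up for illustration, not drawn from ChartNet.
sample = ChartSample(
    plotting_code="import matplotlib.pyplot as plt\nplt.bar(['A', 'B'], [3, 5])",
    image_path="charts/example_0001.png",
    data_table=[{"category": "A", "value": 3}, {"category": "B", "value": 5}],
    summary="Category B has a higher value than category A.",
    qa=[QAItem("Which category is larger?", "B is 5 and A is 3, and 5 > 3.", "B")],
)
print(sample.missing_components())
```

A completeness check of this kind is only a stand-in for the idea of keeping the five components aligned; the quality-filtering pipeline mentioned in the abstract is not described in enough detail here to reproduce.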
