Fugu-MT Paper Translation (Summary): Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights


arxiv url: http://arxiv.org/abs/2505.10043v2
Date: Wed, 21 May 2025 03:08:15 GMT
Status: Translation complete
Last updated (in system): 2025-05-22 13:19:52.300662
Title: Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights
Authors: Yifan Wu, Lutao Yan, Yizhang Zhu, Yinan Mei, Jiannan Wang, Nan Tang, Yuyu Luo
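Abstract summary: Existing text-to-chart retrieval solutions often fail to capture the semantic content and contextual information of charts. This paper proposes a training data development pipeline that automatically synthesizes hierarchical semantic insights for charts. We train a CLIP-based model, ChartFinder, to learn better chart representations for text-to-chart retrieval.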
Reference score (attention score computed by this site): 21.97276088041938
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Charts are crucial for data analysis and decision-making. Text-to-chart retrieval systems have become increasingly important for Business Intelligence (BI), where users need to find relevant charts that match their analytical needs. These needs can be categorized into precise queries that are well-specified and fuzzy queries that are more exploratory; both require understanding the semantics and context of the charts. However, existing text-to-chart retrieval solutions often fail to capture the semantic content and contextual information of charts, primarily due to the lack of comprehensive metadata (or semantic insights). To address this limitation, we propose a training data development pipeline that automatically synthesizes hierarchical semantic insights for charts, covering visual patterns (visual-oriented), statistical properties (statistics-oriented), and practical applications (task-oriented), which produces 207,498 semantic insights for 69,166 charts. Based on these, we train a CLIP-based model named ChartFinder to learn better representations of charts for text-to-chart retrieval. Our method leverages rich semantic insights during the training phase to develop a model that understands both visual and semantic aspects of charts. To evaluate text-to-chart retrieval performance, we curate the first benchmark, CRBench, for this task with 21,862 charts and 326 text queries from real-world BI applications, with ground-truth labels verified by the crowd workers. Experiments show that ChartFinder significantly outperforms existing methods in text-to-chart retrieval tasks across various settings. For precise queries, ChartFinder achieves up to 66.9% NDCG@10, which is 11.58% higher than state-of-the-art models. In fuzzy query tasks, our method also demonstrates consistent improvements, with an average increase of 5% across nearly all metrics.
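The abstract states only that ChartFinder is CLIP-based and is trained so that text and chart representations can be matched for retrieval. As an illustration, not ChartFinder's actual training code, the sketch below shows the standard CLIP-style symmetric contrastive (InfoNCE) loss over a batch of paired text/chart embeddings, plus ranking charts for a query by cosine similarity. It assumes the embeddings already exist (the encoders are out of scope); the function names and the 0.07 temperature (CLIP's usual initial value) are hypothetical choices, and pure Python is used to keep it self-contained.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(texts: List[Vector], charts: List[Vector]) -> List[List[float]]:
    """sim[i][j] = cosine similarity between text embedding i and chart embedding j."""
    t = [normalize(v) for v in texts]
    c = [normalize(v) for v in charts]
    return [[sum(a * b for a, b in zip(ti, cj)) for cj in c] for ti in t]


def log_softmax_at(row: List[float], idx: int) -> float:
    """Numerically stable log of softmax(row)[idx]."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_z


def clip_loss(texts: List[Vector], charts: List[Vector], temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss for a batch in which texts[i] is paired with charts[i]."""
    sim = cosine_matrix(texts, charts)
    logits = [[s / temperature for s in row] for row in sim]
    n = len(logits)
    # Text -> chart: each row should concentrate probability on its diagonal entry.
    t2c = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Chart -> text: the same objective applied to the columns (transposed logits).
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    c2t = -sum(log_softmax_at(cols[j], j) for j in range(n)) / n
    return (t2c + c2t) / 2


def rank_charts(query: Vector, charts: List[Vector]) -> List[int]:
    """Return chart indices sorted by descending cosine similarity to the query."""
    sims = cosine_matrix([query], charts)[0]
    return sorted(range(len(charts)), key=lambda j: sims[j], reverse=True)


# Toy 2-D embeddings: text i is meant to match chart i.
texts = [[1.0, 0.0], [0.0, 1.0]]
charts = [[1.0, 0.1], [0.1, 1.0]]
print(rank_charts([1.0, 0.0], charts))  # [0, 1]
print(clip_loss(texts, charts) < clip_loss(texts, charts[::-1]))  # True
```

Correctly aligned pairs yield a lower loss than mismatched ones, which is the signal contrastive training exploits; at query time only `rank_charts` is needed.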

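The headline result is reported as NDCG@10. For readers unfamiliar with the metric, here is a minimal per-query sketch: DCG@k sums the relevance of each retrieved chart discounted by log2(rank + 1), and NDCG@k divides that by the DCG of the ideal ordering, so a perfect ranking scores 1.0. It assumes linear gains (some implementations use 2^rel - 1 instead); the abstract does not say which variant or relevance grading CRBench uses, and the chart IDs below are hypothetical.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain: sum of gain / log2(rank + 1) over ranks 1..k."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """NDCG@k for one query; `relevance` maps chart id -> graded relevance (0 if absent)."""
    gains = [relevance.get(cid, 0.0) for cid in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# The single relevant chart "c1" is retrieved at rank 2, so NDCG = 1 / log2(3).
print(round(ndcg_at_k(["c3", "c1", "c7"], {"c1": 1.0}), 4))  # 0.6309
```

A benchmark-level score is then the mean of the per-query values.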
Related papers

ChartCards: A Chart-Metadata Generation Framework for Multi-Task Chart Understanding [18.857927344450932]
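We propose ChartCards, a unified chart-metadata generation framework for multi-task chart understanding. Using ChartCards, we build MetaChart, a large-scale, high-quality dataset containing 10,862 data tables, 85K charts, and 170K chart captions. Fine-tuning six models on MetaChart improved average performance across all tasks by 5%.
Reference translation (metadata) (2025-05-21T03:07:47Z)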
ChartAdapter: Large Vision-Language Model for Chart Summarization [13.499376163294816]
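ChartAdapter is a lightweight transformer module designed to bridge the gap between charts and textual summaries. Integrating ChartAdapter with an LLM enables end-to-end training and efficient chart summarization.
Reference translation (metadata) (2024-12-30T05:07:34Z)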
AskChart: Universal Chart Understanding through Textual Enhancement [20.075911012193494]
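State-of-the-art approaches focus mainly on visual cues from chart images and fail to explicitly incorporate the rich textual information embedded in charts. AskChart is a universal model that explicitly integrates both textual and visual cues from charts using a Mixture of Experts (MoE) architecture.
Reference translation (metadata) (2024-12-26T09:59:43Z)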
On Pre-training of Multimodal Language Models Customized for Chart Understanding [83.99377088129282]
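This paper examines the training processes needed to improve chart understanding in MLLMs. We introduce CHOPINLLM, an MLLM tailored for in-depth chart understanding.
Reference translation (metadata) (2024-07-19T17:58:36Z)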
FlowLearn: Evaluating Large Vision-Language Models on Flowchart Understanding [52.35520385083425]
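The FlowLearn dataset is a resource designed to enhance the understanding of flowcharts. Its scientific subset contains 3,858 flowcharts sourced from scientific literature, and its simulated subset contains 10,000 flowcharts created with customizable scripts.
Reference translation (metadata) (2024-07-06T20:58:51Z)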
ChartAssisstant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning [54.89249749894061]
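ChartAssistant is a vision-language model for universal chart comprehension and reasoning. It follows a two-stage training process, first pre-training on chart-to-table parsing to align charts and text. Experiments show substantial performance gains over the state-of-the-art UniChart and ChartLlama methods.
Reference translation (metadata) (2024-01-04T17:51:48Z)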
ChartLlama: A Multimodal LLM for Chart Understanding and Generation [70.1393163657813]
We create a high-quality instruction-tuning dataset using GPT-4. We then introduce ChartLlama, a multimodal large language model trained on the generated dataset.
Reference translation (metadata) (2023-11-27T15:20:23Z)
StructChart: On the Schema, Metric, and Augmentation for Visual Chart Understanding [54.45681512355684]
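Current chart-related tasks focus either on chart perception, which extracts information from visual charts, or on chart reasoning over the extracted data. We introduce StructChart, a novel framework that leverages Struct Triplet Representations (STR) to achieve a unified and label-efficient approach.
Reference translation (metadata) (2023-09-20T12:51:13Z)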
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning [29.947053208614246]
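We present UniChart, a pretrained model for chart comprehension and reasoning. UniChart encodes the relevant text, data, and visual elements of charts and then uses a chart-grounded text decoder to generate the expected output in natural language. Its chart-specific pretraining tasks include (i) low-level tasks that extract visual elements (e.g., bars and lines) and data from charts and (ii) high-level tasks that build chart comprehension and reasoning skills.
Reference translation (metadata) (2023-05-24T06:11:17Z)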
ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules [89.75395046894809]
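ChartReader is a unified framework that seamlessly integrates chart derendering and comprehension tasks. Our approach incorporates a transformer-based chart component detection module and a pretrained vision-language model for chart-to-X tasks. The proposed framework substantially reduces the manual effort involved in chart analysis and is a step toward a universal chart understanding model.
Reference translation (metadata) (2023-04-05T00:25:27Z)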

The related-papers list is generated automatically from the titles and abstracts of papers on this site.

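The operator of this site makes no guarantee about the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.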