Fugu-MT Paper Translation (Overview): SynQuE: Estimating Synthetic Dataset Quality Without Annotations

Paper overview: SynQuE: Estimating Synthetic Dataset Quality Without Annotations

arxiv url: http://arxiv.org/abs/2511.03928v1
Date: Thu, 06 Nov 2025 00:09:33 GMT
Status: Translation complete
Last updated in system: 2025-11-07 20:17:53.245155
Title: SynQuE: Estimating Synthetic Dataset Quality Without Annotations
Authors: Arthur Chen, Victor Zhong
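Abstract summary: We formalize the problem of ranking synthetic datasets by their expected real-world task performance, using only limited unannotated real data. We establish the first comprehensive benchmarks for this problem by introducing proxy metrics that select synthetic training data so as to maximize task performance on real data. Our results show that SynQuE proxies correlate with real task performance across diverse tasks, including sentiment analysis, Text2SQL, web navigation, and image classification.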
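The summary judges SynQuE proxies by how well they correlate with real task performance. As a minimal sketch, assuming rank agreement is measured with Spearman's rank correlation (the summary does not name the statistic), the check can be written in plain Python. The function names are hypothetical, and the scores in the usage note are made-up placeholders, not results from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(proxy_scores: Sequence[float], task_scores: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(proxy_scores) != len(task_scores) or len(proxy_scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(proxy_scores), average_ranks(task_scores)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5
```

For example, spearman([0.9, 0.4, 0.7, 0.1], [0.38, 0.31, 0.35, 0.30]) returns 1.0, because the hypothetical proxy orders the four datasets exactly as their hypothetical accuracies do. A rank statistic fits this setting because only the order of candidate datasets matters for selection, not the scale of the proxy.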
Reference score (site-computed attention score): 6.628608274494256
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce and formalize the Synthetic Dataset Quality Estimation (SynQuE) problem: ranking synthetic datasets by their expected real-world task performance using only limited unannotated real data. This addresses a critical and open challenge where data is scarce due to collection costs or privacy constraints. We establish the first comprehensive benchmarks for this problem by introducing and evaluating proxy metrics that choose synthetic data for training to maximize task performance on real data. We introduce the first proxy metrics for SynQuE by adapting distribution- and diversity-based distance measures to our context via embedding models. To address the shortcomings of these metrics on complex planning tasks, we propose LENS, a novel proxy that leverages large language model reasoning. Our results show that SynQuE proxies correlate with real task performance across diverse tasks, including sentiment analysis, Text2SQL, web navigation, and image classification, with LENS consistently outperforming others on complex tasks by capturing nuanced characteristics. For instance, on text-to-SQL parsing, training on the top-3 synthetic datasets selected via SynQuE proxies can raise accuracy from 30.4% to 38.4% (+8.1) on average compared to selecting data indiscriminately. This work establishes SynQuE as a practical framework for synthetic data selection under real-data scarcity and motivates future research on foundation model-based data characterization and fine-grained data selection.
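The abstract describes proxies built from distribution-based distance measures computed over embeddings of the limited unannotated real data and of each candidate synthetic dataset. A minimal sketch of that idea follows, using squared maximum mean discrepancy (MMD) with a Gaussian kernel as a swapped-in example distance; the paper's actual measures, embedding models, and kernel settings are not given here, and all function names are hypothetical. Embeddings are assumed to be precomputed lists of floats, and the top-k step mirrors the abstract's top-3 selection.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd_squared(real: Sequence[Vector], synth: Sequence[Vector],
                gamma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two embedding sets."""
    def mean_kernel(a: Sequence[Vector], b: Sequence[Vector]) -> float:
        return sum(rbf_kernel(x, y, gamma) for x in a for y in b) / (len(a) * len(b))

    return (mean_kernel(real, real) + mean_kernel(synth, synth)
            - 2 * mean_kernel(real, synth))


def select_top_k(real: Sequence[Vector],
                 candidates: dict[str, Sequence[Vector]],
                 k: int = 3, gamma: float = 1.0) -> list[str]:
    """Rank synthetic datasets by MMD to the real data (lower = closer); keep k."""
    scores = {name: mmd_squared(real, emb, gamma) for name, emb in candidates.items()}
    return sorted(scores, key=scores.get)[:k]
```

With real = [[0.0, 0.0], [0.1, 0.0]] and candidates {"near": [[0.0, 0.1], [0.1, 0.1]], "mid": [[1.0, 1.0], [1.1, 1.0]], "far": [[5.0, 5.0], [5.1, 5.0]]}, select_top_k(real, candidates, k=2) returns ["near", "mid"]. A proxy like this only measures distributional closeness, which is the kind of signal the abstract reports falling short on complex planning tasks and which motivates LENS.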


Related papers