Fugu-MT Paper Translation (Summary): LingVarBench: Benchmarking LLM for Automated Named Entity Recognition in Structured Synthetic Spoken Transcriptions

Paper Summary: LingVarBench: Benchmarking LLM for Automated Named Entity Recognition in Structured Synthetic Spoken Transcriptions

arxiv url: http://arxiv.org/abs/2508.15801v1
Date: Wed, 13 Aug 2025 21:25:19 GMT
Status: Translation complete
Last updated in system: 2025-08-31 21:54:20.543313
Title: LingVarBench: Benchmarking LLM for Automated Named Entity Recognition in Structured Synthetic Spoken Transcriptions
Title (reference translation): LingVarBench: Benchmarking LLMs for Automated Named Entity Recognition in Structured Speech Transcriptions
Authors: Seyedali Mohammadi, Manas Paldhe, Amit Chhabra
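Abstract summary: Existing extraction methods fail on conversational speech containing disfluencies, interruptions, and speaker overlap. We introduce LingVarBench, a synthetic data generation pipeline that addresses these constraints through automated validation. LingVarBench provides the first systematic benchmark for structured extraction from synthetic conversational data.
Reference score (in-house attention score): 1.2130055167466958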
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Phone call transcript labeling is prohibitively expensive (approximately 2 USD per minute) due to privacy regulations, consent requirements, and manual annotation costs requiring 3 hours of expert time per hour of audio. Existing extraction methods fail on conversational speech containing disfluencies, interruptions, and speaker overlap. We introduce LingVarBench, a synthetic data generation pipeline that addresses these constraints through automated validation. First, we prompt an LLM to generate realistic structured field values across multiple use cases. Second, we recursively prompt the model to transform these values into thousands of natural conversational utterances containing typical phone call characteristics. Third, we validate each synthetic utterance by testing whether a separate LLM-based extractor can recover the original structured information. We employ DSPy's SIMBA optimizer to automatically synthesize extraction prompts from validated synthetic transcripts, eliminating manual prompt engineering. Our optimized prompts achieve up to 95 percent accuracy for numeric fields (vs. 88-89 percent zero-shot), 90 percent for names (vs. 47-79 percent), and over 80 percent for dates (vs. 72-77 percent) on real customer transcripts, demonstrating substantial gains over zero-shot prompting. The synthetic-to-real transfer demonstrates that conversational patterns learned from generated data generalize effectively to authentic phone calls containing background noise and domain-specific terminology. LingVarBench provides the first systematic benchmark for structured extraction from synthetic conversational data, demonstrating that automated prompt optimization overcomes cost and privacy barriers preventing large-scale phone call analysis in commercial settings.
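Abstract (reference translation): Labeling phone call transcripts is prohibitively expensive (about 2 USD per minute) because of privacy regulations, consent requirements, and manual annotation costs that require 3 hours of expert time per hour of audio. Existing extraction methods fail on conversational speech containing disfluencies, interruptions, and speaker overlap. We introduce LingVarBench, a synthetic data generation pipeline that addresses these constraints through automated validation. First, we prompt an LLM to generate realistic structured field values across multiple use cases. Second, we recursively prompt the model to transform these values into thousands of natural conversational utterances with typical phone call characteristics. Third, we validate each synthetic utterance by testing whether a separate LLM-based extractor can recover the original structured information. Using DSPy's SIMBA optimizer, we automatically synthesize extraction prompts from the validated synthetic transcripts, eliminating manual prompt engineering. On real customer transcripts, the optimized prompts reach up to 95% accuracy on numeric fields (vs. 88-89% zero-shot), 90% on names (vs. 47-79%), and over 80% on dates (vs. 72-77%), a substantial improvement over zero-shot prompting. We show that conversational patterns learned from the generated data generalize effectively to authentic phone calls containing background noise and domain-specific terminology. LingVarBench provides the first systematic benchmark for structured extraction from synthetic conversational data.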
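The abstract describes a three-step loop: generate structured field values, verbalize them as phone-call speech, and keep only utterances from which a separate extractor recovers the original values. The Python below is a minimal sketch of that round-trip filter under loud assumptions. The LLM calls in all three steps are replaced by deterministic stand-ins (a random value generator, a template with filler words, and a regex extractor), and every function name (`generate_fields`, `verbalize`, `extract`, `build_validated_set`, `field_accuracy`) is hypothetical, not taken from the paper or from DSPy. The SIMBA prompt-optimization stage is not reproduced.

```python
import random
import re
from dataclasses import dataclass

# Hypothetical stand-ins for LLM calls; nothing here is the paper's code.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


@dataclass
class Sample:
    fields: dict[str, str]  # ground-truth structured values
    utterance: str          # synthetic phone-call style transcript


def generate_fields(rng: random.Random) -> dict[str, str]:
    """Step 1 stand-in: propose structured field values (an LLM in the paper)."""
    name = rng.choice(["Maria", "James", "Priya", "Tomas"])
    account = "".join(rng.choice("0123456789") for _ in range(6))
    return {"name": name, "account_number": account}


def verbalize(fields: dict[str, str], rng: random.Random) -> str:
    """Step 2 stand-in: render the values as a disfluent spoken utterance.

    With probability 0.3 the last digit is dropped, imitating a generation
    that silently loses information; validation should reject those.
    """
    digits = [DIGIT_WORDS[int(d)] for d in fields["account_number"]]
    if rng.random() < 0.3:
        digits = digits[:-1]
    filler = rng.choice(["uh", "um", "so"])
    return (f"{filler}, yeah, my name is {fields['name']}, "
            f"and the account is, {filler}, {' '.join(digits)}")


def extract(utterance: str) -> dict[str, str]:
    """Step 3 stand-in: a separate extractor (an LLM in the paper)."""
    name = re.search(r"my name is (\w+)", utterance)
    words = re.findall(r"\b(" + "|".join(DIGIT_WORDS) + r")\b", utterance)
    return {
        "name": name.group(1) if name else "",
        "account_number": "".join(str(DIGIT_WORDS.index(w)) for w in words),
    }


def build_validated_set(n: int, seed: int = 0) -> tuple[list[Sample], int]:
    """Keep only utterances from which the extractor recovers every field."""
    rng = random.Random(seed)
    kept, rejected = [], 0
    for _ in range(n):
        fields = generate_fields(rng)
        utterance = verbalize(fields, rng)
        if extract(utterance) == fields:
            kept.append(Sample(fields, utterance))
        else:
            rejected += 1
    return kept, rejected


def field_accuracy(samples: list[Sample], field: str) -> float:
    """Per-field exact-match accuracy of the extractor over a sample set."""
    hits = sum(extract(s.utterance)[field] == s.fields[field] for s in samples)
    return hits / len(samples) if samples else 0.0


if __name__ == "__main__":
    kept, rejected = build_validated_set(100, seed=42)
    print(f"kept {len(kept)}, rejected {rejected}")
    if kept:
        print(kept[0].utterance)
```

Because the simulated lossy utterances are missing a digit, the extractor cannot recover the six-digit account number from them, so they are rejected and the kept set scores 1.0 by construction. With real LLMs neither the generator nor the extractor is exact, which is presumably why each synthetic utterance is validated rather than trusted as generated.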
