Fugu-MT paper translation (abstract): Spot the BlindSpots: Systematic Identification and Quantification of Fine-Grained LLM Biases in Contact Center Summaries

Paper overview: Spot the BlindSpots: Systematic Identification and Quantification of Fine-Grained LLM Biases in Contact Center Summaries

arxiv url: http://arxiv.org/abs/2508.13124v1
Date: Mon, 18 Aug 2025 17:31:03 GMT
Status: translation complete
Last updated in system: 2025-08-19 14:49:11.514542
Title: Spot the BlindSpots: Systematic Identification and Quantification of Fine-Grained LLM Biases in Contact Center Summaries
Title (reference translation): Spot the BlindSpots: Systematic Identification and Quantification of Fine-Grained LLM Biases in Contact Center Summaries
Authors: Kawin Mayilvaghanan, Siddhant Gupta, Ayush Kumar
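Abstract summary: BlindSpot is a framework built on a taxonomy of 15 operational bias dimensions. For a transcript and its summary, BlindSpot produces a categorical distribution for each bias dimension. The analysis shows that biases are systemic and present across all evaluated models, regardless of size or family.
Reference score (independently computed attention score): 3.4205390087622582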
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Abstractive summarization is a core application in contact centers, where Large Language Models (LLMs) generate millions of summaries of call transcripts daily. Despite their apparent quality, it remains unclear whether LLMs systematically under- or over-attend to specific aspects of the transcript, potentially introducing biases in the generated summary. While prior work has examined social and positional biases, the specific forms of bias pertinent to contact center operations - which we term Operational Bias - have remained unexplored. To address this gap, we introduce BlindSpot, a framework built upon a taxonomy of 15 operational bias dimensions (e.g., disfluency, speaker, topic) for the identification and quantification of these biases. BlindSpot leverages an LLM as a zero-shot classifier to derive categorical distributions for each bias dimension in a pair of transcript and its summary. The bias is then quantified using two metrics: Fidelity Gap (the JS Divergence between distributions) and Coverage (the percentage of source labels omitted). Using BlindSpot, we conducted an empirical study with 2500 real call transcripts and their summaries generated by 20 LLMs of varying scales and families (e.g., GPT, Llama, Claude). Our analysis reveals that biases are systemic and present across all evaluated models, regardless of size or family.
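Abstract (reference translation): Abstractive summarization is a core application in contact centers, where Large Language Models (LLMs) generate millions of call-transcript summaries every day. Despite the apparent quality of these summaries, it is unclear whether LLMs systematically under- or over-attend to particular aspects of the transcript, which could bias the generated summary. Prior work has studied social and positional biases, but the forms of bias specific to contact center operations, which we term Operational Bias, remain unexplored. To address this gap, we introduce BlindSpot, a framework built on a taxonomy of 15 operational bias dimensions (e.g., disfluency, speaker, topic) for identifying and quantifying these biases. BlindSpot uses an LLM as a zero-shot classifier to derive a categorical distribution for each bias dimension in a transcript and in its summary. Bias is then quantified with two metrics: Fidelity Gap (the JS divergence between the distributions) and Coverage (the percentage of source labels omitted). Using BlindSpot, we ran an empirical study on 2500 real call transcripts and summaries generated by 20 LLMs of varying scales and families (e.g., GPT, Llama, Claude). The analysis shows that biases are systemic and present across all evaluated models, regardless of size or family.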
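
The two metrics named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the example labels, and the choice of a base-2 logarithm are assumptions made here for illustration, and the labels are hard-coded rather than produced by an LLM zero-shot classifier. The second function follows the abstract's literal wording for Coverage ("the percentage of source labels omitted"); the paper's exact definition may differ.

```python
import math
from collections import Counter


def distribution(labels, categories):
    """Turn a list of labels into a probability vector over `categories`."""
    counts = Counter(labels)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no labels fall into the given categories")
    return [counts[c] / total for c in categories]


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence; with log base 2 it lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def omitted_label_percentage(source_labels, summary_labels):
    """Percentage of distinct source labels that never appear in the summary."""
    source = set(source_labels)
    if not source:
        return 0.0
    omitted = source - set(summary_labels)
    return 100.0 * len(omitted) / len(source)


# Hypothetical labels for one bias dimension ("topic") of one call.
categories = ["billing", "refund", "greeting"]
transcript_labels = ["billing", "billing", "refund", "greeting"]
summary_labels = ["billing", "refund"]

p = distribution(transcript_labels, categories)
q = distribution(summary_labels, categories)
fidelity_gap = js_divergence(p, q)
omitted = omitted_label_percentage(transcript_labels, summary_labels)
print(f"JS divergence: {fidelity_gap:.4f}, omitted labels: {omitted:.1f}%")
```

In this toy case the summary drops the "greeting" label entirely, so one of three source labels is omitted, and the divergence is nonzero because the summary shifts probability mass toward "refund".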

Related papers