Fugu-MT Paper Translation (Abstract): Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification

Paper overview: Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification

arxiv url: http://arxiv.org/abs/2605.20193v1
Date: Sat, 04 Apr 2026 04:50:03 GMT
Status: Translation complete
Last updated in system: 2026-05-25 12:34:33.957505
Title: Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification
Title (reference translation): Improving the Performance of Quantized Models in Qualitative Analysis through Multi-Pass Prompt Verification
Authors: Aisvarya Adeseye, Jouni Isoaho, Adeyemi Adeseye
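Abstract summary: Quantized large language models (LLMs) are used more frequently in qualitative analysis because they run fast and require fewer computing resources. This study examines how different low-bit quantization levels affect the performance of LLaMA-3.1 in qualitative analysis.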
Reference score (independently computed attention metric): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Quantized Large Language Models (LLMs) are used more often in qualitative analysis because they run fast and need fewer computing resources. This study examines how different lower bits quantization levels (8-bit, 4-bit, 3-bit, and 2-bit) and quantization types affect the performance of LLaMA-3.1 (8B) on qualitative analysis. The study uses expert and non-expert responses from 82 interview transcripts. Low-bit models often produce higher levels of hallucinations and unstable results, especially when reading non-expert language with unclear terms. To improve performance, we propose a quantization-aware multi-pass prompt verification method. This method guides the model through controlled steps that reduce hallucinations. It removes unreliable content and passes the results to the next transcript after verification, improving accuracy. To validate performance, human coders analyzed transcripts using NVivo and BF16 LLaMA. BF16 LLaMA-3.1 produced high-precision output but had semantic drift and hallucination. These errors were corrected manually. The corrected BF16 output and NVivo human coding were combined to create a gold-standard ground truth (GSGT) for thematic extraction and frequency analysis. The results show that 8-bit models stay closest to the GSGT. The 4-bit models lose accuracy but become stable when the proposed method is applied. The 3-bit and 2-bit models drop in performance because of heavy compression, but they improve with the proposed prompt design and verification. The study also finds that models at the same bit level behave differently depending on quantization type. Overall, the method helps low-resource LLMs become more stable, accurate, and suitable for qualitative research at lower cost.
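Abstract (reference translation): Quantized large language models (LLMs) are used more frequently in qualitative analysis because they run fast and require fewer computing resources. This study examined how different low-bit quantization levels (8-bit, 4-bit, 3-bit, and 2-bit) and quantization types affect the performance of LLaMA-3.1 (8B) in qualitative analysis. The study uses expert and non-expert responses from 82 interview transcripts. Low-bit models often produce high levels of hallucination and unstable results. To improve performance, we propose a quantization-aware multi-pass prompt verification method. The method guides the model through controlled steps that reduce hallucinations. It removes unreliable content and passes the results to the next transcript after verification, which improves accuracy. To validate performance, human coders analyzed the transcripts using NVivo and BF16 LLaMA. BF16 LLaMA-3.1 produced high-precision output but showed semantic drift and hallucination. These errors were corrected manually. The corrected BF16 output and the NVivo human coding were combined to create a gold-standard ground truth (GSGT) for thematic extraction and frequency analysis. The results show that the 8-bit models stay closest to the GSGT. The 4-bit models lose accuracy but become stable when the proposed method is applied. The 3-bit and 2-bit models drop in performance because of heavy compression, but they improve with the proposed prompt design and verification. The study also shows that models at the same bit level behave differently depending on the quantization type. Overall, the method helps low-resource LLMs become more stable and accurate, and suitable for qualitative research at lower cost.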
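The abstract describes the method only at a high level: extract, verify, discard unreliable content, then carry the verified results to the next transcript. The sketch below is one possible reading of that loop, not the authors' implementation. The prompt wording, the function names (`extract_themes`, `is_supported`, `analyze`), and the `toy_model` stand-in are all assumptions made for illustration; a real run would replace `toy_model` with a call to a quantized LLaMA-3.1 model.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt string to a reply string.
Model = Callable[[str], str]


def extract_themes(model: Model, transcript: str, verified: List[str]) -> List[str]:
    """Pass 1 (assumed): request candidate themes, given the themes verified so far."""
    prompt = (
        "Known themes: " + "; ".join(verified) + "\n"
        "List the themes in this transcript, one per line:\n" + transcript
    )
    return [line.strip() for line in model(prompt).splitlines() if line.strip()]


def is_supported(model: Model, transcript: str, theme: str) -> bool:
    """Pass 2 (assumed): ask whether the transcript supports one candidate theme."""
    prompt = (
        f"Does the transcript support the theme '{theme}'? Answer yes or no.\n"
        + transcript
    )
    return model(prompt).strip().lower().startswith("yes")


def analyze(model: Model, transcripts: List[str]) -> Dict[str, int]:
    """Extract, verify, drop unverified themes, and count theme frequencies."""
    counts: Dict[str, int] = {}
    for transcript in transcripts:
        # Only verified themes are carried forward to the next transcript.
        candidates = extract_themes(model, transcript, list(counts))
        for theme in dict.fromkeys(candidates):  # de-duplicate, keep order
            if is_supported(model, transcript, theme):
                counts[theme] = counts.get(theme, 0) + 1
    return counts


def toy_model(prompt: str) -> str:
    """Deterministic stand-in for a quantized LLM, for demonstration only."""
    if prompt.startswith("Does the transcript support"):
        theme = prompt.split("'")[1]
        body = prompt.split("\n", 1)[1]
        return "yes" if theme in body.lower() else "no"
    # The last item imitates a hallucinated theme that verification should drop.
    return "cost\nprivacy\nteleportation"


if __name__ == "__main__":
    demo = ["Cost is the main worry.", "Privacy and cost both matter."]
    print(analyze(toy_model, demo))
```

With the toy model, the hallucinated theme never reaches the frequency table because it fails the verification pass for every transcript. The resulting counts are the kind of output that could then be compared against a reference coding such as the GSGT.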


Related papers