Fugu-MT Paper Translation (Summary): Are Large Language Models Economically Viable for Industry Deployment?

Paper summary: Are Large Language Models Economically Viable for Industry Deployment?

arxiv url: http://arxiv.org/abs/2604.19342v1
Date: Tue, 21 Apr 2026 11:25:22 GMT
Status: Translation complete
Last updated in system: 2026-04-22 22:41:49.737166
Title: Are Large Language Models Economically Viable for Industry Deployment?
Title (reference translation): Are Large Language Models Economically Effective for Industrial Deployment?
Authors: Abdullah Mohammad, Sushant Kumar Ray, Pushkar Arora, Rafiq Ali, Ebad Shabbir, Gautam Siddharth Kashyap, Jiechao Gao, Usman Naseem
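Abstract summary: Generative AI driven by large language models (LLMs) is increasingly deployed in industry, in areas such as healthcare decision support, financial analytics, enterprise search, and conversational automation. However, prevailing evaluation pipelines are accuracy-centric, which creates a Deployment-Evaluation Gap.
Reference score (independently computed attention score): 15.537777029587366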
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Generative AI, powered by Large Language Models (LLMs), is increasingly deployed in industry across healthcare decision support, financial analytics, enterprise retrieval, and conversational automation, where reliability, efficiency, and cost control are critical. In such settings, models must satisfy strict constraints on energy, latency, and hardware utilization, not accuracy alone. Yet prevailing evaluation pipelines remain accuracy-centric, creating a Deployment-Evaluation Gap: the absence of operational and economic criteria in model assessment. To address this gap, we present EDGE-EVAL, an industry-oriented benchmarking framework that evaluates LLMs across their full lifecycle on legacy NVIDIA Tesla T4 GPUs. Benchmarking LLaMA and Qwen variants across three industrial tasks, we introduce five deployment metrics: Economic Break-Even (N_break), Intelligence-Per-Watt (IPW), System Density (ρ_sys), Cold-Start Tax (C_tax), and Quantization Fidelity (Q_ret). These capture profitability, energy efficiency, hardware scaling, serverless feasibility, and compression safety, respectively. Our results reveal a clear efficiency frontier: models in the <2B parameter class dominate larger baselines across economic and ecological dimensions. LLaMA-3.2-1B (INT4) achieves ROI break-even in 14 requests (median), delivers 3x higher energy-normalized intelligence than 7B models, and exceeds 6,900 tokens/s/GB under 4-bit quantization. We further uncover an efficiency anomaly: while QLoRA reduces memory footprint, it increases adaptation energy by up to 7x for small models, challenging prevailing assumptions about quantization-aware training in edge deployment.
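Abstract (reference translation): Generative AI driven by large language models (LLMs) is increasingly deployed in industry, including healthcare decision support, financial analytics, enterprise search, and conversational automation, where reliability, efficiency, and cost control are essential. In such settings, models must satisfy strict constraints on energy, latency, and hardware utilization. However, prevailing evaluation pipelines are accuracy-centric, creating a Deployment-Evaluation Gap in which operational and economic criteria are missing from model assessment. To address this gap, the authors introduce EDGE-EVAL, an industry-oriented benchmarking framework that evaluates the full lifecycle of LLMs on legacy NVIDIA Tesla T4 GPUs. Benchmarking LLaMA and Qwen variants on three industrial tasks, they introduce five deployment metrics: Economic Break-Even (N_break), Intelligence-Per-Watt (IPW), System Density (ρ_sys), Cold-Start Tax (C_tax), and Quantization Fidelity (Q_ret), covering profitability, energy efficiency, hardware scaling, serverless feasibility, and compression safety. The results show a clear efficiency frontier: models in the <2B parameter class dominate larger baselines across economic and ecological dimensions. LLaMA-3.2-1B (INT4) reaches ROI break-even in 14 requests (median), delivers 3x higher energy-normalized intelligence than 7B models, and exceeds 6,900 tokens/s/GB under 4-bit quantization. In addition, although QLoRA reduces memory footprint, it increases adaptation energy by up to 7x for small models.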
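The abstract names its deployment metrics but does not give their formulas. The sketch below shows one plausible way three of them could be computed, purely for illustration. Every definition here is an assumption and not the paper's: break-even is taken as a one-time cost divided by the per-request saving over a hosted alternative, IPW as a task score divided by average power draw, and system density as throughput per gigabyte of memory (the tokens/s/GB unit is the only part taken from the abstract). All function names, parameter names, and input values are hypothetical.

```python
import math


def break_even_requests(fixed_cost, baseline_cost_per_request, local_cost_per_request):
    """Requests needed before a one-time cost is recovered (assumed definition).

    fixed_cost: one-time adaptation/deployment cost (hypothetical input).
    baseline_cost_per_request: cost of serving one request with the alternative.
    local_cost_per_request: cost of serving one request with the local model.
    """
    saving = baseline_cost_per_request - local_cost_per_request
    if saving <= 0:
        # The local model never pays for itself under these inputs.
        return math.inf
    return math.ceil(fixed_cost / saving)


def intelligence_per_watt(task_score, avg_power_watts):
    """Task score normalized by average power draw (assumed definition)."""
    if avg_power_watts <= 0:
        raise ValueError("avg_power_watts must be positive")
    return task_score / avg_power_watts


def system_density(tokens_per_second, memory_gb):
    """Throughput per gigabyte of memory, in tokens/s/GB (assumed definition)."""
    if memory_gb <= 0:
        raise ValueError("memory_gb must be positive")
    return tokens_per_second / memory_gb


if __name__ == "__main__":
    # Made-up inputs chosen only to exercise the functions; not results from the paper.
    print(break_even_requests(6.0, 1.0, 0.25))
    print(intelligence_per_watt(0.5, 4.0))
    print(system_density(7000.0, 2.0))
```

Under these assumed definitions, a smaller model can come out ahead on all three quantities even with a lower raw task score, which is the kind of trade-off the abstract describes.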
