Fugu-MT Paper Translation (Abstract): MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair

Paper overview: MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair

arxiv url: http://arxiv.org/abs/2508.06963v1
Date: Sat, 09 Aug 2025 12:20:00 GMT
Status: Translation complete
System-internal update date: 2025-08-12 21:23:28.629911
Title: MASteer: Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair
Title (reference translation): MASteer: A Multi-Agent Adaptive Steer Strategy for End-to-End LLM Trustworthiness Repair
Authors: Changqing Li, Tianlin Li, Xiaohan Zhang, Aishan Liu, Li Pan,
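Abstract summary: MASteer is an end-to-end framework for trustworthiness repair in large language models (LLMs). It combines AutoTester, a multi-agent system that generates diverse, high-quality steer samples tailored to developer needs, with AutoRepairer, which constructs adaptive steering strategies with anchor vectors for automated, context-aware strategy selection during inference. In experiments, MASteer consistently outperforms baselines, improving metrics by 15.36% on LLaMA-3.1-8B-Chat and 4.21% on Qwen-3-8B-Chat while maintaining general model capabilities.
Reference score (independently computed attention score): 24.187162194500317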
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) face persistent and evolving trustworthiness issues, motivating developers to seek automated and flexible repair methods that enable convenient deployment across diverse scenarios. Existing repair methods like supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) are costly and slow, while prompt engineering lacks robustness and scalability. Representation engineering, which steers model behavior by injecting targeted concept vectors during inference, offers a lightweight, training-free alternative. However, current approaches depend on manually crafted samples and fixed steering strategies, limiting automation and adaptability. To overcome these challenges, we propose MASteer, the first end-to-end framework for trustworthiness repair in LLMs based on representation engineering. MASteer integrates two core components: AutoTester, a multi-agent system that generates diverse, high-quality steer samples tailored to developer needs; and AutoRepairer, which constructs adaptive steering strategies with anchor vectors for automated, context-aware strategy selection during inference. Experiments on standard and customized trustworthiness tasks show MASteer consistently outperforms baselines, improving metrics by 15.36% on LLaMA-3.1-8B-Chat and 4.21% on Qwen-3-8B-Chat, while maintaining general model capabilities. MASteer demonstrates strong robustness, generalization, and practical value for scalable, efficient trustworthiness repair.
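Abstract (reference translation): Large language models (LLMs) face persistent and evolving trustworthiness issues, which motivates developers to look for automated, flexible repair methods that can be deployed conveniently across diverse scenarios. Existing repair methods such as supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) are costly and slow, while prompt engineering lacks robustness and scalability. Representation engineering, which steers model behavior by injecting targeted concept vectors during inference, offers a lightweight, training-free alternative. However, current approaches rely on manually crafted samples and fixed steering strategies, which limits automation and adaptability. To overcome these challenges, the authors propose MASteer, the first end-to-end framework for trustworthiness repair in LLMs based on representation engineering. MASteer integrates two core components: AutoTester, a multi-agent system that generates diverse, high-quality steer samples tailored to developer needs, and AutoRepairer, which constructs adaptive steering strategies with anchor vectors for automated, context-aware strategy selection during inference. In experiments on standard and customized trustworthiness tasks, MASteer consistently outperforms baselines, improving metrics by 15.36% on LLaMA-3.1-8B-Chat and 4.21% on Qwen-3-8B-Chat while maintaining general model capabilities. MASteer demonstrates strong robustness, generalization, and practical value for scalable, efficient trustworthiness repair.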
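
The abstract describes steering by injecting concept vectors at inference time and selecting a strategy through anchor vectors. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a difference-of-means steering vector (a common representation-engineering technique, swapped in here for illustration), an anchor defined as the mean activation of the samples to be repaired, and cosine similarity with a fixed threshold for selection. The names `build_strategy`, `steer_hidden`, `alpha`, and `threshold` are hypothetical, and activations are plain Python lists rather than real model hidden states.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_strategy(positive, negative, alpha):
    """Build one hypothetical steering strategy from sample activations.

    positive: activations for desired behavior
    negative: activations for the behavior to repair
    Assumption: steer = mean(positive) - mean(negative), and the anchor
    is mean(negative), so inputs that resemble the problem get steered.
    """
    pos_mean = mean_vector(positive)
    neg_mean = mean_vector(negative)
    steer = [p - n for p, n in zip(pos_mean, neg_mean)]
    return {"anchor": neg_mean, "steer": steer, "alpha": alpha}


def steer_hidden(hidden, strategies, threshold=0.5):
    """Pick the strategy whose anchor best matches `hidden`, then inject it.

    If no anchor is similar enough, the hidden state is returned unchanged,
    which is one simple way to leave unrelated inputs untouched.
    """
    best = max(strategies, key=lambda s: cosine(hidden, s["anchor"]))
    if cosine(hidden, best["anchor"]) < threshold:
        return list(hidden)
    return [h + best["alpha"] * v for h, v in zip(hidden, best["steer"])]
```

In a real model the same addition would be applied to a chosen layer's hidden states during the forward pass. The threshold gate is a design choice made for this sketch so that inputs far from every anchor pass through unmodified.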

Related papers