Fugu-MT Paper Translation (Abstract): Secure Linear Alignment of Large Language Models

Paper summary: Secure Linear Alignment of Large Language Models

arxiv url: http://arxiv.org/abs/2603.18908v1
Date: Thu, 19 Mar 2026 13:43:32 GMT
Status: Translation complete
System update date: 2026-03-20 17:19:06.1769
Title: Secure Linear Alignment of Large Language Models
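Title (reference translation): Secure Linear Alignment of Large Language Models
Authors: Matt Gorbett, Suman Jana
Abstract summary: Language models tend to learn similar representations despite differences in training objectives, architectures, and data modalities. This paper proposes a privacy-preserving framework that exploits representational convergence to enable cross-silo inference. It shows for the first time that linear alignment can sometimes enable text generation across independently trained models.
Reference score (independently computed attention measure): 10.66607150500579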
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language models increasingly appear to learn similar representations, despite differences in training objectives, architectures, and data modalities. This emerging compatibility between independently trained models introduces new opportunities for cross-model alignment to downstream objectives. Moreover, it unlocks new potential application domains, such as settings where security, privacy, or competitive constraints prohibit direct data or model sharing. In this work, we propose a privacy-preserving framework that exploits representational convergence to enable cross-silo inference between independent language models. The framework learns an affine transformation over a shared public dataset and applies homomorphic encryption to protect client queries during inference. By encrypting only the linear alignment and classification operations, the method achieves sub-second inference latency while maintaining strong security guarantees. We support this framework with an empirical investigation into representational convergence, in which we learn linear transformations between the final hidden states of independent models. We evaluate these cross-model mappings on embedding classification and out-of-distribution detection, observing minimal performance degradation across model pairs. Additionally, we show for the first time that linear alignment sometimes enables text generation across independently trained models.
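Abstract (reference translation): Language models tend to learn similar representations despite differences in training objectives, architectures, and data modalities. This new compatibility between independently trained models creates new opportunities for cross-model alignment to downstream objectives. It also unlocks potential new application domains in which security, privacy, or competitive constraints prohibit direct sharing of data or models. This work proposes a privacy-preserving framework that exploits representational convergence to achieve cross-silo inference between independent language models. The framework learns an affine transformation over a shared public dataset and applies homomorphic encryption to protect client queries during inference. Encrypting only the linear alignment and classification operations yields sub-second inference latency while maintaining strong security guarantees. The framework is supported by an empirical study of representational convergence that learns linear transformations between the final hidden states of independent models. These cross-model mappings are evaluated on embedding classification and out-of-distribution detection, with minimal performance degradation observed across model pairs. Finally, the authors show for the first time that linear alignment sometimes enables text generation across independently trained models.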
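
The affine alignment step described in the abstract can be illustrated with a small sketch. The abstract does not say how the transformation is fitted, so ordinary least squares is assumed here purely for illustration. The function names (`fit_affine_alignment`, `apply_alignment`), the dimensions, and the synthetic "hidden states" are all hypothetical stand-ins, not the authors' code or data. The homomorphic encryption step is not shown.

```python
import numpy as np


def fit_affine_alignment(source_states, target_states):
    """Fit W and b by least squares so that source_states @ W + b ~ target_states.

    Assumption: ordinary least squares; the paper's actual fitting
    procedure is not stated in the abstract.
    """
    n_samples = source_states.shape[0]
    # Append a column of ones so the bias is learned jointly with W.
    augmented = np.hstack([source_states, np.ones((n_samples, 1))])
    solution, *_ = np.linalg.lstsq(augmented, target_states, rcond=None)
    weight, bias = solution[:-1], solution[-1]
    return weight, bias


def apply_alignment(states, weight, bias):
    """Map states from the source model's space into the target model's space."""
    return states @ weight + bias


# Synthetic stand-in for final hidden states of two models on a shared
# public dataset (hypothetical sizes: 200 examples, 8-dim and 6-dim spaces).
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 8))
true_w = rng.normal(size=(8, 6))
true_b = rng.normal(size=6)
target = source @ true_w + true_b + 0.01 * rng.normal(size=(200, 6))

weight, bias = fit_affine_alignment(source, target)
mapped = apply_alignment(source, weight, bias)
relative_error = np.linalg.norm(mapped - target) / np.linalg.norm(target)
```

In the setting the abstract describes, only this linear map and the downstream classifier would be evaluated under encryption. Both are matrix-vector operations, which is consistent with the reported sub-second latency.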
