Fugu-MT paper translation (abstract): Fine-tuning language models to find agreement among humans with diverse preferences

Paper overview: Fine-tuning language models to find agreement among humans with diverse preferences

arxiv url: http://arxiv.org/abs/2211.15006v1
Date: Mon, 28 Nov 2022 02:24:14 GMT
Status: Translation complete
Last updated in system: 2022-11-29 15:21:21.733041
Title: Fine-tuning language models to find agreement among humans with diverse preferences
Authors: Michiel A. Bakker and Martin J. Chadwick and Hannah R. Sheahan and Michael Henry Tessler and Lucy Campbell-Gillingham and Jan Balaguer and Nat McAleese and Amelia Glaese and John Aslanides and Matthew M. Botvinick and Christopher Summerfield
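Abstract summary: Recent work in large language modeling (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. Here, we consider how a machine might help people with diverse views find agreement. We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. When consensus statements were silently constructed from only a subset of group members, the excluded members were more likely to dissent.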
Reference score (independently computed attention score): 7.702628192754256
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent work in large language modeling (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user. This work assumes that human preferences are static and homogeneous across individuals, so that aligning to a single "generic" user will confer more general alignment. Here, we embrace the heterogeneity of human preferences to consider a different challenge: how might a machine help people with diverse views find agreement? We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions. Human participants provide written opinions on thousands of questions touching on moral and political issues (e.g., "should we raise taxes on the rich?"), and rate the LLM's generated candidate consensus statements for agreement and quality. A reward model is then trained to predict individual preferences, enabling it to quantify and rank consensus statements in terms of their appeal to the overall group, defined according to different aggregation (social welfare) functions. The model produces consensus statements that are preferred by human users over those from prompted LLMs (>70%) and significantly outperforms a tight fine-tuned baseline that lacks the final ranking step. Further, our best model's consensus statements are preferred over the best human-generated opinions (>65%). We find that when we silently constructed consensus statements from only a subset of group members, those who were excluded were more likely to dissent, revealing the sensitivity of the consensus to individual contributions. These results highlight the potential to use LLMs to help groups of humans align their values with one another.
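The ranking step described in the abstract (per-person preference predictions aggregated by a social welfare function) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the scores are made-up numbers standing in for a reward model's predictions, and the two aggregation functions (mean and minimum) are common textbook choices, not necessarily the ones used in the paper.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical predicted approval scores, one per group member, for each
# candidate consensus statement. In the described system these would come
# from a trained reward model; here they are made-up numbers.
predicted_scores: Dict[str, List[float]] = {
    "candidate_a": [0.9, 0.8, 0.1],
    "candidate_b": [0.6, 0.6, 0.5],
    "candidate_c": [0.7, 0.2, 0.3],
}


def utilitarian(scores: Sequence[float]) -> float:
    """Illustrative welfare function: average predicted approval."""
    return sum(scores) / len(scores)


def rawlsian(scores: Sequence[float]) -> float:
    """Illustrative welfare function: approval of the least-satisfied member."""
    return min(scores)


def rank_candidates(
    scores_by_candidate: Dict[str, List[float]],
    welfare: Callable[[Sequence[float]], float],
) -> List[str]:
    """Return candidate names sorted from highest to lowest group welfare."""
    return sorted(
        scores_by_candidate,
        key=lambda name: welfare(scores_by_candidate[name]),
        reverse=True,
    )


if __name__ == "__main__":
    print(rank_candidates(predicted_scores, utilitarian))
    print(rank_candidates(predicted_scores, rawlsian))
```

With these made-up scores the two aggregation rules disagree on the top candidate: averaging favors a statement that two members like strongly, while the minimum rule favors the statement that no member strongly rejects. This is one way to see why the choice of aggregation function matters when a group holds diverse opinions.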


Related papers