Fugu-MT paper translation (abstract): Recognition, recall, and retention of few-shot memories in large language models

Paper overview: Recognition, recall, and retention of few-shot memories in large language models

arxiv url: http://arxiv.org/abs/2303.17557v1
Date: Thu, 30 Mar 2023 17:26:16 GMT
Status: Translation complete
System update date: 2023-03-31 12:31:33.003497
Title: Recognition, recall, and retention of few-shot memories in large language models
Title (reference translation): Recognition, recall, and retention of few-shot memories in large language models
Authors: A. Emin Orhan
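Abstract summary: This work investigates simple recognition, recall, and retention experiments with large language models. A single exposure is generally sufficient for a model to achieve near perfect accuracy. The flip side of this remarkable capacity for fast learning is that precise memories are quickly overwritten.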
Reference score (independently computed attention score): 21.067139116005592
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The training of modern large language models (LLMs) takes place in a regime where most training examples are seen only a few times by the model during the course of training. What does a model remember about such examples seen only a few times during training and how long does that memory persist in the face of continuous training with new examples? Here, we investigate these questions through simple recognition, recall, and retention experiments with LLMs. In recognition experiments, we ask if the model can distinguish the seen example from a novel example; in recall experiments, we ask if the model can correctly recall the seen example when cued by a part of it; and in retention experiments, we periodically probe the model's memory for the original examples as the model is trained continuously with new examples. We find that a single exposure is generally sufficient for a model to achieve near perfect accuracy even in very challenging recognition experiments. We estimate that the recognition performance of even small language models easily exceeds human recognition performance reported in similar experiments with humans (Shepard, 1967). Achieving near perfect recall takes more exposures, but most models can do it in just 3 exposures. The flip side of this remarkable capacity for fast learning is that precise memories are quickly overwritten: recall performance for the original examples drops steeply over the first 10 training updates with new examples, followed by a more gradual decline. Even after 100K updates, however, some of the original examples are still recalled near perfectly. A qualitatively similar retention pattern has been observed in human long-term memory retention studies before (Bahrick, 1984). Finally, recognition is much more robust to interference than recall and memory for natural language sentences is generally superior to memory for stimuli without structure.
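Abstract (reference translation): Modern large language models (LLMs) are trained in a regime where the model sees most training examples only a few times over the course of training. What does a model remember about examples seen only a few times, and how long does that memory persist under continued training on new examples? This paper examines these questions through simple recognition, recall, and retention experiments with LLMs. Recognition experiments ask whether the model can distinguish a seen example from a novel one; recall experiments ask whether the model can correctly recall a seen example when cued with part of it; and retention experiments periodically probe the model's memory for the original examples while it continues to be trained on new examples. A single exposure is generally sufficient for a model to achieve near perfect accuracy, even in very challenging recognition experiments. The recognition performance of even small language models is estimated to easily exceed the human recognition performance reported in similar experiments (Shepard, 1967). Achieving near perfect recall takes more exposures, but most models manage it in just 3. The flip side of this capacity for fast learning is that precise memories are quickly overwritten: recall performance for the original examples drops steeply over the first 10 training updates with new examples and then declines more gradually. Even after 100K updates, however, some of the original examples are still recalled near perfectly. A qualitatively similar retention pattern has been observed in studies of human long-term memory retention (Bahrick, 1984). Finally, recognition is much more robust to interference than recall, and memory for natural language sentences is generally superior to memory for stimuli without structure.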


Related papers list