Fugu-MT Paper Translation (Abstract): CosmoCore Affective Dream-Replay Reinforcement Learning for Code Generation

Paper summary: CosmoCore Affective Dream-Replay Reinforcement Learning for Code Generation

arxiv url: http://arxiv.org/abs/2510.18895v1
Date: Mon, 20 Oct 2025 06:50:09 GMT
Status: translation complete
Last updated in system: 2025-10-25 03:08:14.217617
Title: CosmoCore Affective Dream-Replay Reinforcement Learning for Code Generation
Authors: Santhosh Kumar Ravindran
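Abstract summary: We introduce CosmoCore, a reinforcement learning architecture that integrates affective signals to enhance code generation in large language models. High-negative-valence ("cringe") episodes are prioritized in a Dream Queue for five-fold replay during off-policy updates, while low-surprise successes are pruned to prevent overconfidence and buffer bloat. CosmoCore reduces hallucinated code (e.g., syntax errors or logical bugs) by 48% and accelerates self-correction by 45%.
Reference score (attention score computed by this site): 0.0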
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: We introduce CosmoCore, a neuroscience-inspired reinforcement learning (RL) architecture that integrates affective signals to enhance code generation in large language models (LLMs). The design is motivated by human and animal learning, where embarrassment from mistakes drives rapid correction, as when a puppy learns to avoid repeating an error after a single scolding. CosmoCore tags code generation trajectories with valence and surprise using a lightweight multi-layer perceptron (MLP). High-negative-valence ("cringe") episodes, such as buggy code outputs, are prioritized in a Dream Queue for five-fold replay during off-policy updates, while low-surprise successes are pruned to prevent overconfidence and buffer bloat. Evaluated on code generation benchmarks such as HumanEval and BigCodeBench, alongside simulations with a custom data pipeline environment, CosmoCore reduces hallucinated code (e.g., syntax errors or logical bugs) by 48% and accelerates self-correction by 45%. Local experiments using Hugging Face models in a PySpark environment validate these gains, with code snippets provided for replication. Ablations confirm that valence tagging boosts curiosity in exploration and that pruning mitigates inefficiency. This framework extends RL from human feedback (RLHF) toward more emotionally aware code assistants, with applications in IDEs and data pipelines. Code and the custom mini-world simulation are released.
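The replay mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released code. It assumes valence lies in [-1, 1] and surprise in [0, 1], uses hand-supplied tag values in place of the MLP tagger, and reads "five-fold replay" as a 5x sampling weight. The names `Trajectory`, `DreamReplayBuffer`, `CRINGE_VALENCE`, and `LOW_SURPRISE`, along with the threshold values, are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import List

REPLAY_FACTOR = 5       # "five-fold replay" from the abstract, read here as a sampling weight
CRINGE_VALENCE = -0.5   # assumed threshold, not taken from the paper
LOW_SURPRISE = 0.1      # assumed threshold, not taken from the paper


@dataclass
class Trajectory:
    code: str
    reward: float
    valence: float   # assumed range [-1, 1]; strongly negative means "cringe"
    surprise: float  # assumed range [0, 1]


@dataclass
class DreamReplayBuffer:
    dream_queue: List[Trajectory] = field(default_factory=list)
    regular: List[Trajectory] = field(default_factory=list)
    pruned: int = 0

    def add(self, t: Trajectory) -> None:
        if t.valence <= CRINGE_VALENCE:
            # High-negative-valence episode: prioritize for replay.
            self.dream_queue.append(t)
        elif t.reward > 0 and t.surprise < LOW_SURPRISE:
            # Low-surprise success: drop it to limit buffer bloat.
            self.pruned += 1
        else:
            self.regular.append(t)

    def sample(self, batch_size: int, rng: random.Random) -> List[Trajectory]:
        pool = self.dream_queue + self.regular
        if not pool:
            return []
        # Dream Queue entries are drawn REPLAY_FACTOR times as often as the rest.
        weights = [REPLAY_FACTOR] * len(self.dream_queue) + [1] * len(self.regular)
        return rng.choices(pool, weights=weights, k=batch_size)


# Usage with made-up tag values standing in for the MLP tagger's output.
buf = DreamReplayBuffer()
buf.add(Trajectory("def f(: pass", reward=-1.0, valence=-0.9, surprise=0.8))
buf.add(Trajectory("def g(): return 1", reward=1.0, valence=0.6, surprise=0.02))
buf.add(Trajectory("def h(x): return x * 2", reward=1.0, valence=0.4, surprise=0.5))
batch = buf.sample(600, random.Random(0))
cringe_draws = sum(1 for t in batch if t.valence <= CRINGE_VALENCE)
```

In this toy run the buggy snippet goes to the Dream Queue, the unsurprising success is pruned, and the third trajectory stays in the regular buffer, so `cringe_draws` should come out at roughly five sixths of the batch. Weighted sampling is only one way to realize five-fold replay; duplicating Dream Queue entries five times per update would be another.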
