CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
- URL: http://arxiv.org/abs/2402.04858v2
- Date: Mon, 1 Jul 2024 10:03:33 GMT
- Title: CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay
- Authors: Natasha Butt, Blazej Manczak, Auke Wiggers, Corrado Rainone, David W. Zhang, Michaël Defferrard, Taco Cohen
- Abstract summary: We introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt).
CodeIt iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay.
Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization.
- Score: 12.499776923362461
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models are increasingly solving tasks that are commonly believed to require human-level reasoning ability. However, these models still perform very poorly on benchmarks of general intelligence such as the Abstraction and Reasoning Corpus (ARC). In this paper, we approach ARC as a programming-by-examples problem, and introduce a novel and scalable method for language model self-improvement called Code Iteration (CodeIt). Our method iterates between 1) program sampling and hindsight relabeling, and 2) learning from prioritized experience replay. By relabeling the goal of an episode (i.e., the target program output given input) to the realized output produced by the sampled program, our method effectively deals with the extreme sparsity of rewards in program synthesis. Applying CodeIt to the ARC dataset, we demonstrate that prioritized hindsight replay, along with pre-training and data-augmentation, leads to successful inter-task generalization. CodeIt is the first neuro-symbolic approach that scales to the full ARC evaluation dataset. Our method solves 15% of ARC evaluation tasks, achieving state-of-the-art performance and outperforming existing neural and symbolic baselines. Our code is available at https://github.com/Qualcomm-AI-research/codeit .
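To make the two stages of the loop concrete, below is a minimal, illustrative Python sketch of hindsight relabeling and priority-weighted replay as described in the abstract. All names here (Episode, hindsight_relabel, PrioritizedReplayBuffer, the run callable) are hypothetical and do not reflect the actual CodeIt codebase; this is a sketch of the data flow, not the authors' implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Episode:
    """One program-synthesis episode: task inputs, a goal, and a sampled program."""
    inputs: List[object]        # e.g. ARC input grids
    goal_outputs: List[object]  # target output grids for the task
    program: str                # program text sampled from the policy (a language model)


def hindsight_relabel(episode: Episode, run: Callable[[str, object], object]) -> Optional[Episode]:
    """Replace the episode's goal with the outputs its program actually produced.

    Even when the program does not solve the original task, the pair
    (inputs, realized outputs) -> program is a consistent training example,
    which sidesteps the extreme reward sparsity of program synthesis.
    """
    try:
        realized = [run(episode.program, x) for x in episode.inputs]
    except Exception:
        return None  # drop programs that crash on the given inputs
    return Episode(episode.inputs, realized, episode.program)


class PrioritizedReplayBuffer:
    """Minimal priority-weighted buffer: higher-priority episodes are replayed more often."""

    def __init__(self) -> None:
        self._episodes: List[Episode] = []
        self._priorities: List[float] = []

    def add(self, episode: Episode, priority: float) -> None:
        self._episodes.append(episode)
        self._priorities.append(priority)

    def sample(self, k: int) -> List[Episode]:
        # Sample with probability proportional to priority.
        return random.choices(self._episodes, weights=self._priorities, k=k)


# Schematic outer loop (one CodeIt-style iteration):
#   1. sample candidate programs from the policy for each task,
#   2. hindsight-relabel each episode with its realized outputs,
#   3. add relabeled episodes to the buffer, e.g. prioritizing episodes that solve tasks,
#   4. fine-tune the policy on a batch drawn from the buffer.
```

In the paper's setting the policy is a pretrained code language model and the buffer prioritization favors more informative episodes; the sketch above only illustrates how relabeling and replay fit together.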
Related papers
- Intelligence Analysis of Language Models [0.0]
We test the effectiveness of Large Language Models (LLMs) on the Abstraction and Reasoning Corpus (ARC) dataset.
This dataset serves as a representative benchmark for testing abstract reasoning abilities.
We investigate the application of the Chain-of-Thought (CoT) technique, aiming to determine its role in improving model performance.
arXiv Detail & Related papers (2024-07-20T13:48:16Z)
- NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition [80.22784377150465]
Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding.
This paper makes the first attempt to build a novel bottom-up Non-AutoRegressive Modeling approach for HMER, called NAMER.
NAMER comprises a Visual Aware Tokenizer (VAT) and a Parallel Graph Decoder (PGD).
arXiv Detail & Related papers (2024-07-16T04:52:39Z)
- Zero-Shot Code Representation Learning via Prompt Tuning [6.40875582886359]
We propose Zecoler, a zero-shot approach for learning code representations.
Zecoler is built upon a pre-trained programming language model.
We evaluate Zecoler in five code intelligence tasks including code clone detection, code search, method name prediction, code summarization, and code generation.
arXiv Detail & Related papers (2024-04-13T09:47:07Z)
- Neural networks for abstraction and reasoning: Towards broad generalization in machines [3.165509887826658]
We look at novel approaches for solving the Abstraction & Reasoning Corpus (ARC).
We adapt the DreamCoder neurosymbolic reasoning solver to ARC.
We present the Perceptual Abstraction and Reasoning Language (PeARL) language, which allows DreamCoder to solve ARC tasks.
We publish the arckit Python library to make future research on ARC easier.
arXiv Detail & Related papers (2024-02-05T20:48:57Z)
- LLMs and the Abstraction and Reasoning Corpus: Successes, Failures, and the Importance of Object-based Representations [50.431003245201644]
We show that GPT-4 is unable to "reason" perfectly within non-language domains such as the 1D-ARC or a simple ARC subset.
We propose an object-based representation that is obtained through an external tool, resulting in nearly doubling the performance on solved ARC tasks and near-perfect scores on the easier 1D-ARC.
arXiv Detail & Related papers (2023-05-26T16:32:17Z)
- The Wisdom of Hindsight Makes Language Models Better Instruction Followers [84.9120606803906]
Reinforcement learning has seen wide success in finetuning large language models to better align with instructions via human feedback.
In this paper, we consider an alternative approach: converting feedback to instruction by relabeling the original one and training the model for better alignment in a supervised manner.
We propose Hindsight Instruction Relabeling (HIR), a novel algorithm for aligning language models with instructions.
arXiv Detail & Related papers (2023-02-10T12:16:38Z)
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning [92.36705236706678]
"CodeRL" is a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning.
During inference, we introduce a new generation procedure with a critical sampling strategy.
For the model backbone, we extend the encoder-decoder architecture of CodeT5 with enhanced learning objectives.
arXiv Detail & Related papers (2022-07-05T02:42:15Z)
- E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation [95.49128988683191]
Sequence-to-sequence (seq2seq) learning is a popular paradigm for large-scale pretraining of language models.
We propose an encoding-enhanced seq2seq pretraining strategy, namely E2S2.
E2S2 improves seq2seq models by integrating more efficient self-supervised information into the encoders.
arXiv Detail & Related papers (2022-05-30T08:25:36Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- DeepSumm -- Deep Code Summaries using Neural Transformer Architecture [8.566457170664927]
We employ neural techniques for the task of source code summarization.
With more than 2.1 million supervised samples of comments and code, we reduce training time by more than 50%.
arXiv Detail & Related papers (2020-03-31T22:43:29Z)