Less is More: Summary of Long Instructions is Better for Program Synthesis
- URL: http://arxiv.org/abs/2203.08597v1
- Date: Wed, 16 Mar 2022 13:04:12 GMT
- Title: Less is More: Summary of Long Instructions is Better for Program Synthesis
- Authors: Kirby Kuznia, Swaroop Mishra, Mihir Parmar and Chitta Baral
- Abstract summary: We show that pre-trained language models (LMs) benefit from summarized versions of complicated questions.
Our findings show that the superfluous information often present in problem descriptions does not help models understand a task.
Experimental results on Codex show that our proposed approach outperforms the baseline by 8.13% on average in terms of strict accuracy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the success of large pre-trained language models (LMs) such as Codex,
they show below-par performance on larger and more complicated programming-related
questions. We show that LMs benefit from summarized versions of complicated questions.
Our findings show that the superfluous information often present in problem
descriptions, such as human characters, background stories, and names (which are
included to help humans understand a task), does not help models understand the task.
To this end, we create a meta-dataset from the frequently used APPS dataset for the
program synthesis task. Our meta-dataset consists of human-written and synthesized
summaries of the long and complicated programming questions. Experimental results on
Codex show that our proposed approach outperforms the baseline by 8.13% on average in
terms of strict accuracy. Our analysis shows that summaries significantly improve
performance for introductory (9.86%) and interview (11.48%) programming questions.
However, the improvement is small (~2%) for competitive programming questions,
implying scope for future research.
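The pipeline the abstract describes is simple to sketch: compress the long problem statement into a task-only summary, then hand the summary to the code model. Below is a minimal Python sketch of that idea; the `complete` helper, the model names, and the prompt wording are illustrative assumptions, not the authors' exact setup (the paper evaluated Codex with both human-written and synthesized summaries).

```python
# Minimal sketch of the summarize-then-synthesize pipeline described in the
# abstract. The `complete` helper, model names, and prompt wording are
# hypothetical placeholders, not the authors' exact setup.

def complete(model: str, prompt: str, max_tokens: int = 512) -> str:
    """Stub: call whatever LM completion API is available and return text."""
    raise NotImplementedError("wire this to your LM provider")

def summarize_question(question: str) -> str:
    """Compress a long problem statement, dropping story elements and names
    while keeping the task, input/output format, and constraints."""
    prompt = (
        "Summarize the following programming problem. Keep only the task, "
        "the input/output format, and the constraints; remove characters, "
        "background stories, and names.\n\n"
        f"{question}\n\nSummary:"
    )
    return complete("summarizer-model", prompt).strip()

def synthesize_program(question: str) -> str:
    """Ask the code model for a solution to the (summarized) question."""
    prompt = f"{question}\n\n# Complete Python 3 solution:\n"
    return complete("code-model", prompt)

def solve(long_question: str) -> str:
    # Less is more: summarize first, then synthesize.
    return synthesize_program(summarize_question(long_question))
```

Under the strict-accuracy metric reported above (inherited from APPS), a generated program counts as correct only if it passes all test cases for the problem.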
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs [59.76268575344119]
We introduce a novel framework for enhancing large language models' (LLMs) planning capabilities by using planning data derived from knowledge graphs (KGs).
LLMs fine-tuned with KG data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval.
arXiv Detail & Related papers (2024-06-20T13:07:38Z)
- Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies [69.28082193942991]
This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills.
Utilizing tropes from movie storytelling, TiM evaluates the reasoning capabilities of state-of-the-art LLM-based approaches.
To address the deficiencies it exposes, we propose Face-Enhanced Viper of Role Interactions (FEVoRI) and Context Query Reduction (ConQueR).
arXiv Detail & Related papers (2024-06-16T12:58:31Z)
- Learning to Reason via Program Generation, Emulation, and Search [33.11955431589091]
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities.
However, not all reasoning tasks are easily expressible as code, e.g., tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding.
We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
arXiv Detail & Related papers (2024-05-25T19:40:50Z)
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore using less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
- TaskLAMA: Probing the Complex Task Understanding of Language Models [13.336015994186955]
Structured Complex Task Decomposition (SCTD) is a problem of breaking down a complex real-world task into a directed acyclic graph over individual steps that contribute to achieving the task.
We probe how accurately SCTD can be done with the knowledge extracted from Large Language Models (LLMs).
Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline.
arXiv Detail & Related papers (2023-08-29T13:36:45Z)
- Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning [10.889271604723312]
Chain-of-thought (CoT) prompting with large language models has proven effective in numerous natural language processing tasks.
We investigate two approaches to leverage the training data in a few-shot prompting scenario: dynamic program prompting and program distillation.
Our experiments on three standard math word problem (MWP) datasets demonstrate the effectiveness of these approaches.
arXiv Detail & Related papers (2023-05-29T16:01:40Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- Understanding Unnatural Questions Improves Reasoning over Text [54.235828149899625]
Complex question answering (CQA) over raw text is a challenging task.
Learning an effective CQA model requires large amounts of human-annotated data.
We address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions.
arXiv Detail & Related papers (2020-10-19T10:22:16Z)
- Information-theoretic User Interaction: Significant Inputs for Program Synthesis [11.473616777800318]
We introduce the significant questions problem and show that it is hard in general.
We develop an information-theoretic greedy approach for solving the problem.
In the context of interactive program synthesis, we use the above result to develop an active program learner.
Our active learner is able to trade off false negatives for false positives and converges in a small number of iterations on a real-world dataset.
arXiv Detail & Related papers (2020-06-22T21:46:40Z)