Less is More: Summary of Long Instructions is Better for Program Synthesis
- URL: http://arxiv.org/abs/2203.08597v1
- Date: Wed, 16 Mar 2022 13:04:12 GMT
- Title: Less is More: Summary of Long Instructions is Better for Program Synthesis
- Authors: Kirby Kuznia, Swaroop Mishra, Mihir Parmar and Chitta Baral
- Abstract summary: We show that pre-trained language models (LMs) benefit from summarized versions of complicated questions.
Our findings show that the superfluous information often present in problem descriptions does not help models understand a task.
Experimental results on Codex show that our proposed approach outperforms the baseline by 8.13% on average in terms of strict accuracy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the success of large pre-trained language models (LMs) such as Codex,
they show below-par performance on larger and more complicated programming-related
questions. We show that LMs benefit from summarized versions of complicated questions.
Our findings show that the superfluous information often present in problem
descriptions, such as human characters, background stories, and names (which are
included to help humans understand a task), does not help models understand the task.
To this end, we create a meta-dataset from the frequently used APPS dataset for the
program synthesis task. Our meta-dataset consists of human-written and synthesized
summaries of the long and complicated programming questions. Experimental results on
Codex show that our proposed approach outperforms the baseline by 8.13% on average in
terms of strict accuracy. Our analysis shows that summaries significantly improve
performance for introductory (9.86%) and interview (11.48%) programming questions.
However, the improvement is small (~2%) for competitive programming questions,
implying scope for future research.
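The pipeline the abstract describes is simple to sketch: compress the long problem statement into a task-only summary, then hand the summary to the code model. Below is a minimal Python sketch of that idea; the `complete` helper, the model names, and the prompt wording are illustrative assumptions, not the authors' exact setup (the paper evaluated Codex with both human-written and synthesized summaries).

```python
# Minimal sketch of the summarize-then-synthesize pipeline described in the
# abstract. The `complete` helper, model names, and prompt wording are
# hypothetical placeholders, not the authors' exact setup.

def complete(model: str, prompt: str, max_tokens: int = 512) -> str:
    """Stub: call whatever LM completion API is available and return text."""
    raise NotImplementedError("wire this to your LM provider")

def summarize_question(question: str) -> str:
    """Compress a long problem statement, dropping story elements and names
    while keeping the task, input/output format, and constraints."""
    prompt = (
        "Summarize the following programming problem. Keep only the task, "
        "the input/output format, and the constraints; remove characters, "
        "background stories, and names.\n\n"
        f"{question}\n\nSummary:"
    )
    return complete("summarizer-model", prompt).strip()

def synthesize_program(question: str) -> str:
    """Ask the code model for a solution to the (summarized) question."""
    prompt = f"{question}\n\n# Complete Python 3 solution:\n"
    return complete("code-model", prompt)

def solve(long_question: str) -> str:
    # Less is more: summarize first, then synthesize.
    return synthesize_program(summarize_question(long_question))
```

Under the strict-accuracy metric reported above (inherited from APPS), a generated program counts as correct only if it passes all test cases for the problem.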
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
- Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs [59.76268575344119]
We introduce a novel framework for enhancing large language models' (LLMs) planning capabilities by using planning data derived from knowledge graphs (KGs).
LLMs fine-tuned with KG data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval.
arXiv Detail & Related papers (2024-06-20T13:07:38Z)
- Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies [69.28082193942991]
This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills.
Utilizing tropes from movie storytelling, TiM evaluates the reasoning capabilities of state-of-the-art LLM-based approaches.
To address the deficiencies it exposes, we propose Face-Enhanced Viper of Role Interactions (FEVoRI) and Context Query Reduction (ConQueR).
arXiv Detail & Related papers (2024-06-16T12:58:31Z)
- Learning to Reason via Program Generation, Emulation, and Search [33.11955431589091]
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities.
However, not all reasoning tasks are easily expressible as code, e.g., tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding.
We propose Code Generation and Emulated EXecution (CoGEX) to extend an LM's program synthesis skills to such tasks.
arXiv Detail & Related papers (2024-05-25T19:40:50Z)
- Optimizing Language Model's Reasoning Abilities with Weak Supervision [48.60598455782159]
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore using less supervised data to boost LLMs' inference capabilities.
arXiv Detail & Related papers (2024-05-07T07:39:15Z)
- TaskLAMA: Probing the Complex Task Understanding of Language Models [13.336015994186955]
Structured Complex Task Decomposition (SCTD) is a problem of breaking down a complex real-world task into a directed acyclic graph over individual steps that contribute to achieving the task.
We probe how accurately SCTD can be done with the knowledge extracted from Large Language Models (LLMs).
Our experiments reveal that LLMs are able to decompose complex tasks into individual steps effectively, with a relative improvement of 15% to 280% over the best baseline.
arXiv Detail & Related papers (2023-08-29T13:36:45Z)
- Leveraging Training Data in Few-Shot Prompting for Numerical Reasoning [10.889271604723312]
Chain-of-thought (CoT) prompting with large language models has proven effective in numerous natural language processing tasks.
We investigate two approaches to leverage the training data in a few-shot prompting scenario: dynamic program prompting and program distillation.
Our experiments on three standard math word problem (MWP) datasets demonstrate the effectiveness of these approaches.
arXiv Detail & Related papers (2023-05-29T16:01:40Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as the strong foundation backbone for a wide range of tasks but also be feasible as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- Understanding Unnatural Questions Improves Reasoning over Text [54.235828149899625]
Complex question answering (CQA) over raw text is a challenging task.
Learning an effective CQA model requires large amounts of human-annotated data.
We address the challenge of learning a high-quality programmer (parser) by projecting natural human-generated questions into unnatural machine-generated questions.
arXiv Detail & Related papers (2020-10-19T10:22:16Z)
- Information-theoretic User Interaction: Significant Inputs for Program Synthesis [11.473616777800318]
We introduce the significant questions problem and show that it is hard in general.
We develop an information-theoretic greedy approach for solving the problem.
In the context of interactive program synthesis, we use the above result to develop an active program learner.
Our active learner is able to trade off false negatives for false positives and converges in a small number of iterations on a real-world dataset.
arXiv Detail & Related papers (2020-06-22T21:46:40Z)