Assessing Code Generation with Intermediate Languages
- URL: http://arxiv.org/abs/2407.05411v1
- Date: Sun, 7 Jul 2024 15:35:41 GMT
- Title: Assessing Code Generation with Intermediate Languages
- Authors: Xun Deng, Sicheng Zhong, Honghua Dong, Jingyu Hu, Sidi Mohamed Beillahi, Xujie Si, Fan Long
- Abstract summary: This study explores the utilization of intermediate languages, including various programming languages, natural language solutions, and pseudo-code.
Our findings reveal that intermediate languages generally exhibit greater efficacy in larger models that have not yet achieved state-of-the-art performance.
- Score: 6.999311675957218
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Intermediate-step methodologies like chain of thought (CoT) have demonstrated effectiveness in enhancing the performance of Large Language Models (LLMs) on code generation. This study explores the utilization of intermediate languages, including various programming languages, natural language solutions, and pseudo-code, and systematically evaluates their impact on the performance of LLMs in code generation tasks. Our experiments encompass eleven models across the CodeLlama, GPT, and Mistral families, as well as newly released smaller models. Our findings reveal that intermediate languages generally exhibit greater efficacy in larger models that have not yet achieved state-of-the-art performance. Natural language consistently emerges as the most effective intermediate representation across all target languages. However, we observe no universally effective intermediate formal language across different models and target languages. Furthermore, we uncover a weak correlation between the correctness of intermediate solutions and final generation, suggesting that improvements may stem from the chain-of-thought effect rather than language-specific transfer. Interestingly, we discover that for GPT family models, prompting multiple times without explicit self-correction instructions yields performance gains across the studied languages.
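For readers who want to experiment with the idea, below is a minimal sketch of the two-stage prompting setup the abstract describes: the model first produces a solution in an intermediate language (natural language, pseudo-code, or another programming language), and a second call translates that intermediate solution into the target language. The prompt wording, the `solve_with_intermediate` helper, and the `llm` callable are illustrative assumptions, not the paper's actual prompts or code.

```python
"""Minimal sketch of intermediate-language prompting for code generation.

All names and prompt texts here are illustrative assumptions, not the
paper's implementation.
"""
from typing import Callable

# `LLM` stands in for any text-in/text-out completion call
# (an API client, a local model wrapper, etc.).
LLM = Callable[[str], str]


def solve_with_intermediate(problem: str, llm: LLM,
                            intermediate: str = "natural language",
                            target: str = "Python") -> str:
    """Generate code via an intermediate solution, then translate it."""
    # Stage 1: ask for the solution in the intermediate language
    # (natural language, pseudo-code, or another programming language).
    plan = llm(
        f"Solve the following programming problem in {intermediate}. "
        f"Do not write {target} code yet.\n\nProblem:\n{problem}"
    )

    # Stage 2: condition the final generation on the intermediate solution.
    code = llm(
        f"Problem:\n{problem}\n\n"
        f"A solution described in {intermediate}:\n{plan}\n\n"
        f"Now implement this solution in {target}. Return only the code."
    )
    return code


if __name__ == "__main__":
    # Dummy LLM so the sketch runs end to end without any API key.
    echo_llm: LLM = lambda prompt: f"# (model output for a {len(prompt)}-char prompt)"
    print(solve_with_intermediate("Return the n-th Fibonacci number.", echo_llm))
```

Swapping the `intermediate` argument (e.g. to "pseudo-code" or another programming language) reproduces the kind of comparison the paper runs across intermediate and target languages; the repeated-prompting observation for GPT-family models could be probed by simply issuing the second call several times.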
Related papers
- Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention [71.12193680015622]
Large Language Models (LLMs) have shown remarkable capabilities in natural language processing.
However, LLMs exhibit significant performance gaps across different languages.
We propose Inference-Time Cross-Lingual Intervention (INCLINE) to overcome these limitations without incurring significant costs.
arXiv Detail & Related papers (2024-10-16T11:23:03Z)
- Boosting the Capabilities of Compact Models in Low-Data Contexts with Large Language Models and Retrieval-Augmented Generation [2.9921619703037274]
We propose a retrieval augmented generation (RAG) framework backed by a large language model (LLM) to correct the output of a smaller model for the linguistic task of morphological glossing.
We leverage linguistic information to make up for the lack of data and trainable parameters, while allowing for inputs from written descriptive grammars interpreted and distilled through an LLM.
We show that a compact, RAG-supported model is highly effective in data-scarce settings, achieving a new state-of-the-art for this task and our target languages.
arXiv Detail & Related papers (2024-10-01T04:20:14Z)
- Interpretability of Language Models via Task Spaces [14.543168558734001]
We present an alternative approach to interpreting language models (LMs).
We focus on the quality of LM processing, with particular attention to their language abilities.
We construct 'linguistic task spaces' that shed light on the connections LMs draw between language phenomena.
arXiv Detail & Related papers (2024-06-10T16:34:30Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models [102.00201523306986]
We present L2CEval, a systematic evaluation of the language-to-code generation capabilities of large language models (LLMs).
We analyze the factors that potentially affect their performance, such as model size, pretraining data, instruction tuning, and different prompting methods.
In addition to assessing model performance, we measure confidence calibration for the models and conduct human evaluations of the output programs.
arXiv Detail & Related papers (2023-09-29T17:57:00Z)
- Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach may be directly applied to existing black-box models and uses identical procedure and prompts for all tasks we investigate.
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
- A Survey of Large Language Models [81.06947636926638]
Language modeling has been widely studied for language understanding and generation in the past two decades.
Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora.
To distinguish models by parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
arXiv Detail & Related papers (2023-03-31T17:28:46Z)
- Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models [58.990214815032495]
Large-scale pre-trained language models have achieved great success on natural language generation tasks.
Bayesian Controllable Language Models (BCLMs) have been shown to be efficient in controllable language generation.
We propose a "Gemini Discriminator" for controllable language generation that alleviates the training-inference mismatch at a small computational cost.
arXiv Detail & Related papers (2022-06-11T12:52:32Z)
- Pre-training Universal Language Representation [46.51685959045527]
This work introduces universal language representation learning, i.e., embedding linguistic units of different levels, or text spans of widely varying lengths, in a uniform vector space.
We empirically verify that a well-designed pre-training scheme can effectively yield universal language representations.
arXiv Detail & Related papers (2021-05-30T09:29:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.