ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages
- URL: http://arxiv.org/abs/2407.03387v2
- Date: Fri, 30 Aug 2024 09:13:50 GMT
- Title: ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages
- Authors: Mehant Kammakomati, Sameer Pimparkhede, Srikanth Tamilselvam, Prince Kumar, Pushpak Bhattacharyya
- Abstract summary: Large Language Models (LLMs) struggle to understand natural language constraints for various text generation tasks.
Code languages that excel at normal code tasks perform poorly when the same languages represent fine-grained constraints.
We introduce ConCodeEval, a first-of-its-kind benchmark with two novel tasks for code constraints across five representations.
- Score: 35.170835339618414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work shows Large Language Models (LLMs) struggle to understand natural language constraints for various text generation tasks in zero- and few-shot settings. In the code domain, however, constraints in code format are widely used to maintain the integrity of code written in Domain-Specific Languages (DSLs) like JSON and YAML, which are common in enterprise system-level programming tasks. Given that LLMs are increasingly used for system-level code tasks, evaluating whether they can comprehend these code constraints is crucial. However, no prior work has evaluated their controllability over code constraints. Hence, we introduce ConCodeEval, a first-of-its-kind benchmark with two novel tasks for code constraints across five representations. Our findings suggest that language models struggle with code constraints: code languages that excel at normal code tasks do not perform well when the same languages represent fine-grained constraints.
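To make the notion of a code-format constraint concrete, here is a minimal sketch (the schema, field names, and payload are hypothetical illustrations, not taken from the ConCodeEval benchmark) of a fine-grained constraint expressed in JSON Schema and checked against an LLM-generated JSON payload using the third-party `jsonschema` Python package:

```python
# Minimal sketch: validating an LLM-generated JSON document against
# fine-grained schema constraints. The schema and payload below are
# hypothetical examples, not drawn from the ConCodeEval benchmark.
import json

from jsonschema import ValidationError, validate

# A schema imposing fine-grained constraints on a config document.
schema = {
    "type": "object",
    "properties": {
        "replicas": {"type": "integer", "minimum": 1, "maximum": 10},
        "env": {"type": "string", "enum": ["dev", "staging", "prod"]},
    },
    "required": ["replicas", "env"],
}

# Suppose an LLM was asked to emit a config satisfying the schema.
llm_output = '{"replicas": 0, "env": "prod"}'

try:
    validate(instance=json.loads(llm_output), schema=schema)
    print("constraints satisfied")
except ValidationError as err:
    # "replicas" violates the fine-grained `minimum: 1` constraint.
    print(f"constraint violated: {err.message}")
```

The same check applies to YAML: a document parsed with PyYAML's `yaml.safe_load` yields the same Python data model that `validate` consumes.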
Related papers
- UniCoder: Scaling Code Large Language Model via Universal Code [40.248836046285014]
We introduce the universal code (UniCode) as the intermediate representation.
UniCoder-Instruct comprises natural-language questions, code solutions, and the corresponding universal code.
The alignment between the intermediate universal code representation and the final code solution significantly improves the quality of the generated code.
arXiv Detail & Related papers (2024-06-24T08:32:48Z)
- DocCGen: Document-based Controlled Code Generation [33.19206322891497]
DocCGen is a framework that can leverage rich knowledge by breaking the NL-to-Code generation task for structured code languages into a two-step process.
Our experiments show that DocCGen consistently improves different-sized language models across all six evaluation metrics.
arXiv Detail & Related papers (2024-06-17T08:34:57Z)
- CodeGRAG: Bridging the Gap between Natural Language and Programming Language via Graphical Retrieval Augmented Generation [58.84212778960507]
We propose CodeGRAG, a Graphical Retrieval Augmented Code Generation framework to enhance the performance of LLMs.
CodeGRAG builds a graphical view of code blocks based on their control flow and data flow to bridge the gap between programming languages and natural language.
Experiments and ablations on four datasets covering both C++ and Python validate the hard meta-graph prompt, the soft prompting technique, and the effectiveness of the objectives for the pretrained GNN expert.
arXiv Detail & Related papers (2024-05-03T02:48:55Z)
- Language Agnostic Code Embeddings [61.84835551549612]
We focus on the cross-lingual capabilities of code embeddings across different programming languages.
Code embeddings comprise two distinct components: one deeply tied to the nuances and syntax of a specific language, and the other remaining agnostic to these details.
We show that when we isolate and eliminate this language-specific component, we witness significant improvements in downstream code retrieval tasks.
arXiv Detail & Related papers (2023-10-25T17:34:52Z)
- CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model [58.127534002232096]
This paper introduces CodeFuse-13B, an open-sourced pre-trained code LLM.
It is specifically designed for code-related tasks with both English and Chinese prompts.
CodeFuse achieves its effectiveness by utilizing a high-quality pre-training dataset.
arXiv Detail & Related papers (2023-10-10T02:38:44Z)
- Multilingual Code Co-Evolution Using Large Language Models [45.083171710527985]
Translating code changes from one programming language to another is not how developers work.
Codeditor explicitly models code changes as edits and learns to correlate changes across programming languages.
Codeditor outperforms the state-of-the-art approaches by a large margin on all commonly used automatic metrics.
arXiv Detail & Related papers (2023-07-27T16:37:30Z)
- CodeT5+: Open Code Large Language Models for Code Understanding and Generation [72.1638273937025]
Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence.
CodeT5+ is a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks.
We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning.
arXiv Detail & Related papers (2023-05-13T14:23:07Z)
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval [32.60391966381949]
We introduce xCodeEval, the largest executable multilingual multitask benchmark to date.
It features a total of 7 tasks involving code understanding, generation, translation, and retrieval.
xCodeEval adopts an execution-based evaluation and offers a multilingual code execution engine, ExecEval.
arXiv Detail & Related papers (2023-03-06T10:08:51Z)
- MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages [76.93265104421559]
We benchmark code generation from natural language commands extending beyond English.
We annotated a total of 896 NL-code pairs in three languages: Spanish, Japanese, and Russian.
While the difficulties vary across these three languages, all systems lag significantly behind their English counterparts.
arXiv Detail & Related papers (2022-03-16T04:21:50Z)