Related papers: Code Documentation and Analysis to Secure Software Development

URL: http://arxiv.org/abs/2407.11934v1
Date: Tue, 16 Jul 2024 17:25:44 GMT
Title: Code Documentation and Analysis to Secure Software Development
Authors: Paul Attie, Anas Obeidat, Nathaniel Oh, Ian Yelle
Abstract summary: CoDAT is a tool designed to maintain consistency between the various levels of code documentation. It is implemented in IntelliJ IDEA. We use a large language model to check the semantic consistency between a fragment of code and the comments that describe it.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present the Code Documentation and Analysis Tool (CoDAT). CoDAT is a tool designed to maintain consistency between the various levels of code documentation; e.g., if a line in a code sketch is changed, the comment that documents the corresponding code is also changed. That is, comments are linked and updated so as to remain internally consistent and also consistent with the code. By flagging "out of date" comments, CoDAT alerts the developer to maintain up-to-date documentation. We use a large language model to check the semantic consistency between a fragment of code and the comments that describe it. Thus we flag semantic inconsistencies as well as out-of-date comments. This helps programmers write code that correctly implements a code sketch, and so provides machine support for a step-wise refinement approach, starting with a code sketch and proceeding down to code through one or more refinement iterations. CoDAT is implemented in the IntelliJ IDEA IDE, where we use the Code Insight daemon package alongside a custom regular expression algorithm to mark tagged comments whose corresponding code blocks have changed. CoDAT's backend is structurally decentralized to allow a distributed ledger framework for code consistency and architectural compilation tracking.
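
The idea of marking tagged comments whose corresponding code blocks have changed can be illustrated with a minimal sketch. This is not CoDAT's implementation: the `# @doc[<id>]` tag format, the function names, and the use of SHA-256 fingerprints are all assumptions made for illustration. The sketch finds tagged comments with a regular expression, fingerprints the code that follows each tag, and reports the tags whose code differs between two versions of a file.

```python
import hashlib
import re

# Hypothetical tag format (an assumption, not CoDAT's): "# @doc[<id>] <comment text>"
TAG_RE = re.compile(r"^\s*#\s*@doc\[(?P<id>\w+)\]")


def tagged_blocks(source: str) -> dict[str, str]:
    """Map each tag id to the non-blank lines that follow it, up to the next tag."""
    blocks: dict[str, list[str]] = {}
    current = None
    for line in source.splitlines():
        match = TAG_RE.match(line)
        if match:
            current = match.group("id")
            blocks[current] = []
        elif current is not None and line.strip():
            blocks[current].append(line.strip())
    return {tag: "\n".join(lines) for tag, lines in blocks.items()}


def fingerprints(source: str) -> dict[str, str]:
    """Hash the code under each tagged comment."""
    return {
        tag: hashlib.sha256(code.encode()).hexdigest()
        for tag, code in tagged_blocks(source).items()
    }


def stale_tags(old_source: str, new_source: str) -> list[str]:
    """Return ids of tagged comments whose code changed between the two versions."""
    old, new = fingerprints(old_source), fingerprints(new_source)
    return sorted(tag for tag in old if tag in new and old[tag] != new[tag])


before = """\
# @doc[total] Sum the prices.
total = sum(prices)
# @doc[report] Print the total.
print(total)
"""
after = before.replace("sum(prices)", "sum(prices) * 1.2")

print(stale_tags(before, after))  # ['total']
```

A fingerprint comparison only detects that the code under a comment changed; deciding whether the comment still describes the code is the semantic check that the abstract assigns to a large language model.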

Related papers

Codetations: Intelligent, Persistent Notes and UIs for Programs and Other Documents [0.85830154886823]
We present Codetations, a system that helps developers contextualize documents with rich notes and tools. Unlike previous approaches, notes in Codetations stay outside the document to prevent code clutter, attaching to spans in the document using a hybrid edit-tracking/LLM-based method. Their content is dynamic, interactive, and synchronized with code changes.
arXiv Detail & Related papers (2025-04-25T21:33:25Z)
Building A Coding Assistant via the Retrieval-Augmented Language Model [24.654428111628242]
We propose a retrieval-augmeNted language model (CONAN) to build a code assistant by mimicking the knowledge-seeking behaviors of humans during coding. It consists of a code structure aware retriever (CONAN-R) and a dual-view code representation-based retrieval-augmented generation model (CONAN-G).
arXiv Detail & Related papers (2024-10-21T17:34:39Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
When simplicity meets effectiveness: Detecting code comments coherence with word embeddings and LSTM [6.417777780911223]
Code comments play a crucial role in software development, as they provide programmers with practical information. Developers tend to leave comments unchanged after updating the code, resulting in a discrepancy between the two artifacts. It is crucial to identify whether, given a code snippet, its corresponding comment is coherent and accurately reflects the intent behind the code.
arXiv Detail & Related papers (2024-05-25T15:21:27Z)
SparseCoder: Identifier-Aware Sparse Transformer for File-Level Code Summarization [51.67317895094664]
This paper studies file-level code summarization, which can assist programmers in understanding and maintaining large source code projects. We propose SparseCoder, an identifier-aware sparse transformer for effectively handling long code sequences.
arXiv Detail & Related papers (2024-01-26T09:23:27Z)
InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback [50.725076393314964]
We introduce InterCode, a lightweight, flexible, and easy-to-use framework of interactive coding as a standard reinforcement learning environment. Our framework is language and platform agnostic and uses self-contained Docker environments to provide safe and reproducible execution. We demonstrate InterCode's viability as a testbed by evaluating multiple state-of-the-art LLMs configured with different prompting strategies.
arXiv Detail & Related papers (2023-06-26T17:59:50Z)
Code Comment Inconsistency Detection with BERT and Longformer [9.378041196272878]
Comments, or natural language descriptions of source code, are standard practice among software developers. When the code is modified without an accompanying correction to the comment, an inconsistency between the comment and code can arise. We propose two models to detect such inconsistencies in a natural language inference (NLI) context.
arXiv Detail & Related papers (2022-07-29T02:43:51Z)
ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval. We evaluate our approach on the code completion task in the Python and Java programming languages, achieving state-of-the-art performance on the CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z)
CodeRetriever: Unimodal and Bimodal Contrastive Learning [128.06072658302165]
We propose the CodeRetriever model, which combines unimodal and bimodal contrastive learning to train function-level code semantic representations. For unimodal contrastive learning, we design a semantic-guided method to build positive code pairs based on the documentation and function name. For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.
arXiv Detail & Related papers (2022-01-26T10:54:30Z)
Deep Just-In-Time Inconsistency Detection Between Comments and Source Code [51.00904399653609]
In this paper, we aim to detect whether a comment becomes inconsistent as a result of changes to the corresponding body of code. We develop a deep-learning approach that learns to correlate a comment with code changes. We show the usefulness of our approach by combining it with a comment update model to build a more comprehensive automatic comment maintenance system.
arXiv Detail & Related papers (2020-10-04T16:49:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.