Related papers: Unlocking a New Rust Programming Experience: Fast and Slow Thinking with LLMs to Conquer Undefined Behaviors

Evaluating and Improving Automated Repository-Level Rust Issue Resolution with LLM-based Agents [7.282281491277909]
The Rust programming language presents a steep learning curve and significant coding challenges. Recently, LLM-powered code agents have shown remarkable success in resolving complex software engineering tasks. RUSTFORGER is a novel agentic approach that integrates an automated test environment setup with a Rust metaprogramming-driven dynamic tracing strategy.
arXiv Detail & Related papers (2026-02-26T08:54:09Z)
AkiraRust: Re-thinking LLM-aided Rust Repair Using a Feedback-guided Thinking Switch [25.65238229037917]
AkiraRust is a repair and verification framework that incorporates a finite-state machine to adapt its detection and repair flow to runtime semantic conditions. AkiraRust achieves about 92% semantic correctness and delivers a 2.2x average speedup compared to SOTA.
arXiv Detail & Related papers (2026-02-25T08:34:27Z)
RealSec-bench: A Benchmark for Evaluating Secure Code Generation in Real-World Repositories [58.32028251925354]
Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, but their proficiency in producing secure code remains a critical, under-explored area. We introduce RealSec-bench, a new benchmark for secure code generation meticulously constructed from real-world, high-risk Java repositories.
arXiv Detail & Related papers (2026-01-30T08:29:01Z)
QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code [52.66657751895655]
Large Language Models (LLMs) offer a compelling new paradigm: Neural Compilation. This paper introduces NeuComBack, a novel benchmark dataset specifically designed for IR-to-assembly compilation. We propose a self-evolving prompt optimization method that enables LLMs to evolve their internal prompt strategies.
arXiv Detail & Related papers (2025-11-03T03:20:26Z)
TECS/Rust-OE: Optimizing Exclusive Control in Rust-based Component Systems for Embedded Devices [0.0]
TECS/Rust has been proposed as a framework that combines Rust and component-based development (CBD) to enable scalable system design and enhanced reliability. This paper proposes TECS/Rust-OE, a memory-safe CBD framework utilizing call flows to address these limitations. The proposed Rust code leverages real-time OS exclusive control mechanisms, optimizing performance without compromising reusability.
arXiv Detail & Related papers (2025-10-29T07:48:47Z)
InspectCoder: Dynamic Analysis-Enabled Self Repair through interactive LLM-Debugger Collaboration [71.18377595277018]
Large Language Models (LLMs) frequently generate buggy code with complex logic errors that are challenging to diagnose. We present InspectCoder, the first agentic program repair system that empowers LLMs to actively conduct dynamic analysis via interactive debugger control.
arXiv Detail & Related papers (2025-10-21T06:26:29Z)
EVOC2RUST: A Skeleton-guided Framework for Project-Level C-to-Rust Translation [16.12483934561206]
EvoC2Rust is an automated framework for converting entire C projects to equivalent Rust ones. Our evaluation on open-source benchmarks and six industrial projects demonstrates EvoC2Rust's superior performance in project-level C-to-Rust translation.
arXiv Detail & Related papers (2025-08-06T10:31:23Z)
deepSURF: Detecting Memory Safety Vulnerabilities in Rust Through Fuzzing LLM-Augmented Harnesses [8.093479682590825]
Rust ensures memory safety by default, but it also permits the use of unsafe code, which can introduce memory safety vulnerabilities if misused. We present deepSURF, a tool that integrates static analysis with Large Language Model (LLM)-guided fuzzing harness generation. We evaluate deepSURF on 27 real-world Rust crates, successfully rediscovering 20 known memory safety bugs and uncovering 6 previously unknown vulnerabilities.
arXiv Detail & Related papers (2025-06-18T17:18:23Z)
Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z)
OSS-Bench: Benchmark Generator for Coding LLMs [4.393587297483245]
We introduce OSS-Bench, a benchmark generator that constructs large-scale, live evaluation tasks from real-world open-source software. OSS-Bench replaces functions with LLM-generated code and evaluates them using three natural metrics: compilability, functional correctness, and memory safety. Our results demonstrate that OSS-Bench mitigates overfitting by leveraging the evolving complexity of OSS.
arXiv Detail & Related papers (2025-05-18T09:53:51Z)
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation [63.23120252801889]
CRUST-Bench is a dataset of 100 C repositories, each paired with manually-written interfaces in safe Rust as well as test cases. We evaluate state-of-the-art large language models (LLMs) on this task and find that safe and idiomatic Rust generation is still a challenging problem. The best performing model, OpenAI o1, is able to solve only 15 tasks in a single-shot setting.
arXiv Detail & Related papers (2025-04-21T17:33:33Z)
Fast-Slow-Thinking: Complex Task Solving with Large Language Models [49.98959729052245]
This paper introduces a new task decomposition method termed "Fast-Slow-Thinking" (FST). In FT, LLMs are prompted to remove the constraints of the original task, therefore simplifying it to a general and concise one. In ST, we recall the constraints removed in FT, so that LLMs can improve the answer generated in FT to meet the requirements of the original task.
arXiv Detail & Related papers (2025-04-11T16:57:36Z)
HALURust: Exploiting Hallucinations of Large Language Models to Detect Vulnerabilities in Rust [5.539291692976558]
Since 2018, 442 Rust-related vulnerabilities have been reported in real-world applications. This paper introduces HALURust, a novel framework that leverages hallucinations of large language models (LLMs) to detect vulnerabilities in real-world Rust scenarios. HALURust was evaluated on a dataset of 81 real-world vulnerabilities, covering 447 functions and 18,691 lines of code across 54 applications.
arXiv Detail & Related papers (2025-03-13T18:38:34Z)
Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories [8.583591493627276]
We introduce JitVul, a vulnerability detection benchmark linking each function to its vulnerability-introducing and fixing commits. We show that ReAct Agents, leveraging thought-action-observation and interprocedural context, perform better than LLMs in distinguishing vulnerable from benign code.
arXiv Detail & Related papers (2025-03-05T15:22:24Z)
ToolCoder: A Systematic Code-Empowered Tool Learning Framework for Large Language Models [49.04652315815501]
Tool learning has emerged as a crucial capability for large language models (LLMs) to solve complex real-world tasks through interaction with external tools. We propose ToolCoder, a novel framework that reformulates tool learning as a code generation task.
arXiv Detail & Related papers (2025-02-17T03:42:28Z)
Reference Trustable Decoding: A Training-Free Augmentation Paradigm for Large Language Models [79.41139393080736]
Large language models (LLMs) have rapidly advanced and demonstrated impressive capabilities. In-Context Learning (ICL) and Parameter-Efficient Fine-Tuning (PEFT) are currently two mainstream methods for augmenting LLMs to downstream tasks. We propose Reference Trustable Decoding (RTD), a paradigm that allows models to quickly adapt to new tasks without fine-tuning.
arXiv Detail & Related papers (2024-09-30T10:48:20Z)
Rusty Linux: Advances in Rust for Linux Kernel Development [0.0]
Integration of Rust into kernel development is a transformative endeavor aimed at enhancing system security and reliability. We identify the advantages Rust offers, highlight the challenges faced, and emphasize the need for community consensus on Rust's adoption.
arXiv Detail & Related papers (2024-07-25T23:46:27Z)
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing [63.20133320524577]
We show that editing a small subset of parameters can effectively modulate specific behaviors of large language models (LLMs). Our approach achieves reductions of up to 90.0% in toxicity on the RealToxicityPrompts dataset and 49.2% on ToxiGen.
arXiv Detail & Related papers (2024-07-11T17:52:03Z)
A Study of Undefined Behavior Across Foreign Function Boundaries in Rust Libraries [2.359557447960552]
Rust is frequently used to interoperate with other languages. Miri is the only dynamic analysis tool that can validate applications against these models. Miri does not support finding bugs in foreign functions, indicating that there may be a critical correctness gap across the Rust ecosystem.
arXiv Detail & Related papers (2024-04-17T18:12:05Z)
FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping [49.66872823080736]
Autoregressive Large Language Models (e.g., LLaMa, GPTs) are omnipresent, achieving remarkable success in language understanding and generation. To mitigate overload incurred during generation, several early-exit and layer-dropping strategies have been proposed. We propose FFN-SkipLLM, which is an input-adaptive feed-forward skipping strategy.
arXiv Detail & Related papers (2024-04-05T02:35:43Z)
Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust [23.0568924498396]
Rust is one of the most promising systems programming languages to solve the memory safety issues that have plagued low-level software for over forty years. Unsafe Rust code and directly-linked unsafe foreign libraries may not only introduce memory safety violations themselves but also compromise the entire program as they run in the same monolithic address space as the safe Rust. We have prototyped a whole-program analysis for identifying both unsafe heap allocations and memory accesses to those unsafe heap objects.
arXiv Detail & Related papers (2023-10-16T11:34:21Z)
Fixing Rust Compilation Errors using LLMs [2.1781086368581932]
The Rust programming language has established itself as a viable choice for low-level systems programming over traditional, unsafe alternatives like C/C++. This paper presents a tool called RustAssistant that leverages the emergent capabilities of Large Language Models (LLMs) to automatically suggest fixes for Rust compilation errors. RustAssistant is able to achieve an impressive peak accuracy of roughly 74% on real-world compilation errors in popular open-source Rust repositories.
arXiv Detail & Related papers (2023-08-09T18:30:27Z)
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks [81.9962823875981]
We introduce SwiftSage, a novel agent framework inspired by the dual-process theory of human cognition. The framework comprises two primary modules: the Swift module, representing fast and intuitive thinking, and the Sage module, emulating deliberate thought processes. In 30 tasks from the ScienceWorld benchmark, SwiftSage significantly outperforms other methods such as SayCan, ReAct, and Reflexion.
arXiv Detail & Related papers (2023-05-27T07:04:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.