Breaking Type Safety in Go: An Empirical Study on the Usage of the
unsafe Package
- URL: http://arxiv.org/abs/2006.09973v4
- Date: Thu, 22 Jul 2021 18:51:48 GMT
- Title: Breaking Type Safety in Go: An Empirical Study on the Usage of the
unsafe Package
- Authors: Diego Elias Costa, Suhaib Mujahid, Rabe Abdalkareem, Emad Shihab
- Abstract summary: We present the first large-scale study on the usage of the unsafe package in 2,438 popular Go projects.
Our investigation shows that unsafe is used in 24% of Go projects, motivated primarily by communicating with operating systems and C code.
We report a series of real issues faced by projects that use unsafe, from crashing errors and non-deterministic behavior to having their deployment restricted.
- Score: 3.548075273599941
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A decade after its first release, the Go programming language has become a
major programming language in the development landscape. While praised for its
clean syntax and C-like performance, Go also contains a strong static
type-system that prevents arbitrary type casting and arbitrary memory access,
making the language type-safe by design. However, to give developers the
possibility of implementing low-level code, Go ships with a special package
called unsafe that offers developers a way around the type-safety of Go
programs. The package gives greater flexibility to developers but comes at a
higher risk of runtime errors, chances of non-portability, and the loss of
compatibility guarantees for future versions of Go.
In this paper, we present the first large-scale study on the usage of the
unsafe package in 2,438 popular Go projects. Our investigation shows that
unsafe is used in 24% of Go projects, motivated primarily by communicating with
operating systems and C code, but is also commonly used as a source of
performance optimization. Developers are willing to use unsafe to break
language specifications (e.g., string immutability) for better performance and
6% of analyzed projects that use unsafe perform risky pointer conversions that
can lead to program crashes and unexpected behavior. Furthermore, we report a
series of real issues faced by projects that use unsafe, from crashing errors
and non-deterministic behavior to having their deployment restricted from
certain popular environments. Our findings can be used to understand how and
why developers break type-safety in Go, and help motivate further tools and
language development that could make the usage of unsafe in Go even safer.
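To make the usage patterns the abstract describes concrete, here is a minimal sketch in Go (our own illustration, not code taken from any of the 2,438 analyzed projects) of two common unsafe idioms: a zero-copy string-to-byte-slice conversion that trades string immutability for performance, and a pointer conversion between unrelated types whose correctness rests entirely on unchecked layout assumptions.

```go
// Illustrative sketch only; not code from any of the studied projects.
package main

import (
	"fmt"
	"unsafe"
)

// bytesOf reinterprets a string's backing array as a []byte without copying,
// using the unsafe.Slice (Go 1.17+) and unsafe.StringData (Go 1.20+) helpers.
// It avoids an allocation, but writing through the returned slice mutates an
// "immutable" string and can fault if the data sits in read-only memory.
func bytesOf(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// rgba happens to have the same size as uint32, but nothing in the type
// system checks that; the cast in main is the kind of risky pointer
// conversion the study flags, and its result depends on machine endianness.
type rgba struct{ r, g, b, a uint8 }

func main() {
	b := bytesOf("hello")
	fmt.Println(b) // [104 101 108 108 111], sharing memory with the literal

	px := uint32(0x11223344)
	c := *(*rgba)(unsafe.Pointer(&px))
	fmt.Println(c) // {68 51 34 17} on little-endian machines
}
```

Both conversions compile without complaint; the crash, portability, and compatibility risks listed in the abstract only surface at run time or when Go's internal representations change.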
Related papers
- SJMalloc: the security-conscious, fast, thread-safe and memory-efficient heap allocator [0.0]
Heap-based exploits pose a significant threat to application security.
Hardened allocators, however, have not been widely adopted in real-world applications.
SJMalloc stores its metadata out-of-band, away from the application's data on the heap.
SJMalloc demonstrates a 6% performance improvement over glibc's allocator, while using only 5% more memory.
arXiv Detail & Related papers (2024-10-23T14:47:12Z)
- Characterizing Unsafe Code Encapsulation In Real-world Rust Systems [2.285834282327349]
Interior unsafe is an essential design paradigm advocated by the Rust community in system software development.
The Rust compiler is incapable of verifying the soundness of a safe function containing unsafe code.
We propose a novel unsafety isolation graph to model the essential usage and encapsulation of unsafe code.
arXiv Detail & Related papers (2024-06-12T06:59:51Z)
- Bringing Rust to Safety-Critical Systems in Space [1.0742675209112622]
Rust aims to drastically reduce the chance of introducing bugs and produces overall more secure and safer code.
This work provides a set of recommendations for the development of safety-critical space systems in Rust.
arXiv Detail & Related papers (2024-05-28T12:48:47Z)
- A Mixed-Methods Study on the Implications of Unsafe Rust for Interoperation, Encapsulation, and Tooling [2.2463451968497425]
Rust developers need verification tools that can provide guarantees of soundness within multi-language applications.
We study how developers reason about foreign function calls, the limitations of the tools that they currently use, their motivations for using unsafe code, and how they reason about encapsulating it.
arXiv Detail & Related papers (2024-04-02T18:36:21Z)
- CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion [117.178835165855]
This paper introduces CodeAttack, a framework that transforms natural language inputs into code inputs.
Our studies reveal a new and universal safety vulnerability of these models against code input.
We find that a larger distribution gap between CodeAttack and natural language leads to weaker safety generalization.
arXiv Detail & Related papers (2024-03-12T17:55:38Z)
- GWP-ASan: Sampling-Based Detection of Memory-Safety Bugs in Production [30.534320345970286]
Heap-use-after-free and heap-buffer-overflow bugs remain the primary problem for security, reliability, and developer productivity in applications written in C or C++.
This paper describes a family of tools that detect these two classes of memory-safety bugs, while running in production, at near-zero overhead.
arXiv Detail & Related papers (2023-11-15T21:41:53Z)
- All Languages Matter: On the Multilingual Safety of Large Language Models [96.47607891042523]
We build the first multilingual safety benchmark for large language models (LLMs).
XSafety covers 14 kinds of commonly used safety issues across 10 languages that span several language families.
We propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT.
arXiv Detail & Related papers (2023-10-02T05:23:34Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- CodeLMSec Benchmark: Systematically Evaluating and Finding Security Vulnerabilities in Black-Box Code Language Models [58.27254444280376]
Large language models (LLMs) for automatic code generation have achieved breakthroughs in several programming tasks.
Training data for these models is usually collected from the Internet (e.g., from open-source repositories) and is likely to contain faults and security vulnerabilities.
This unsanitized training data can cause the language models to learn these vulnerabilities and propagate them during the code generation procedure.
arXiv Detail & Related papers (2023-02-08T11:54:07Z)
- SafeText: A Benchmark for Exploring Physical Safety in Language Models [62.810902375154136]
We study commonsense physical safety across various models designed for text generation and commonsense reasoning tasks.
We find that state-of-the-art large language models are susceptible to the generation of unsafe text and have difficulty rejecting unsafe advice.
arXiv Detail & Related papers (2022-10-18T17:59:31Z)
- Natural Language to Code Translation with Execution [82.52142893010563]
We propose execution result-based minimum Bayes risk decoding for program selection.
We show that it improves the few-shot performance of pretrained code models on natural-language-to-code tasks.
arXiv Detail & Related papers (2022-04-25T06:06:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.