Serializing Java Objects in Plain Code
- URL: http://arxiv.org/abs/2405.11294v2
- Date: Tue, 21 May 2024 08:41:27 GMT
- Title: Serializing Java Objects in Plain Code
- Authors: Julian Wachter, Deepika Tiwari, Martin Monperrus, Benoit Baudry
- Abstract summary: In managed languages, serialization of objects is typically done in bespoke binary formats such as Protobuf.
Human developers cannot read binary code and, in most cases, struggle with the syntax of XML or JSON.
This is a major issue when objects are meant to be embedded and read in source code, such as in test cases.
Our core idea is to serialize objects observed at runtime in the native syntax of a programming language.
- Score: 10.405775369526006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In managed languages, serialization of objects is typically done in bespoke binary formats such as Protobuf, or markup languages such as XML or JSON. The major limitation of these formats is readability. Human developers cannot read binary code, and in most cases, suffer from the syntax of XML or JSON. This is a major issue when objects are meant to be embedded and read in source code, such as in test cases. To address this problem, we propose plain-code serialization. Our core idea is to serialize objects observed at runtime in the native syntax of a programming language. We realize this vision in the context of Java, and demonstrate a prototype which serializes Java objects to Java source code. The resulting source faithfully reconstructs the objects seen at runtime. Our prototype is called ProDJ and is publicly available. We experiment with ProDJ to successfully plain-code serialize 174,699 objects observed during the execution of 4 open-source Java applications. Our performance measurement shows that the performance impact is not noticeable.
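To make the core idea concrete, here is a minimal Java sketch of what plain-code serialization could look like. It is not ProDJ's implementation or API: the class `PlainCodeSketch`, the method `toJavaSource`, and the example class `Point` are hypothetical names chosen for illustration. The sketch assumes the object has a no-argument constructor and public, non-static fields holding only int, boolean, or String values.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only; this is NOT ProDJ's API or algorithm.
class PlainCodeSketch {

    // Example class to serialize: no-arg constructor, public fields.
    static class Point {
        public int x;
        public int y;
        public String label;
    }

    // Emits Java statements that rebuild `obj` in a variable named `varName`.
    // Assumption: no-arg constructor and public, non-static fields of type
    // int, boolean, or String. Anything else is rejected by literal().
    static String toJavaSource(Object obj, String varName) {
        Class<?> type = obj.getClass();
        String typeName = type.getCanonicalName();
        StringBuilder src = new StringBuilder();
        src.append(typeName).append(' ').append(varName)
           .append(" = new ").append(typeName).append("();\n");

        // getFields() gives no ordering guarantee, so sort by name
        // to keep the generated code deterministic.
        Field[] fields = type.getFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        for (Field field : fields) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            try {
                src.append(varName).append('.').append(field.getName())
                   .append(" = ").append(literal(field.get(obj))).append(";\n");
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read " + field, e);
            }
        }
        return src.toString();
    }

    // Renders a supported value as a Java literal.
    static String literal(Object value) {
        if (value == null) {
            return "null";
        }
        if (value instanceof Integer || value instanceof Boolean) {
            return value.toString();
        }
        if (value instanceof String) {
            // Escapes backslash, double quote, and newline only.
            String s = (String) value;
            return "\"" + s.replace("\\", "\\\\")
                           .replace("\"", "\\\"")
                           .replace("\n", "\\n") + "\"";
        }
        throw new IllegalArgumentException("unsupported value type: " + value.getClass());
    }

    public static void main(String[] args) {
        Point p = new Point();
        p.x = 3;
        p.y = -4;
        p.label = "origin";
        // Prints a constructor call followed by one assignment per field.
        System.out.print(toJavaSource(p, "p"));
    }
}
```

The printed statements can be pasted into a test case to recreate an equivalent `Point`. A real tool has to cope with much more than this sketch does, for example private fields, nested objects, collections, and classes without a no-argument constructor.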
Related papers
- Type-Constrained Code Generation with Language Models [51.03439021895432]
Large language models (LLMs) produce uncompilable output because their next-token inference procedure does not model formal aspects of code.
We introduce a type-constrained decoding approach that leverages type systems to guide code generation.
Our approach reduces compilation errors by more than half and increases functional correctness in code synthesis, translation, and repair tasks.
arXiv Detail & Related papers (2025-04-12T15:03:00Z)
- Deserialization Gadget Chains are not a Pathological Problem in Android: an In-Depth Study of Java Gadget Chains in AOSP [40.53819791643813]
Java's Serializable API has a long history of deserialization vulnerabilities, specifically deserialization gadget chains.
We design a gadget chain detection tool optimized for soundness and efficiency.
Running our tool on the Android SDK and 1,200 Android dependencies, in combination with a comprehensive sink dataset, yields no security-critical gadget chains.
arXiv Detail & Related papers (2025-02-12T14:39:30Z)
- Generating executable oracles to check conformance of client code to requirements of JDK Javadocs using LLMs [21.06722050714324]
This paper focuses on automation of test oracles for clients of widely used Java libraries, e.g., java.lang and java.util packages.
We use large language models as an enabling technology to embody our insight into a framework for test oracle automation.
arXiv Detail & Related papers (2024-11-04T04:24:25Z)
- CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages.
It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.
Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z)
- On the Generalizability of Deep Learning-based Code Completion Across Programming Language Versions [12.245958803682505]
Code completion is a key feature of Integrated Development Environments (IDEs).
Modern code completion approaches are often powered by deep learning (DL) models.
Can these models generalize across different language versions?
arXiv Detail & Related papers (2024-03-22T12:05:18Z)
- Java JIT Testing with Template Extraction [7.714591709931207]
LeJit is a template-based framework for testing Java just-in-time (JIT) compilers.
We have successfully used LeJit to test a range of popular Java JIT compilers.
arXiv Detail & Related papers (2024-03-17T17:39:27Z)
- Seneca: Taint-Based Call Graph Construction for Java Object Deserialization [3.6731536660959985]
We present Seneca, an approach for handling serialization with improved soundness in the context of call graph construction.
We evaluate our approach with respect to soundness, precision, performance, and usefulness in detecting untrusted object deserialization vulnerabilities.
arXiv Detail & Related papers (2023-11-02T02:07:54Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures.
We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution.
We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
arXiv Detail & Related papers (2023-05-08T10:00:05Z)
- Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation [61.50286000143233]
ChainCoder is a program synthesis language model that generates Python code progressively.
A tailored transformer architecture is leveraged to jointly encode the natural language descriptions and syntactically aligned I/O data samples.
arXiv Detail & Related papers (2023-04-28T01:47:09Z)
- CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code [75.08995072899594]
We propose CodeBERTScore: an evaluation metric for code generation.
CodeBERTScore encodes the natural language input preceding the generated code.
We find that CodeBERTScore achieves a higher correlation with human preference and with functional correctness than all existing metrics.
arXiv Detail & Related papers (2023-02-10T22:12:05Z)
- CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation [46.45445767488915]
We show how to leverage an unlabelled code corpus to train a model for library-oriented code generation.
We craft two benchmarks named PandasEval and NumpyEval to evaluate library-oriented code generation.
arXiv Detail & Related papers (2022-06-14T14:44:34Z)
- SAT-Based Extraction of Behavioural Models for Java Libraries with Collections [0.087024326813104]
Behavioural models are a valuable tool for software verification, testing, monitoring, publishing, etc.
They are rarely provided by the software developers and have to be extracted either from the source or from the compiled code.
Most of these approaches rely on the analysis of the compiled bytecode.
We are looking to extract behavioural models in the form of Finite State Machines (FSMs) from the Java source code to ensure that the obtained FSMs can be easily understood by the software developers.
arXiv Detail & Related papers (2022-05-30T17:27:13Z)
- AVATAR: A Parallel Corpus for Java-Python Program Translation [77.86173793901139]
Program translation refers to migrating source code from one language to another.
We present AVATAR, a collection of 9,515 programming problems and their solutions written in two popular languages, Java and Python.
arXiv Detail & Related papers (2021-08-26T05:44:20Z)
- TopicModel4J: A Java Package for Topic Models [2.519906683279153]
We design and implement a Java package, TopicModel4J, which contains 13 kinds of representative algorithms for fitting topic models.
The package provides an easy-to-use interface for data analysts to run the algorithms, and makes it easy to input and output data.
arXiv Detail & Related papers (2020-10-28T02:33:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.