Learning Randomized Reductions and Program Properties
        - URL: http://arxiv.org/abs/2412.18134v1
 - Date: Tue, 24 Dec 2024 03:42:53 GMT
 - Title: Learning Randomized Reductions and Program Properties
 - Authors: Ferhat Erata, Orr Paradise, Timos Antonopoulos, ThanhVu Nguyen, Shafi Goldwasser, Ruzica Piskac
 - Abstract summary: Bitween is a method and tool for automated learning of randomized (self)-reductions and program properties in numerical programs. We establish a theoretical framework for learning these reductions and introduce RSR-Bench, a benchmark suite for evaluating Bitween's capabilities on scientific and machine learning functions.
 - Score: 12.027016519515477
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract:   The correctness of computations remains a significant challenge in computer science, with traditional approaches relying on automated testing or formal verification. Self-testing/correcting programs introduce an alternative paradigm, allowing a program to verify and correct its own outputs via randomized reductions, a concept that previously required manual derivation. In this paper, we present Bitween, a method and tool for automated learning of randomized (self)-reductions and program properties in numerical programs. Bitween combines symbolic analysis and machine learning, with a surprising finding: polynomial-time linear regression, a basic optimization method, is not only sufficient but also highly effective for deriving complex randomized self-reductions and program invariants, often outperforming sophisticated mixed-integer linear programming solvers. We establish a theoretical framework for learning these reductions and introduce RSR-Bench, a benchmark suite for evaluating Bitween's capabilities on scientific and machine learning functions. Our empirical results show that Bitween surpasses state-of-the-art tools in scalability, stability, and sample efficiency when evaluated on nonlinear invariant benchmarks like NLA-DigBench. Bitween is open-source as a Python package and accessible via a web interface that supports C language programs. 
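To make the abstract's central claim concrete, here is a minimal sketch, assuming a hand-picked term library and the toy function f(x) = x^2 (neither is from the paper, and this is not Bitween's implementation), of how plain least-squares regression can recover a randomized self-reduction from random samples:

```python
# Minimal sketch (not Bitween's actual algorithm): recover the classic
# randomized self-reduction of f(x) = x^2, namely
#     f(x) = f(x + r) - f(r) - 2*x*r   for random r,
# by least-squares regression over a hypothesized library of candidate terms.
import numpy as np

def f(x):
    return x ** 2  # stand-in for the numerical program under analysis

rng = np.random.default_rng(0)
x = rng.uniform(-10, 10, size=200)  # query points
r = rng.uniform(-10, 10, size=200)  # random shifts used by the reduction

# Candidate right-hand-side terms; a real tool would enumerate these.
terms = {
    "f(x+r)": f(x + r),
    "f(r)":   f(r),
    "x*r":    x * r,
    "x":      x,
    "r":      r,
    "1":      np.ones_like(x),
}
A = np.column_stack(list(terms.values()))
coef, *_ = np.linalg.lstsq(A, f(x), rcond=None)

for name, c in zip(terms, coef):
    if abs(c) > 1e-8:
        print(f"{c:+.3f} * {name}")
# Expected: +1.000 * f(x+r), -1.000 * f(r), -2.000 * x*r
```

The recovered identity f(x) = f(x+r) - f(r) - 2xr is the textbook self-reduction for squaring: it lets a program evaluate f at x through calls on randomized points, which is the basis of self-testing and self-correcting.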
 
       
      
        Related papers
 - Machine Learning Pipeline for Software Engineering: A Systematic Literature Review [0.0]
This systematic literature review examines state-of-the-art Machine Learning pipelines designed for software engineering (SE). Our findings show that robust preprocessing, such as SMOTE for data balancing, improves model reliability. Ensemble methods like Random Forest and Gradient Boosting dominate performance across tasks. New metrics like Best Arithmetic Mean (BAM) are emerging in niche applications.
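As a generic illustration of the two techniques this summary names (synthetic data, not the reviewed pipelines), SMOTE rebalances the minority class before a Random Forest is trained:

```python
# Generic sketch: SMOTE oversampling followed by a Random Forest.
from collections import Counter
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (90% / 10% classes), purely for illustration.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
print("before:", Counter(y_tr), "after:", Counter(y_bal))

clf = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
print("test accuracy:", clf.score(X_te, y_te))
```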
arXiv  Detail & Related papers  (2025-07-31T15:37:30Z)
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs [27.07998056454784]
ReTool enhances long-form reasoning with tool-integrated learning.
The model achieves 67% accuracy with 400 training steps.
Remarkably, ReTool-32B attains 72.5% accuracy in extended settings.
arXiv  Detail & Related papers  (2025-04-15T18:10:22Z)
- Learning Semantics-aware Search Operators for Genetic Programming [0.20718016474717196]
Fitness landscapes in test-based program synthesis are known to be extremely rugged. We propose a semantics-aware search operator that steers the search towards candidate programs that are valuable.
arXiv  Detail & Related papers  (2025-02-06T23:46:04Z)
- Formal Verification of Markov Processes with Learned Parameters [2.5694725194040804]
We show that for a broad class of machine learning models, verifying properties of Markov chains can be formulated as a bilinear program.
We show through computational experiments that our method solves the problem to global optimality up to 100x faster than state-of-the-art solvers.
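To illustrate the formulation (a toy construction of ours, not the paper's benchmark or solver): for a two-state chain whose learned transition probability p is only known to lie in an interval, the reachability equations contain the product p·x1, which is what makes the verification problem bilinear. The sketch below maximizes the reachability probability with SciPy's SLSQP, a local method; the paper's approach certifies global optimality.

```python
# Toy sketch: bound the probability of reaching the target from s0 in a tiny
# Markov chain whose learned parameter p lies in [0.2, 0.8].
#   x0 = p * x1              (from s0: prob p to s1, else absorbing failure)
#   x1 = 0.5 + 0.5 * x0      (from s1: prob 0.5 to target, 0.5 back to s0)
# The product p * x1 makes this a bilinear program.
import numpy as np
from scipy.optimize import minimize

def neg_reach(z):  # z = [x0, x1, p]; we maximize x0
    return -z[0]

constraints = [
    {"type": "eq", "fun": lambda z: z[0] - z[2] * z[1]},       # x0 = p * x1
    {"type": "eq", "fun": lambda z: z[1] - 0.5 - 0.5 * z[0]},  # x1 = 0.5 + 0.5*x0
]
bounds = [(0, 1), (0, 1), (0.2, 0.8)]  # probabilities; learned interval for p

res = minimize(neg_reach, x0=np.array([0.5, 0.75, 0.5]),
               method="SLSQP", bounds=bounds, constraints=constraints)
x0_max = -res.fun
print(f"worst-case reach probability: {x0_max:.3f}")  # ~0.667, at p = 0.8
print("property x0 <= 0.9 holds:", x0_max <= 0.9)
```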
arXiv  Detail & Related papers  (2025-01-27T04:34:22Z)
- Evaluation of Artificial Intelligence Methods for Lead Time Prediction in Non-Cycled Areas of Automotive Production [1.3499500088995464]
The present study examines the effectiveness of applying Artificial Intelligence methods in an automotive production environment.
Data structures are analyzed to identify contextual features and then preprocessed using one-hot encoding.
The research demonstrates that AI methods can be effectively applied to highly variable production data, adding business value.
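A minimal sketch of the one-hot preprocessing step mentioned above; the column names are invented, not taken from the study:

```python
# Generic one-hot encoding sketch (column names are invented).
import pandas as pd

orders = pd.DataFrame({
    "station":  ["paint", "assembly", "paint", "logistics"],
    "priority": ["high", "low", "low", "high"],
})
encoded = pd.get_dummies(orders, columns=["station", "priority"])
print(encoded)
# Each categorical value becomes its own 0/1 column, e.g. station_paint.
```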
arXiv  Detail & Related papers  (2025-01-13T13:28:03Z)
- Searching Latent Program Spaces [0.0]
We propose an algorithm for program induction that learns a distribution over latent programs in a continuous space, enabling efficient search and test-time adaptation.
We show that it can generalize beyond its training distribution and adapt to unseen tasks by utilizing test-time adaptation mechanisms.
arXiv  Detail & Related papers  (2024-11-13T15:50:32Z)
- Lessons on Datasets and Paradigms in Machine Learning for Symbolic Computation: A Case Study on CAD [0.0]
This study reports lessons on the importance of analysing datasets prior to machine learning.
We present results for a particular case study, the selection of variable ordering for cylindrical algebraic decomposition.
We introduce an augmentation technique for systems that allows us to balance and further augment the dataset.
arXiv  Detail & Related papers  (2024-01-24T10:12:43Z)
- A Comparative Study of Filters and Deep Learning Models to predict Diabetic Retinopathy [0.0]
This study compares the outcomes of various deep learning models, including InceptionNetV3, utilizing a variety of image filters.
The objective is to improve the diagnostic processes for Diabetic Retinopathy (DR), the primary cause of diabetes-related blindness.
arXiv  Detail & Related papers  (2023-09-26T19:21:09Z)
- Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision tree, neural network, and convolutional neural network (CNN) algorithms were tested.
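A hedged sketch of that kind of pipeline, with invented toy requirements and CWE labels (PROMISE_exp is not reproduced here): TF-IDF features reduced by truncated SVD, the latent semantic analysis step, feeding a linear SVM:

```python
# Hedged sketch of an LSA + SVM pipeline of the kind the summary describes.
# The toy requirements and CWE labels are invented for illustration.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

requirements = [
    "User input shall be passed directly into SQL queries.",
    "All database queries shall use parameterized statements.",
    "Session tokens shall expire after 15 minutes of inactivity.",
    "Session identifiers shall be regenerated after login.",
]
cwe_labels = ["CWE-89", "CWE-89", "CWE-613", "CWE-613"]

clf = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2, random_state=0),  # the LSA step
    LinearSVC(),
)
clf.fit(requirements, cwe_labels)
print(clf.predict(["Queries shall concatenate raw user input."]))
```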
arXiv  Detail & Related papers  (2023-08-10T13:19:10Z)
- Modelling Concurrency Bugs Using Machine Learning [0.0]
This project aims to compare both common and recent machine learning approaches.
We define a synthetic dataset, generated with the aim of simulating real-life (concurrent) programs.
We formulate hypotheses about fundamental limits of various machine learning model types.
arXiv  Detail & Related papers  (2022-12-21T08:45:32Z)
- VCNet: A self-explaining model for realistic counterfactual generation [52.77024349608834]
Counterfactual explanation is a class of methods to make local explanations of machine learning decisions.
We present VCNet (Variational Counter Net), a model architecture that combines a predictor and a counterfactual generator.
We show that VCNet is able both to generate predictions and to generate counterfactual explanations without having to solve another minimisation problem.
arXiv  Detail & Related papers  (2022-11-16T00:11:04Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv  Detail & Related papers  (2022-11-07T11:25:21Z)
- NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering [52.10214317661547]
Current numerical reasoning methods autoregressively decode program sequences.
The accuracy of program generation drops sharply as the decoding steps unfold due to error propagation.
In this paper, we propose a non-autoregressive program generation framework.
arXiv  Detail & Related papers  (2022-05-28T03:31:07Z)
- Learning from Self-Sampled Correct and Partially-Correct Programs [96.66452896657991]
We propose to let the model perform sampling during training and learn from both self-sampled fully-correct programs and partially-correct programs.
We show that our use of self-sampled correct and partially-correct programs can benefit learning and help guide the sampling process.
Our proposed method improves the pass@k performance by 3.1% to 12.3% compared to learning from a single reference program with MLE.
arXiv  Detail & Related papers  (2020-06-07T15:35:47Z)
- Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
arXiv  Detail & Related papers  (2019-12-27T21:28:10Z)
- Synthetic Datasets for Neural Program Synthesis [66.20924952964117]
We propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications.
We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.
arXiv  Detail & Related papers  (2019-12-27T21:28:10Z) 
        This list is automatically generated from the titles and abstracts of the papers in this site.
       
     