Related papers: Multi-Intent Detection in User Provided Annotations for Programming by Examples Systems

URL: http://arxiv.org/abs/2307.03966v1
Date: Sat, 8 Jul 2023 12:35:10 GMT
Title: Multi-Intent Detection in User Provided Annotations for Programming by Examples Systems
Authors: Nischal Ashok Kumar, Nitin Gupta, Shanmukha Guttula, Hima Patel
Abstract summary: Programming by Example (PBE) is a technique that targets automatic inference of a computer program to accomplish a format or string conversion task from user-provided input and output samples. In this paper, we propose a deep neural network based ambiguity prediction model, which analyzes the input-output strings and maps them to a different set of properties responsible for multiple intents.
Score: 3.265146857386153
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In mapping enterprise applications, data mapping remains a fundamental part of integration development, but it is time-consuming. An increasing number of applications lack naming standards, and nested field structures add further complexity for integration developers. Once the mapping is done, data transformation is the next challenge for users, since each application expects data to be in a certain format. Also, while building an integration flow, developers need to understand the format of the source and target data fields and come up with a transformation program that can change data from the source format to the target format. The problem of automatically generating a transformation program from some specifications through the program synthesis paradigm has been studied since the early days of Artificial Intelligence (AI). Programming by Example (PBE) is one such technique that targets automatic inference of a computer program to accomplish a format or string conversion task from user-provided input and output samples. To learn the correct intent, a diverse set of samples from the user is required. However, the user may fail to provide a diverse set of samples. This can lead to multiple intents or ambiguity in the input and output samples. Hence, PBE systems can get confused when generating the program for the correct intent. In this paper, we propose a deep neural network based ambiguity prediction model, which analyzes the input-output strings and maps them to a different set of properties responsible for multiple intents. Users can analyze these properties and accordingly provide new samples or modify existing samples, which can help in building a better PBE system for mapping enterprise applications.
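To make the notion of multiple intents concrete, here is a minimal Python sketch. It is not the deep neural network model proposed in the paper: it only enumerates a small, hypothetical set of candidate string transformations and reports which ones are consistent with the user's samples. The names `CANDIDATES`, `consistent_programs`, and `is_ambiguous` are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, str]  # (input string, expected output string)

# Hypothetical candidate transformations; a real PBE system searches a
# much larger program space defined by its DSL.
CANDIDATES: Dict[str, Callable[[str], str]] = {
    "first_word": lambda s: s.split(" ")[0],
    "first_three_chars": lambda s: s[:3],
    "last_word": lambda s: s.split(" ")[-1],
    "uppercase": lambda s: s.upper(),
}


def consistent_programs(samples: List[Sample]) -> List[str]:
    """Return the names of candidates that reproduce every sample."""
    return [
        name
        for name, fn in CANDIDATES.items()
        if all(fn(inp) == out for inp, out in samples)
    ]


def is_ambiguous(samples: List[Sample]) -> bool:
    """Samples are ambiguous when more than one candidate explains them."""
    return len(consistent_programs(samples)) > 1


if __name__ == "__main__":
    # One sample: "Tom" is both the first word and the first three characters.
    few = [("Tom Smith", "Tom")]
    print(consistent_programs(few), is_ambiguous(few))

    # A more diverse sample set rules out the three-character reading.
    more = few + [("Alice Jones", "Alice")]
    print(consistent_programs(more), is_ambiguous(more))
```

With the single sample, two candidates remain consistent, so the intent is ambiguous; adding a second, more diverse sample leaves only one. This mirrors the abstract's point that new or modified samples can resolve ambiguity.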

Related papers

OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs [62.68905180014956]
We introduce OpenCodeInstruct, the largest open-access instruction tuning dataset, comprising 5 million diverse samples. Each sample includes a programming question, solution, test cases, execution feedback, and LLM-generated quality assessments. We fine-tune various base models, including LLaMA and Qwen, across multiple scales (1B+, 3B+, and 7B+) using our dataset.
arXiv Detail & Related papers (2025-04-05T02:52:16Z)
Conformal Prediction Sets for Deep Generative Models via Reduction to Conformal Regression [7.972619160216404]
We consider the problem of generating valid and small prediction sets from a black-box deep generative model for a given input. We develop a simple and effective conformal inference algorithm referred to as Generative Prediction Sets (GPS). The key insight behind GPS is to exploit the inherent structure within the distribution over the minimum number of samples needed to obtain an admissible output.
arXiv Detail & Related papers (2025-03-13T16:16:23Z)
Quantitative Assurance and Synthesis of Controllers from Activity Diagrams [4.419843514606336]
Probabilistic model checking is a widely used formal verification technique to automatically verify qualitative and quantitative properties. This makes it inaccessible to researchers and engineers who may not have the required knowledge. We propose a comprehensive verification framework for ADs, including a new profile for probability, time, and quality annotations, a semantics interpretation of ADs in three Markov models, and a set of transformation rules from activity diagrams to the PRISM language. Most importantly, we developed algorithms for transformation and implemented them in a tool, called QASCAD, using model-based techniques, for fully automated verification.
arXiv Detail & Related papers (2024-02-29T22:40:39Z)
Modelling Concurrency Bugs Using Machine Learning [0.0]
This project aims to compare both common and recent machine learning approaches. We define a synthetic dataset that we generate with the aim of simulating real-life (concurrent) programs. We formulate hypotheses about fundamental limits of various machine learning model types.
arXiv Detail & Related papers (2023-05-08T17:30:24Z)
PEOPL: Characterizing Privately Encoded Open Datasets with Public Labels [59.66777287810985]
We introduce information-theoretic scores for privacy and utility, which quantify the average performance of an unfaithful user. We then theoretically characterize primitives in building families of encoding schemes that motivate the use of random deep neural networks.
arXiv Detail & Related papers (2023-03-31T18:03:53Z)
Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation [85.13934713535527]
Distribution shift is a major source of failure for machine learning models. We introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances that exhibit the desired shift. We demonstrate how applying this dataset interface to the ImageNet dataset enables studying model behavior across a diverse array of distribution shifts.
arXiv Detail & Related papers (2023-02-15T18:56:26Z)
EGG-GAE: scalable graph neural networks for tabular data imputation [8.775728170359024]
We propose a novel EdGe Generation Graph AutoEncoder (EGG-GAE) for missing data imputation. EGG-GAE works on randomly sampled mini-batches of the input data, and it automatically infers the best connectivity across the mini-batch for each architecture layer.
arXiv Detail & Related papers (2022-10-19T10:26:17Z)
Conditional Generation with a Question-Answering Blueprint [84.95981645040281]
We advocate planning as a useful intermediate representation for rendering conditional generation less opaque and more grounded. We obtain blueprints automatically by exploiting state-of-the-art question generation technology. We develop Transformer-based models, each varying in how they incorporate the blueprint in the generated output.
arXiv Detail & Related papers (2022-07-01T13:10:19Z)
BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning [93.38239238988719]
We propose to equip deep neural networks with the ability to learn the sample relationships from each mini-batch. BatchFormer is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training. We perform extensive experiments on over ten datasets, and the proposed method achieves significant improvements on different data scarcity applications.
arXiv Detail & Related papers (2022-03-03T05:31:33Z)
Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance. Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models. In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
Visual Neural Decomposition to Explain Multivariate Data Sets [13.117139248511783]
Investigating relationships between variables in multi-dimensional data sets is a common task for data analysts and engineers. We propose a novel approach to visualize correlations between input variables and a target output variable that scales to hundreds of variables.
arXiv Detail & Related papers (2020-09-11T15:53:37Z)
Information-theoretic User Interaction: Significant Inputs for Program Synthesis [11.473616777800318]
We introduce the significant questions problem, and show that it is hard in general. We develop an information-theoretic greedy approach for solving the problem. In the context of interactive program synthesis, we use the above result to develop an active program learner. Our active learner is able to trade off false negatives for false positives and converge in a small number of iterations on a real-world dataset.
arXiv Detail & Related papers (2020-06-22T21:46:40Z)
Synthetic Datasets for Neural Program Synthesis [66.20924952964117]
We propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications. We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.
arXiv Detail & Related papers (2019-12-27T21:28:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.