Measuring and Mitigating Constraint Violations of In-Context Learning for Utterance-to-API Semantic Parsing
- URL: http://arxiv.org/abs/2305.15338v1
- Date: Wed, 24 May 2023 16:50:36 GMT
- Title: Measuring and Mitigating Constraint Violations of In-Context Learning for Utterance-to-API Semantic Parsing
- Authors: Shufan Wang, Sebastien Jean, Sailik Sengupta, James Gung, Nikolaos Pappas, Yi Zhang
- Abstract summary: In this work, we measure, analyze and mitigate constraint violations in task-oriented semantic parsing.
We investigate two mitigation strategies: Semantic-Retrieval of Demonstrations (SRD) and API-aware Constrained Decoding (API-CD).
Our experiments show that these strategies are effective at reducing constraint violations and improving the quality of the generated API calls, but require careful consideration given their implementation complexity and latency.
- Score: 15.957744324299869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In executable task-oriented semantic parsing, the system aims to translate
users' natural-language utterances into machine-interpretable programs (API calls)
that can be executed according to pre-defined API specifications. With the
popularity of Large Language Models (LLMs), in-context learning offers a strong
baseline for such scenarios, especially in data-limited regimes. However, LLMs are
known to hallucinate, which makes constraining the generated content a formidable
challenge. It thus remains uncertain whether LLMs can effectively perform
task-oriented utterance-to-API generation, where respecting the API's structural
and task-specific constraints is crucial.
In this work, we seek to measure, analyze and mitigate such constraint violations.
First, we identify the categories of various constraints involved in obtaining API
semantics from task-oriented utterances, and define fine-grained metrics that
complement traditional ones. Second, we leverage these metrics to conduct a
detailed error analysis of the constraint violations seen in state-of-the-art
LLMs, which motivates us to investigate two mitigation strategies:
Semantic-Retrieval of Demonstrations (SRD) and API-aware Constrained Decoding
(API-CD). Our experiments show that these strategies are effective at reducing
constraint violations and improving the quality of the generated API calls, but
require careful consideration given their implementation complexity and latency.
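To make the two mitigation strategies concrete, here is a minimal sketch of the SRD idea, assuming a sentence-transformers encoder: demonstrations are ranked by embedding similarity to the input utterance, and the top matches are placed in the prompt. The encoder name and the tiny demonstration pool are illustrative assumptions, not details from the paper.

```python
# Minimal SRD sketch: retrieve the most semantically similar (utterance, API call)
# demonstrations for in-context learning. Encoder and demo pool are hypothetical.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice

# Hypothetical demonstration pool of (utterance, API call) pairs.
DEMO_POOL = [
    ("set an alarm for 7 am", 'CreateAlarm(time="07:00")'),
    ("play some jazz music", 'PlayMusic(genre="jazz")'),
    ("remind me to call mom tomorrow", 'CreateReminder(content="call mom", date="tomorrow")'),
]

DEMO_EMB = encoder.encode([u for u, _ in DEMO_POOL], normalize_embeddings=True)

def retrieve_demonstrations(utterance: str, k: int = 2):
    """Return the k demonstrations whose utterances are closest to the input."""
    query = encoder.encode([utterance], normalize_embeddings=True)
    scores = (DEMO_EMB @ query.T).ravel()  # cosine similarity (normalized embeddings)
    return [DEMO_POOL[i] for i in np.argsort(-scores)[:k]]

# The retrieved pairs are then formatted into the in-context prompt:
for utt, api in retrieve_demonstrations("wake me up at six thirty"):
    print(f"Utterance: {utt}\nAPI: {api}\n")
```

And a minimal sketch of the API-CD idea: at each decoding step, only continuations that keep the partial output consistent with the API specification are allowed. The loop below works at the character level for clarity (a real implementation would mask token logits instead), and the tiny flattened "spec" is an assumption, not the paper's format.

```python
# Minimal API-CD sketch: greedy decoding where every step is masked so the
# output stays a prefix of some spec-derived API call skeleton.
from typing import Callable, List

# Hypothetical API spec flattened into call skeletons; free-text slot values
# (the <...> placeholders) would be handled separately in a real system.
VALID_CALLS = [
    'CreateAlarm(time=<TIME>)',
    'CreateReminder(content=<TEXT>, date=<DATE>)',
    'PlayMusic(genre=<GENRE>)',
]

def allowed_next_chars(prefix: str) -> List[str]:
    """Characters that extend `prefix` toward at least one valid call."""
    return sorted({c[len(prefix)] for c in VALID_CALLS
                   if c.startswith(prefix) and len(c) > len(prefix)})

def constrained_greedy_decode(score_fn: Callable[[str, List[str]], str]) -> str:
    """`score_fn(prefix, candidates)` stands in for the LLM: it returns the
    candidate the model scores highest among the allowed continuations."""
    prefix = ""
    while True:
        candidates = allowed_next_chars(prefix)
        if not candidates:  # no valid extension left: the call is complete
            return prefix
        prefix += score_fn(prefix, candidates)

# Even with a trivial stand-in scorer, decoding can never leave the spec:
print(constrained_greedy_decode(lambda p, cands: cands[0]))
```

SRD improves the prompt before generation, whereas API-CD intervenes at every decoding step; the abstract's caveat about implementation complexity and latency applies chiefly to the latter, since the validity check must run once per generated token.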
Related papers
- Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making [85.24399869971236]
We aim to evaluate Large Language Models (LLMs) for embodied decision making.
Existing evaluations tend to rely solely on a final success rate.
We propose a generalized interface (Embodied Agent Interface) that supports the formalization of various types of tasks.
arXiv Detail & Related papers (2024-10-09T17:59:00Z)
- Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models [59.970391602080205]
This study investigates whether such constraints on the generation space impact LLMs' abilities, including reasoning and domain knowledge comprehension.
We evaluate LLMs' performance when required to adhere to structured formats versus generating free-form responses across various common tasks.
We find that stricter format constraints generally lead to greater performance degradation in reasoning tasks.
arXiv Detail & Related papers (2024-08-05T13:08:24Z)
- FANTAstic SEquences and Where to Find Them: Faithful and Efficient API Call Generation through State-tracked Constrained Decoding and Reranking [57.53742155914176]
API call generation is the cornerstone of large language models' tool-using ability.
Existing supervised and in-context learning approaches suffer from high training costs, poor data efficiency, and generated API calls that can be unfaithful to the API documentation and the user's request.
We propose an output-side optimization approach called FANTASE to address these limitations.
arXiv Detail & Related papers (2024-07-18T23:44:02Z)
- LaSagnA: Language-based Segmentation Assistant for Complex Queries [39.620806493454616]
Large Language Models for Vision (vLLMs) generate detailed perceptual outcomes, including bounding boxes and masks.
In this study, we acknowledge that the main cause of these problems is the insufficient complexity of training queries.
We present three novel strategies to effectively handle the challenges arising from the direct integration of the proposed format.
arXiv Detail & Related papers (2024-04-12T14:40:45Z)
- Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark [11.572835837392867]
This study introduces MultiAPI, a pioneering comprehensive large-scale API benchmark dataset.
It consists of 235 diverse API calls and 2,038 contextual prompts, offering a unique platform for evaluating how tool-augmented LLMs handle multimodal tasks.
Our findings reveal that while LLMs demonstrate proficiency in API call decision-making, they face challenges in domain identification, function selection, and argument generation.
arXiv Detail & Related papers (2023-11-21T23:26:05Z)
- TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems [25.854559300612184]
This paper introduces a comprehensive framework aimed at enhancing the Task Planning and Tool Usage (TPTU) abilities of Large Language Models (LLMs).
The framework comprises three key components designed to address these challenges: (1) the API Retriever selects the most pertinent APIs for the user task from the extensive array available; (2) the LLM Finetuner tunes a base LLM so that it becomes more capable of task planning and API calling; and (3) the Demo Selector adaptively retrieves demonstrations for hard-to-distinguish APIs.
arXiv Detail & Related papers (2023-11-19T12:37:30Z)
- When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks [54.71034943526973]
In-context learning (ICL) has become the default method for using large language models (LLMs).
We find that ICL falls short of handling specification-heavy tasks, which are tasks with complicated and extensive task specifications.
We identify three primary reasons: inability to specifically understand context, misalignment in task schema comprehension with humans, and inadequate long-text understanding ability.
arXiv Detail & Related papers (2023-11-15T14:26:30Z)
- How You Prompt Matters! Even Task-Oriented Constraints in Instructions Affect LLM-Generated Text Detection [39.254432080406346]
Even task-oriented constraints -- constraints that would naturally be included in an instruction and are not related to detection-evasion -- cause existing powerful detectors to have a large variance in detection performance.
Our experiments show that the standard deviation (SD) of current detector performance on texts generated under an instruction with such a constraint is significantly larger (an SD of up to 14.4 in F1-score) than that on texts generated multiple times or with a paraphrased instruction.
arXiv Detail & Related papers (2023-11-14T18:32:52Z)
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z)
- OverPrompt: Enhancing ChatGPT through Efficient In-Context Learning [49.38867353135258]
We propose OverPrompt, which leverages the in-context learning capability of LLMs to handle multiple task inputs in a single prompt (a hedged sketch of this batching idea appears after this list).
Our experiments show that OverPrompt can achieve cost-efficient zero-shot classification without significant detriment to task performance.
arXiv Detail & Related papers (2023-05-24T10:08:04Z)
- Mixture of Soft Prompts for Controllable Data Generation [21.84489422361048]
Mixture of Soft Prompts (MSP) is proposed as a tool for data augmentation rather than direct prediction.
Our method achieves state-of-the-art results on three benchmarks when compared against strong baselines.
arXiv Detail & Related papers (2023-03-02T21:13:56Z)
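As a side note on the OverPrompt entry above, here is a minimal sketch of the batching idea: several task inputs are packed into one prompt so that a single LLM call classifies them all. The prompt template and the `call_llm` stub are assumptions, not the paper's code.

```python
# Minimal OverPrompt-style sketch: one zero-shot prompt covering many inputs,
# so classification costs one request instead of one request per input.
from typing import List

def build_batched_prompt(texts: List[str]) -> str:
    """Build a single zero-shot classification prompt over every input."""
    header = ("Classify the sentiment of each review as positive or negative.\n"
              "Answer with one line per review, formatted as '<id>: <label>'.\n\n")
    body = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(texts))
    return header + body

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call (e.g., a chat-completion request)."""
    raise NotImplementedError

reviews = ["Great battery life.", "Screen broke in a week.", "Does the job."]
prompt = build_batched_prompt(reviews)
# labels = call_llm(prompt).splitlines()  # one request for all of `reviews`
print(prompt)
```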