Ambiguity Resolution in Text-to-Structured Data Mapping
- URL: http://arxiv.org/abs/2505.11679v1
- Date: Fri, 16 May 2025 20:39:30 GMT
- Title: Ambiguity Resolution in Text-to-Structured Data Mapping
- Authors: Zhibo Hu, Chen Wang, Yanfeng Shu, Hye-Young Paik, Liming Zhu
- Abstract summary: Ambiguity in natural language is a significant obstacle for achieving accurate text to structured data mapping. We propose a new framework to improve the performance of large language models (LLMs) on ambiguous agentic tool calling through missing concepts prediction.
- Score: 10.285528620331696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ambiguity in natural language is a significant obstacle to accurate text-to-structured-data mapping with large language models (LLMs), and it degrades performance on tasks such as mapping text to agentic tool calls and text-to-SQL queries. Existing methods for handling ambiguity either use the ReAct framework to reach the correct mapping through trial and error, or use supervised fine-tuning to guide models toward a biased mapping that improves certain tasks. In this paper, we adopt a different approach: we characterize the representation difference of ambiguous text in the latent space and leverage that difference to identify ambiguity before mapping the text to structured data. To detect whether a sentence is ambiguous, we focus on the relationship between ambiguous questions and their interpretations, and on what causes the LLM to ignore multiple interpretations. In contrast to distances calculated from dense embedding vectors, we use the observation that ambiguity is caused by concepts missing from the LLM's latent space to design a new distance measurement, computed through the path kernel as the integral of gradient values for each concept from a sparse autoencoder (SAE) under each state. We identify patterns that distinguish ambiguous questions with this measurement. Based on this observation, we propose a new framework that improves the performance of LLMs on ambiguous agentic tool calling through missing-concept prediction.
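The abstract describes a measurement built from the integral of gradient values for each concept. As a rough illustration of that one ingredient only, the sketch below numerically integrates a gradient along a straight-line path and reports one score per coordinate. Everything in it is an assumption made for illustration: the function names `path_integrated_gradients` and `toy_grad`, the straight-line path, and the toy function f(x) = sum(x_i^2) standing in for the model. It is not the authors' path kernel and does not involve an LLM or an SAE.

```python
from typing import Callable, List, Sequence


def path_integrated_gradients(
    grad_fn: Callable[[Sequence[float]], List[float]],
    start: Sequence[float],
    end: Sequence[float],
    steps: int = 50,
) -> List[float]:
    """Per-coordinate integral of the gradient along the line start -> end.

    Uses a midpoint Riemann sum. Each returned value is
    (end_i - start_i) * integral over t in [0, 1] of grad_i(start + t * (end - start)).
    """
    dims = len(start)
    mean_grad = [0.0] * dims
    for k in range(steps):
        t = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [s + t * (e - s) for s, e in zip(start, end)]
        grad = grad_fn(point)
        for i in range(dims):
            mean_grad[i] += grad[i] / steps
    return [(e - s) * g for s, e, g in zip(start, end, mean_grad)]


def toy_grad(x: Sequence[float]) -> List[float]:
    # Hypothetical stand-in for a model gradient: gradient of f(x) = sum(x_i ** 2).
    return [2.0 * v for v in x]


# Each coordinate plays the role of one "concept" activation in this toy setup.
scores = path_integrated_gradients(toy_grad, [0.0, 0.0, 0.0], [1.0, 2.0, 0.0])
```

In this toy case the per-coordinate scores sum to f(end) - f(start), and a coordinate that does not change along the path receives a score of zero.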
Related papers
- InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition [19.74617806521803]
InstructSAM is a training-free framework for instruction-driven object recognition. We present EarthInstruct, the first InstructCDS benchmark for earth observation.
arXiv Detail & Related papers (2025-05-21T17:59:56Z) - Clarifying Ambiguities: on the Role of Ambiguity Types in Prompting Methods for Clarification Generation [5.259846811078731]
We focus on the concept of ambiguity for clarification, seeking to model and integrate ambiguities in the clarification process. We name this new prompting scheme Ambiguity Type-Chain of Thought (AT-CoT).
arXiv Detail & Related papers (2025-04-16T14:21:02Z) - CLEAR-KGQA: Clarification-Enhanced Ambiguity Resolution for Knowledge Graph Question Answering [13.624962763072899]
KGQA systems typically assume user queries are unambiguous, an assumption that rarely holds in real-world applications. We propose a novel framework that dynamically handles both entity ambiguity (e.g., distinguishing between entities with similar names) and intent ambiguity (e.g., clarifying different interpretations of user queries) through interactive clarification.
arXiv Detail & Related papers (2025-04-13T17:34:35Z) - LayerFlow: Layer-wise Exploration of LLM Embeddings using Uncertainty-aware Interlinked Projections [11.252261879736102]
LayerFlow is a visual analytics workspace that displays embeddings in an interlinked projection design. It communicates the transformation, representation, and interpretation uncertainty. We show the usability of the presented workspace through replication and expert case studies.
arXiv Detail & Related papers (2025-04-09T12:24:58Z) - Large Language Model Guided Progressive Feature Alignment for Multimodal UAV Object Detection [21.16636753446158]
Existing multimodal UAV object detection methods often overlook the impact of semantic gaps between modalities. We propose a Large Language Model (LLM) guided Progressive feature Alignment Network called LPANet. We show that our approach outperforms state-of-the-art multimodal UAV object detectors.
arXiv Detail & Related papers (2025-03-10T05:53:30Z) - Disambiguate First Parse Later: Generating Interpretations for Ambiguity Resolution in Semantic Parsing [56.82807063333088]
We propose a modular approach that resolves ambiguity using natural language interpretations before mapping these to logical forms. Our approach improves interpretation coverage and generalizes across datasets with different annotation styles, database structures, and ambiguity types.
arXiv Detail & Related papers (2025-02-25T18:42:26Z) - AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries [56.82807063333088]
We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-SQL parsers.
Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness).
In each case, the ambiguity persists even when the database context is provided.
This is achieved through a novel approach that involves controlled generation of databases from scratch.
arXiv Detail & Related papers (2024-06-27T10:43:04Z) - Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition [56.968108142307976]
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
Most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models.
This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects.
arXiv Detail & Related papers (2024-02-24T13:00:54Z) - Contrastive Instruction Tuning [61.97704869248903]
We propose Contrastive Instruction Tuning (CoIN) to maximize the similarity between semantically equivalent instruction-instance pairs.
Experiments on the PromptBench benchmark show that CoIN consistently improves LLMs' robustness to unseen instructions with variations across character, word, sentence, and semantic levels by an average of +2.5% in accuracy.
arXiv Detail & Related papers (2024-02-17T00:09:32Z) - Guiding the PLMs with Semantic Anchors as Intermediate Supervision: Towards Interpretable Semantic Parsing [57.11806632758607]
We propose to incorporate the current pretrained language models with a hierarchical decoder network.
By taking the first-principle structures as the semantic anchors, we propose two novel intermediate supervision tasks.
We conduct intensive experiments on several semantic parsing benchmarks and demonstrate that our approach can consistently outperform the baselines.
arXiv Detail & Related papers (2022-10-04T07:27:29Z) - SUN: Exploring Intrinsic Uncertainties in Text-to-SQL Parsers [61.48159785138462]
This paper aims to improve the performance of text-to-SQL parsing by exploring the intrinsic uncertainties in neural-network-based approaches (called SUN).
Extensive experiments on five benchmark datasets demonstrate that our method significantly outperforms competitors and achieves new state-of-the-art results.
arXiv Detail & Related papers (2022-09-14T06:27:51Z) - Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlapping frequently occurs in paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the resulting metric, Neighboring Distribution Divergence (NDD), to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.