Automatic Prompt Optimization for Dataset-Level Feature Discovery
- URL: http://arxiv.org/abs/2601.13922v1
- Date: Tue, 20 Jan 2026 12:51:03 GMT
- Title: Automatic Prompt Optimization for Dataset-Level Feature Discovery
- Authors: Adrian Cosma, Oleg Szehr, David Kletz, Alessandro Antonucci, Olivier Pelletier,
- Abstract summary: We formulate feature discovery as a dataset-level prompt optimization problem. We propose a multi-agent prompt optimization framework in which language-model agents jointly propose feature definitions, extract feature values, and evaluate feature quality. This formulation departs from prior prompt optimization methods that rely on per-sample supervision and provides a principled mechanism for automatic feature discovery from unstructured text.
- Score: 38.37728428959515
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Feature extraction from unstructured text is a critical step in many downstream classification pipelines, yet current approaches largely rely on hand-crafted prompts or fixed feature schemas. We formulate feature discovery as a dataset-level prompt optimization problem: given a labelled text corpus, the goal is to induce a global set of interpretable and discriminative feature definitions whose realizations optimize a downstream supervised learning objective. To this end, we propose a multi-agent prompt optimization framework in which language-model agents jointly propose feature definitions, extract feature values, and evaluate feature quality using dataset-level performance and interpretability feedback. Instruction prompts are iteratively refined based on this structured feedback, enabling optimization over prompts that induce shared feature sets rather than per-example predictions. This formulation departs from prior prompt optimization methods that rely on per-sample supervision and provides a principled mechanism for automatic feature discovery from unstructured text.
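The abstract's loop — agents propose feature definitions, extract their values on the corpus, score the resulting feature set at the dataset level, and fold that feedback back into the instruction prompt — can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the agents are stubbed with keyword heuristics (`propose_features`, `extract_values`, `evaluate` are all hypothetical stand-ins for LLM calls), and the prompt-refinement step is simulated.

```python
# Toy sketch of a dataset-level prompt-optimization loop in the spirit of
# the abstract. All agent behaviour is stubbed; a real system would query
# an LLM in each agent role.
import random

random.seed(0)

# Toy labelled corpus: (text, label).
CORPUS = [
    ("great battery life and fast charging", 1),
    ("screen cracked after one week", 0),
    ("camera quality is superb", 1),
    ("stopped working, total waste of money", 0),
]

def propose_features(prompt, n=3):
    """Stub 'proposer' agent: sample candidate feature definitions.
    A real system would condition an LLM on `prompt`."""
    vocab = ["great", "superb", "cracked", "waste", "fast", "stopped"]
    return random.sample(vocab, n)

def extract_values(features, text):
    """Stub 'extractor' agent: realise each feature on one example."""
    return [int(f in text) for f in features]

def evaluate(features):
    """Stub 'evaluator' agent: dataset-level quality signal.
    Here: accuracy of a trivial rule (any positive-keyword hit => label 1)."""
    positives = {"great", "superb", "fast"}
    correct = 0
    for text, label in CORPUS:
        vals = extract_values(features, text)
        pred = int(any(v and f in positives for f, v in zip(features, vals)))
        correct += int(pred == label)
    return correct / len(CORPUS)

def optimize(rounds=20):
    """Iteratively refine the shared feature set using dataset-level feedback."""
    prompt = "Propose discriminative features."
    best_feats, best_score = None, -1.0
    for _ in range(rounds):
        feats = propose_features(prompt)
        score = evaluate(feats)
        if score > best_score:
            best_feats, best_score = feats, score
            # Structured feedback folded back into the prompt (stubbed).
            prompt += f" Keep features like {feats}."
    return best_feats, best_score

features, score = optimize()
print(features, score)
```

The key structural point, matching the abstract, is that the objective scores a *shared* feature set over the whole corpus rather than per-example predictions.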
Related papers
- EXACT: Explicit Attribute-Guided Decoding-Time Personalization [11.035465374731563]
EXACT is a new decoding-time personalization method that aligns generation with limited pairwise preference feedback. We show that EXACT consistently outperforms strong baselines in both preference modeling accuracy and personalized generation quality.
arXiv Detail & Related papers (2026-02-06T14:53:37Z) - Semantic Prioritization in Visual Counterfactual Explanations with Weighted Segmentation and Auto-Adaptive Region Selection [50.68751788132789]
This study introduces an innovative methodology named the Weighted Semantic Map with Auto-adaptive Candidate Editing Network (WSAE-Net). The generation of the weighted semantic map is designed to maximize the reduction of non-semantic feature units that need to be computed. The auto-adaptive candidate editing sequences are designed to determine the optimal computational order among the feature units to be processed.
arXiv Detail & Related papers (2025-11-17T05:34:10Z) - FeClustRE: Hierarchical Clustering and Semantic Tagging of App Features from User Reviews [0.0]
FeClustRE is a framework integrating hybrid feature extraction, hierarchical clustering with auto-tuning, and semantic labelling. We evaluate FeClustRE on public benchmarks for extraction correctness and on a sample study of generative AI assistant app reviews for clustering quality, semantic coherence, and interpretability.
arXiv Detail & Related papers (2025-10-21T16:54:21Z) - CoT Referring: Improving Referring Expression Tasks with Grounded Reasoning [67.18702329644526]
CoT Referring enhances model reasoning across modalities through structured chain-of-thought training data. We restructure the training data to enforce a new output form, providing new annotations for existing datasets. We also integrate detection and segmentation capabilities into a unified MLLM framework, training it with a novel adaptive weighted loss to optimize performance.
arXiv Detail & Related papers (2025-10-03T08:50:21Z) - AttriPrompt: Dynamic Prompt Composition Learning for CLIP [41.37140060183439]
AttriPrompt is a novel framework that enhances and refines textual semantic representations. We introduce a Self-Regularization mechanism by applying explicit regularization constraints between the prompted and non-prompted text features. Experiments demonstrate AttriPrompt's superiority over state-of-the-art methods, achieving up to 7.37% improvement in the base-to-novel setting.
arXiv Detail & Related papers (2025-09-07T07:07:59Z) - Reflection-Enhanced Meta-Optimization Integrating TextGrad-style Prompt Optimization with Memory-Driven Self-Evolution [0.0]
We propose a framework that integrates a memory-augmented reflection retrieval (RAG) module and a self-adaptive meta-controller. REMO achieves more stable and robust tuning, albeit at the cost of increased computational overhead.
arXiv Detail & Related papers (2025-08-26T07:25:45Z) - PromptPrism: A Linguistically-Inspired Taxonomy for Prompts [13.169345040931857]
We introduce PromptPrism, a linguistically-inspired taxonomy that enables prompt analysis across three hierarchical levels. We show the practical utility of PromptPrism by applying it to three applications.
arXiv Detail & Related papers (2025-05-19T01:08:26Z) - In-context Demonstration Matters: On Prompt Optimization for Pseudo-Supervision Refinement [71.60563181678323]
Large language models (LLMs) have achieved great success across diverse tasks, and fine-tuning is sometimes needed to further enhance generation quality. To handle these challenges, a direct solution is to generate "high-confidence" data from unsupervised downstream tasks. We propose a novel approach, the pseudo-supervised demonstrations aligned prompt optimization (PAPO) algorithm, which jointly refines both the prompt and the overall pseudo-supervision.
arXiv Detail & Related papers (2024-10-04T03:39:28Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - Automated Concatenation of Embeddings for Structured Prediction [75.44925576268052]
We propose Automated Concatenation of Embeddings (ACE) to automate the process of finding better concatenations of embeddings for structured prediction tasks.
We follow strategies in reinforcement learning to optimize the parameters of the controller and compute the reward based on the accuracy of a task model.
arXiv Detail & Related papers (2020-10-10T14:03:20Z)
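The ACE idea above — a controller trained with a reinforcement signal to choose which embeddings to concatenate, rewarded by task-model accuracy — can be sketched as a simple REINFORCE loop. This is a hedged toy illustration, not the paper's implementation: the "task accuracy" is a synthetic reward function, and all names (`EMBEDDINGS`, `GOOD`, `reward`) are invented for the example.

```python
# Toy REINFORCE-style controller choosing a subset of embedding types to
# concatenate, in the spirit of ACE. The reward is a synthetic stand-in
# for dev-set accuracy of a real structured-prediction model.
import math
import random

random.seed(1)

EMBEDDINGS = ["word", "char", "bert", "flair", "elmo"]
GOOD = {"word", "bert", "flair"}  # pretend this is the best concatenation

def reward(mask):
    """Synthetic stand-in for task-model accuracy on a dev set."""
    chosen = {e for e, m in zip(EMBEDDINGS, mask) if m}
    return len(chosen & GOOD) / len(GOOD) - 0.1 * len(chosen - GOOD)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Controller: one independent Bernoulli logit per embedding type.
logits = [0.0] * len(EMBEDDINGS)
lr, baseline = 0.5, 0.0

for step in range(300):
    probs = [sigmoid(l) for l in logits]
    mask = [int(random.random() < p) for p in probs]
    r = reward(mask)
    baseline = 0.9 * baseline + 0.1 * r  # moving-average baseline
    for i, (m, p) in enumerate(zip(mask, probs)):
        # REINFORCE: grad of log Bernoulli prob w.r.t. the logit is (m - p).
        logits[i] += lr * (r - baseline) * (m - p)

best = [e for e, l in zip(EMBEDDINGS, logits) if l > 0]
print(best)
```

The moving-average baseline reduces the variance of the policy-gradient update; in the full method, training the controller alternates with evaluating a real task model on each sampled concatenation.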
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.