Related papers: Knowledge-Informed Automatic Feature Extraction via Collaborative Large Language Model Agents

Knowledge-Informed Automatic Feature Extraction via Collaborative Large Language Model Agents

URL: http://arxiv.org/abs/2511.15074v1
Date: Wed, 19 Nov 2025 03:27:14 GMT
Title: Knowledge-Informed Automatic Feature Extraction via Collaborative Large Language Model Agents
Authors: Henrik Bradland, Morten Goodwin, Vladimir I. Zadorozhny, Per-Arne Andersen,
Abstract summary: Rogue One is a novel, multi-agent framework for knowledge-informed automatic feature extraction.<n>We demonstrate that Rogue One significantly outperforms state-of-the-art methods on a comprehensive suite of 19 classification and 9 regression datasets.
Score: 3.913122709822389
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The performance of machine learning models on tabular data is critically dependent on high-quality feature engineering. While Large Language Models (LLMs) have shown promise in automating feature extraction (AutoFE), existing methods are often limited by monolithic LLM architectures, simplistic quantitative feedback, and a failure to systematically integrate external domain knowledge. This paper introduces Rogue One, a novel, LLM-based multi-agent framework for knowledge-informed automatic feature extraction. Rogue One operationalizes a decentralized system of three specialized agents-Scientist, Extractor, and Tester-that collaborate iteratively to discover, generate, and validate predictive features. Crucially, the framework moves beyond primitive accuracy scores by introducing a rich, qualitative feedback mechanism and a "flooding-pruning" strategy, allowing it to dynamically balance feature exploration and exploitation. By actively incorporating external knowledge via an integrated retrieval-augmented (RAG) system, Rogue One generates features that are not only statistically powerful but also semantically meaningful and interpretable. We demonstrate that Rogue One significantly outperforms state-of-the-art methods on a comprehensive suite of 19 classification and 9 regression datasets. Furthermore, we show qualitatively that the system surfaces novel, testable hypotheses, such as identifying a new potential biomarker in the myocardial dataset, underscoring its utility as a tool for scientific discovery.

Related papers

FAMOSE: A ReAct Approach to Automated Feature Discovery [4.979045992768399]
FAMOSE is an agentic ReAct framework that autonomously explores, generate, and refine features.<n>It achieves the state-of-the-art for regression tasks by reducing RMSE by 2.0% on average, while remaining more robust to errors than other algorithms.<n>Our work offers evidence that AI agents are remarkably effective in solving problems that require highly inventive solutions.
arXiv Detail & Related papers (2026-02-19T18:53:15Z)
An Agentic Framework for Autonomous Materials Computation [70.24472585135929]
Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery.<n>Recent advances integrate LLMs into agentic frameworks, enabling retrieval, reasoning, and tool use for complex scientific experiments.<n>Here, we present a domain-specialized agent designed for reliable automation of first-principles materials computations.
arXiv Detail & Related papers (2025-12-22T15:03:57Z)
FELA: A Multi-Agent Evolutionary System for Feature Engineering of Industrial Event Log Data [7.129004248608012]
Event log data represents one of the most valuable assets for modern digital services.<n>Existing automatic feature engineering approaches, such as AutoML or genetic methods, often suffer from limited explainability.<n>We propose FELA, a multi-agent evolutionary system that autonomously extracts meaningful and high-performing features from complex industrial event log data.
arXiv Detail & Related papers (2025-10-29T06:57:32Z)
Spec-Driven AI for Science: The ARIA Framework for Automated and Reproducible Data Analysis [23.28226188948918]
ARIA is a spec-driven, human-in-the-loop framework for automated and interpretable data analysis.<n>ARIA integrates six layers, namely Command, Context, Code, Data, Orchestration, and AI Module.<n>ARIA establishes a new paradigm for transparent, collaborative, and reproducible scientific discovery.
arXiv Detail & Related papers (2025-10-13T08:32:43Z)
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning [73.91893534088798]
WebSailor is a complete post-training methodology designed to instill this crucial capability.<n>Our approach involves generating novel, high-uncertainty tasks through structured sampling and information obfuscation.<n>WebSailor significantly outperforms all open-source agents in complex information-seeking tasks.
arXiv Detail & Related papers (2025-09-16T17:57:03Z)
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents [93.26456498576181]
This paper focuses on the development of native Autonomous Single-Agent models for Deep Research.<n>Our best variant SFR-DR-20B achieves up to 28.7% on Humanity's Last Exam benchmark.
arXiv Detail & Related papers (2025-09-08T02:07:09Z)
MCP-Orchestrated Multi-Agent System for Automated Disinformation Detection [84.75972919995398]
This paper presents a multi-agent system that uses relation extraction to detect disinformation in news articles.<n>The proposed Agentic AI system combines four agents: (i) a machine learning agent (logistic regression), (ii) a Wikipedia knowledge check agent, and (iv) a web-scraped data analyzer.<n>Results demonstrate that the multi-agent ensemble achieves 95.3% accuracy with an F1 score of 0.964, significantly outperforming individual agents and traditional approaches.
arXiv Detail & Related papers (2025-08-13T19:14:48Z)
AutoMind: Adaptive Knowledgeable Agent for Automated Data Science [70.33796196103499]
Large Language Model (LLM) agents have shown great potential in addressing real-world data science problems.<n>Existing frameworks depend on rigid, pre-defined and inflexible coding strategies.<n>We introduce AutoMind, an adaptive, knowledgeable LLM-agent framework.
arXiv Detail & Related papers (2025-06-12T17:59:32Z)
Probing Ranking LLMs: A Mechanistic Analysis for Information Retrieval [20.353393773305672]
We employ a probing-based analysis to examine neuron activations in ranking LLMs.<n>Our study spans a broad range of feature categories, including lexical signals, document structure, query-document interactions, and complex semantic representations.<n>Our findings offer crucial insights for developing more transparent and reliable retrieval systems.
arXiv Detail & Related papers (2024-10-24T08:20:10Z)
AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA. It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
Unifying Self-Supervised Clustering and Energy-Based Models [9.3176264568834]
We establish a principled connection between self-supervised learning and generative models.<n>We show that our solution can be integrated into a neuro-symbolic framework to tackle a simple yet non-trivial instantiation of the symbol grounding problem.
arXiv Detail & Related papers (2023-12-30T04:46:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.