ACT: Agentic Classification Tree
- URL: http://arxiv.org/abs/2509.26433v2
- Date: Wed, 22 Oct 2025 09:12:00 GMT
- Title: ACT: Agentic Classification Tree
- Authors: Vincent Grari, Tim Arni, Thibault Laugel, Sylvain Lamprier, James Zou, Marcin Detyniecki
- Abstract summary: We present the Agentic Classification Tree (ACT), which extends decision-tree methodology to unstructured inputs by formulating each split as a natural-language question. Experiments on text benchmarks show that ACT matches or surpasses prompting-based baselines while producing transparent and interpretable decision paths.
- Score: 33.65390081055222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When used in high-stakes settings, AI systems are expected to produce decisions that are transparent, interpretable, and auditable, a requirement increasingly imposed by regulation. Decision trees such as CART provide clear and verifiable rules, but they are restricted to structured tabular data and cannot operate directly on unstructured inputs such as text. In practice, large language models (LLMs) are widely used for such data, yet prompting strategies such as chain-of-thought or prompt optimization still rely on free-form reasoning, limiting their ability to ensure trustworthy behavior. We present the Agentic Classification Tree (ACT), which extends decision-tree methodology to unstructured inputs by formulating each split as a natural-language question, refined through impurity-based evaluation and LLM feedback via TextGrad. Experiments on text benchmarks show that ACT matches or surpasses prompting-based baselines while producing transparent and interpretable decision paths.
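The split-selection idea from the abstract can be sketched in a few lines: each candidate split is a natural-language yes/no question, and candidates are scored by the weighted Gini impurity of the resulting partition, exactly as CART scores feature thresholds. This is a minimal illustration only; in the actual system an LLM answers the question and TextGrad refines its wording, whereas here hypothetical keyword predicates stand in for the LLM's answers, and the toy data and question strings are invented for the example.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_quality(texts, labels, answer_fn):
    """Weighted impurity after splitting on a yes/no question.

    answer_fn plays the role of the LLM: it maps a text to True/False.
    Lower is better (purer children).
    """
    yes = [y for t, y in zip(texts, labels) if answer_fn(t)]
    no = [y for t, y in zip(texts, labels) if not answer_fn(t)]
    n = len(labels)
    return (len(yes) / n) * gini(yes) + (len(no) / n) * gini(no)

# Toy spam-detection data (1 = spam), invented for illustration.
texts = ["win a free prize now", "meeting at 3pm",
         "free seminar at 5pm", "lunch tomorrow?"]
labels = [1, 0, 1, 0]

# Candidate natural-language questions, each paired with a stand-in
# predicate simulating an LLM's yes/no answer.
candidates = {
    "Does the message mention something free?":
        lambda t: "free" in t,
    "Does the message mention a time?":
        lambda t: "pm" in t or "tomorrow" in t,
}

# Choose the question that minimizes post-split impurity.
best = min(candidates, key=lambda q: split_quality(texts, labels, candidates[q]))
print(best)  # the "free" question separates the toy classes perfectly
```

ACT would then recurse on each child node with new candidate questions, so the final model is an ordinary decision tree whose internal nodes are human-readable questions rather than feature thresholds.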
Related papers
- A System for Name and Address Parsing with Large Language Models [0.0]
This paper introduces a prompt-driven, validation-centered framework that converts free-text records into a consistent 17-field schema without fine-tuning. Evaluations on heterogeneous real-world address data show high field-level accuracy, strong schema adherence, and stable confidence calibration.
arXiv Detail & Related papers (2026-01-25T22:19:47Z) - Deriving Character Logic from Storyline as Codified Decision Trees [67.01182739162142]
Role-playing (RP) agents rely on behavioral profiles to act consistently across diverse narrative contexts. We propose Codified Decision Trees (CDT), a data-driven framework that induces an executable and interpretable decision structure from large-scale narrative data.
arXiv Detail & Related papers (2026-01-15T05:12:43Z) - Table-BiEval: A Self-Supervised, Dual-Track Framework for Decoupling Structure and Content in LLM Evaluation [11.450834626205676]
Table-BiEval is a novel approach based on a human-free, self-supervised evaluation framework. It calculates Content Semantic Accuracy and Normalized Tree Edit Distance to decouple structure from content. Results reveal substantial variability, highlighting that mid-sized models can surprisingly outperform larger counterparts in structural efficiency.
arXiv Detail & Related papers (2026-01-09T07:38:27Z) - Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration [0.0]
Rule-based reasoning arises in domains where decisions must be auditable and justifiable. Applying rules to such inputs demands both interpretive flexibility and formal guarantees. This paper presents an integration pattern that combines these strengths.
arXiv Detail & Related papers (2026-01-04T17:19:20Z) - KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering [64.62317305868264]
We present KBQA-R1, a framework that shifts the paradigm from text imitation to interaction optimization via Reinforcement Learning. Treating KBQA as a multi-turn decision process, our model learns to navigate the knowledge base using a list of actions. Experiments on WebQSP, GrailQA, and GraphQuestions demonstrate that KBQA-R1 achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-10T17:45:42Z) - Normalisation of SWIFT Message Counterparties with Feature Extraction and Clustering [0.0]
We propose a hybrid string similarity, topic modelling, hierarchical clustering and rule-based pipeline to facilitate clustering of transaction counterparties. The approach retains most of the interpretability found in rule-based systems, as the former adds an additional level of cluster refinement to the latter. When only a subset of the population needs to be investigated, such as in sanctions investigations, the approach allows for better control of the risks of missing entity variations.
arXiv Detail & Related papers (2025-08-24T12:41:44Z) - DecisionFlow: Advancing Large Language Model as Principled Decision Maker [49.088778182807395]
DecisionFlow is a novel decision modeling framework that guides models to reason over structured representations of actions, attributes, and constraints. Rather than predicting answers directly from prompts, DecisionFlow builds a semantically grounded decision space and infers a latent utility function. Empirical results show that DecisionFlow achieves up to 30% accuracy gains over strong prompting baselines.
arXiv Detail & Related papers (2025-05-27T16:23:53Z) - AGENT-X: Adaptive Guideline-based Expert Network for Threshold-free AI-generated teXt detection [44.66668435489055]
AGENT-X is a zero-shot multi-agent framework for AI-generated text detection. We organize detection guidelines into semantic, stylistic, and structural dimensions, each independently evaluated by specialized linguistic agents. A meta agent integrates these assessments through confidence-aware aggregation, enabling threshold-free, interpretable classification. Experiments on diverse datasets demonstrate that AGENT-X substantially surpasses state-of-the-art supervised and zero-shot approaches in accuracy, interpretability, and generalization.
arXiv Detail & Related papers (2025-05-21T08:39:18Z) - Transparent NLP: Using RAG and LLM Alignment for Privacy Q&A [15.86510147965235]
The General Data Protection Regulation requires processing information to be precise, clear, and accessible. This paper examines state-of-the-art Retrieval-Augmented Generation (RAG) systems enhanced with alignment techniques to fulfill these obligations.
arXiv Detail & Related papers (2025-02-10T16:42:00Z) - Decoding AI Judgment: How LLMs Assess News Credibility and Bias [33.7054351451505]
Large Language Models (LLMs) are increasingly embedded in workflows that involve evaluative processes. This raises the need to examine how such evaluations are built, what assumptions they rely on, and how their strategies diverge from those of humans. We benchmark six LLMs against expert ratings from NewsGuard and Media Bias/Fact Check (MBFC), and against human judgments collected through a controlled experiment.
arXiv Detail & Related papers (2025-02-06T18:52:10Z) - StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs [78.84060166851805]
StructTest is a novel benchmark that evaluates large language models (LLMs) on their ability to follow compositional instructions and generate structured outputs. Assessments are conducted deterministically using a rule-based evaluator, which can be easily extended to new tasks and datasets. We demonstrate that StructTest remains challenging even for top-performing models like Deepseek-V3/R1 and GPT-4o.
arXiv Detail & Related papers (2024-12-23T22:08:40Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules.
We use decision trees to convey this reasoning information, as they can be easily represented in natural language.
OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models [50.15455336684986]
We evaluate the effectiveness of LogProbs and basic prompting to measure semantic plausibility.
We find that LogProbs offers a more reliable measure of semantic plausibility than direct zero-shot prompting.
We conclude that, even in the era of prompt-based evaluations, LogProbs constitute a useful metric of semantic plausibility.
arXiv Detail & Related papers (2024-03-21T22:08:44Z) - RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules [30.239044569301534]
Weakly supervised text classification (WSTC) has attracted increasing attention due to its applicability in classifying a mass of texts.
We propose a prompting PLM-based approach named RulePrompt for the WSTC task, consisting of a rule mining module and a rule-enhanced pseudo label generation module.
Our approach yields interpretable category rules, proving its advantage in disambiguating easily-confused categories.
arXiv Detail & Related papers (2024-03-05T12:50:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.