Contract2Plan: Verified Contract-Grounded Retrieval-Augmented Optimization for BOM-Aware Procurement and Multi-Echelon Inventory Planning
- URL: http://arxiv.org/abs/2601.06164v1
- Date: Wed, 07 Jan 2026 01:38:05 GMT
- Title: Contract2Plan: Verified Contract-Grounded Retrieval-Augmented Optimization for BOM-Aware Procurement and Multi-Echelon Inventory Planning
- Authors: Sahil Agarwal
- Abstract summary: We introduce Contract2Plan, a verified GenAI-to-optimizer pipeline that inserts a solver-based compliance gate before plans are emitted. The system retrieves clause evidence with provenance, extracts a typed constraint schema with evidence spans, compiles constraints into a BOM-aware MILP, and verifies grounding, eligibility, consistency, and feasibility.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Procurement and inventory planning is governed not only by demand forecasts and bills of materials (BOMs), but also by operational terms in contracts and supplier documents (e.g., MOQs, lead times, price tiers, allocation caps, substitution approvals). LLM-based extraction can speed up structuring these terms, but extraction-only or LLM-only decision pipelines are brittle: missed clauses, unit errors, and unresolved conflicts can yield infeasible plans or silent contract violations, amplified by BOM coupling. We introduce Contract2Plan, a verified GenAI-to-optimizer pipeline that inserts a solver-based compliance gate before plans are emitted. The system retrieves clause evidence with provenance, extracts a typed constraint schema with evidence spans, compiles constraints into a BOM-aware MILP, and verifies grounding, eligibility, consistency, and feasibility using solver diagnostics, triggering targeted repair or abstention when automation is unsafe. We formalize which clause classes admit conservative repair with contract-safe feasibility guarantees and which require human confirmation. A self-contained synthetic micro-benchmark (500 instances; T=5) computed by exact enumeration under an execution model with MOQ uplift and emergency purchases shows heavy-tailed regret and nontrivial MOQ-violation incidence for extraction-only planning, motivating verification as a first-class component of contract-grounded planning systems.
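The abstract's compliance gate — checking extracted contract terms (MOQs, allocation caps) against a candidate plan before it is emitted — can be illustrated with a minimal sketch. Note that this is not the paper's implementation: the actual system compiles constraints into a BOM-aware MILP and uses solver diagnostics, whereas the sketch below only performs rule-level violation checks on a flat order plan. The `ClauseConstraint` schema, constraint kinds, and clause references are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class ClauseConstraint:
    """A typed constraint extracted from a contract clause (simplified)."""
    kind: str            # e.g. "MOQ" or "alloc_cap" (illustrative kinds only)
    part: str            # part identifier the clause applies to
    value: float         # numeric term of the clause
    evidence_span: str   # provenance pointer into the contract text

def compliance_gate(plan: dict, constraints: list) -> list:
    """Return a list of (constraint, message) pairs for each violated term.

    An empty result means the plan passes this (simplified) gate;
    a non-empty result would trigger repair or abstention upstream.
    """
    violations = []
    for c in constraints:
        qty = plan.get(c.part, 0.0)
        if c.kind == "MOQ" and 0 < qty < c.value:
            violations.append((c, f"order {qty} below MOQ {c.value}"))
        elif c.kind == "alloc_cap" and qty > c.value:
            violations.append((c, f"order {qty} exceeds cap {c.value}"))
    return violations

# Hypothetical extracted terms and a candidate plan
constraints = [
    ClauseConstraint("MOQ", "partA", 100, "clause 4.2"),
    ClauseConstraint("alloc_cap", "partB", 50, "clause 7.1"),
]
plan = {"partA": 40, "partB": 60}
print(compliance_gate(plan, constraints))  # both orders violate a term
```

In the paper's setting, the analogous check happens inside a MILP (where solver infeasibility diagnostics localize the offending clauses); this sketch only shows the gate's input/output contract.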
Related papers
- Incentive Aware AI Regulations: A Credal Characterisation [14.228416693145649]
High-stakes ML applications demand strict regulations, but strategic ML providers often evade them to lower development costs. We introduce regulation mechanisms: a framework that maps empirical evidence from models to a license for some market share. We prove that a mechanism has perfect market outcome if and only if the set of non-compliant distributions forms a credal set of probability measures.
arXiv Detail & Related papers (2026-03-05T13:42:19Z) - Teaching LLMs to Ask: Self-Querying Category-Theoretic Planning for Under-Specified Reasoning [1.8055130471307603]
Inference-time planning with large language models frequently breaks under partial observability. We introduce Self-Querying Bidirectional Categorical Planning (SQ-BCP), which explicitly represents precondition status. We prove that when the verifier succeeds and hard constraints pass deterministic checks, accepted plans are compatible with goal requirements.
arXiv Detail & Related papers (2026-01-27T19:41:10Z) - CreditAudit: 2$^\text{nd}$ Dimension for LLM Evaluation and Selection [44.251742023911135]
CreditAudit is a deployment-oriented credit audit framework that evaluates models under a family of semantically aligned, non-adversarial system prompt templates. We show that models with similar mean ability can exhibit substantially different fluctuation, and that stability risk can overturn prioritization decisions in agentic or high-failure-cost regimes.
arXiv Detail & Related papers (2026-01-23T07:53:25Z) - ORCHID: Orchestrated Retrieval-Augmented Classification with Human-in-the-Loop Intelligent Decision-Making for High-Risk Property [6.643427585499247]
ORCHID is a modular agentic system for HRP classification. It pairs retrieval-augmented generation (RAG) with human oversight to produce policy-based outputs that can be audited. The demonstration shows single-item submission, grounded citations, SME feedback capture, and exportable audit artifacts.
arXiv Detail & Related papers (2025-11-07T03:48:05Z) - Safe and Compliant Cross-Market Trade Execution via Constrained RL and Zero-Knowledge Audits [0.5586191108738564]
We present a cross-market algorithmic trading system that balances execution quality with rigorous compliance enforcement. The architecture comprises a high-level planner, a reinforcement learning execution agent, and an independent compliance agent. We report effects at the 95% confidence level using paired t-tests and examine tail risk via CVaR.
arXiv Detail & Related papers (2025-10-06T15:52:12Z) - Automatic Building Code Review: A Case Study [6.530899637501737]
Building officials face labor-intensive, error-prone, and costly manual reviews of design documents as projects increase in size and complexity. This study introduces a novel agent-driven framework that integrates BIM-based data extraction with automated verification.
arXiv Detail & Related papers (2025-10-03T00:30:14Z) - Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty [49.19257648205146]
We propose an unsupervised conformal inference framework for generation. Our gates achieve close-to-nominal coverage and provide tighter, more stable thresholds than split UCP. The result is a label-free, API-compatible gate for test-time filtering.
arXiv Detail & Related papers (2025-09-26T23:40:47Z) - COIN: Uncertainty-Guarding Selective Question Answering for Foundation Models with Provable Risk Guarantees [51.5976496056012]
COIN is an uncertainty-guarding selection framework that calibrates statistically valid thresholds to filter a single generated answer per question. COIN estimates the empirical error rate on a calibration set and applies confidence interval methods to establish a high-probability upper bound on the true error rate. We demonstrate COIN's robustness in risk control, strong test-time power in retaining admissible answers, and predictive efficiency under limited calibration data.
arXiv Detail & Related papers (2025-06-25T07:04:49Z) - Supervised Optimism Correction: Be Confident When LLMs Are Sure [91.7459076316849]
We establish a novel theoretical connection between supervised fine-tuning and offline reinforcement learning. We show that the widely used beam search method suffers from unacceptable over-optimism. We propose Supervised Optimism Correction, which introduces a simple yet effective auxiliary loss for token-level $Q$-value estimations.
arXiv Detail & Related papers (2025-04-10T07:50:03Z) - Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs [71.7892165868749]
Commercial Large Language Model (LLM) APIs create a fundamental trust problem. Users pay for specific models but have no guarantee that providers deliver them faithfully. We formalize this model substitution problem and evaluate detection methods under realistic adversarial conditions. We propose and evaluate the use of Trusted Execution Environments (TEEs) as one practical and robust solution.
arXiv Detail & Related papers (2025-04-07T03:57:41Z) - Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification [76.14641982122696]
We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control.
We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
arXiv Detail & Related papers (2024-10-07T23:38:58Z) - Language Models with Conformal Factuality Guarantees [44.767328168194815]
Conformal factuality is a framework that can ensure high probability correctness guarantees for language model (LM) outputs.
We show that conformal prediction in language models corresponds to a back-off algorithm that provides high probability correctness guarantees.
arXiv Detail & Related papers (2024-02-15T18:31:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.