Related papers: An Integrated Platform for LEED Certification Automation Using Computer Vision and LLM-RAG

Related papers

IDP Accelerator: Agentic Document Intelligence from Extraction to Compliance Validation [3.539467892338473]
We present IDP (Intelligent Document Processing) Accelerator, a framework enabling agentic AI for end-to-end document intelligence. The interactive demonstration enables users to upload document packets, visualize classification results, and explore extracted data.
arXiv Detail & Related papers (2026-02-26T20:20:38Z)
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z)
Context-Aware Visual Prompting: Automating Geospatial Web Dashboards with Large Language Models and Agent Self-Validation for Decision Support [1.506501956463029]
The development of web-based dashboards for risk analysis and decision making is often challenged by the difficulty of handling large, multidimensional data. We introduce a generative AI framework that automates the creation of interactive geospatial dashboards from user-defined inputs.
arXiv Detail & Related papers (2025-10-10T10:58:15Z)
Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation [31.356673356827432]
We present a layout-aware and efficiency-optimized framework for automated extraction and evaluation. Our system is fully deployed in Alibaba's intelligent HR platform, supporting real-time applications across its business units.
arXiv Detail & Related papers (2025-10-10T07:01:35Z)
VeriOpt: PPA-Aware High-Quality Verilog Generation via Multi-Role LLMs [41.94295877935867]
VeriOpt is a novel framework that leverages role-based prompting and PPA-aware optimization to produce high-quality, synthesizable Verilog. Our work advances the state of the art in AI-driven hardware design by addressing the critical gap between correctness and quality.
arXiv Detail & Related papers (2025-07-20T00:28:55Z)
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models [76.72220653705679]
We introduce MCPEval, an open-source framework that automates end-to-end task generation and deep evaluation of intelligent agents. MCPEval standardizes metrics, seamlessly integrates with native agent tools, and eliminates manual effort in building evaluation pipelines. Empirical results across five real-world domains show its effectiveness in revealing nuanced, domain-specific performance.
arXiv Detail & Related papers (2025-07-17T05:46:27Z)
Leveraging Machine Learning and Enhanced Parallelism Detection for BPMN Model Generation from Text [75.77648333476776]
This paper introduces an automated pipeline for extracting BPMN models from text. A key contribution of this work is the introduction of a newly annotated dataset. We augment the dataset with 15 newly annotated documents containing 32 parallel gateways for model training.
arXiv Detail & Related papers (2025-07-11T07:25:55Z)
ArtifactsBench: Bridging the Visual-Interactive Gap in LLM Code Generation Evaluation [51.297873393639456]
ArtifactsBench is a framework for automated visual code generation evaluation. Our framework renders each generated artifact and captures its dynamic behavior through temporal screenshots. We construct a new benchmark of 1,825 diverse tasks and evaluate over 30 leading Large Language Models.
arXiv Detail & Related papers (2025-07-07T12:53:00Z)
Provenance Tracking in Large-Scale Machine Learning Systems [0.0]
y4ML is a tool designed to collect data in a format compliant with the W3C PROV and ProvProvML standards. y4ML is fully integrated with the yProv framework, allowing for higher-level pairing in tasks run also through workflow management systems.
arXiv Detail & Related papers (2025-07-01T14:10:02Z)
Comprehensive Verilog Design Problems: A Next-Generation Benchmark Dataset for Evaluating Large Language Models and Agents on RTL Design and Verification [6.0652877909448835]
We present the Comprehensive Verilog Design Problems (CVDP) benchmark, a new dataset and infrastructure to advance research in hardware and verification. CVDP includes 783 problems across task categories, covering verification, debug, generation, alignment, and technical Q&A. Problems are offered in both non-agent and agentic formats.
arXiv Detail & Related papers (2025-06-17T00:11:13Z)
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
MLE-Dojo is a Gym-style framework for systematically training (via reinforcement learning), evaluating, and improving autonomous large language model (LLM) agents. MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios. Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z)
LEMUR Neural Network Dataset: Towards Seamless AutoML [34.04248949660201]
We introduce LEMUR, an open source dataset of neural network models with well-structured code for diverse architectures. LEMUR is primarily designed to enable fine-tuning of large language models for automated machine learning tasks. LEMUR will be released as an open source project under the MIT license upon acceptance of the paper.
arXiv Detail & Related papers (2025-04-14T09:08:00Z)
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models [58.45517851437422]
Visually-situated text parsing (VsTP) has recently seen notable advancements, driven by the growing demand for automated document understanding. Existing solutions often rely on task-specific architectures and objectives for individual tasks. In this paper, we introduce OmniParser V2, a universal model that unifies typical VsTP tasks, including text spotting, key information extraction, table recognition, and layout analysis.
arXiv Detail & Related papers (2025-02-22T09:32:01Z)
LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models [0.0]
LatteReview is a Python-based framework that leverages large language models (LLMs) and multi-agent systems to automate key elements of the systematic review process. The framework supports features such as Retrieval-Augmented Generation (RAG) for incorporating external context, multimodal reviews, Pydantic-based validation for structured inputs and outputs, and asynchronous programming for handling large-scale datasets.
arXiv Detail & Related papers (2025-01-05T17:53:00Z)
UltraEval: A Lightweight Platform for Flexible and Comprehensive Evaluation for LLMs [74.1976921342982]
This paper introduces UltraEval, a user-friendly evaluation framework characterized by its lightweight nature, comprehensiveness, modularity, and efficiency. The resulting composability allows for the free combination of different models, tasks, prompts, benchmarks, and metrics within a unified evaluation workflow.
arXiv Detail & Related papers (2024-04-11T09:17:12Z)
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks [37.48197934228379]
There is no AutoML system that automates the entire end-to-end model production workflow for computer vision. We propose a novel request-to-model task, which involves understanding the user's natural language request and executing the entire workflow to output production-ready models. This empowers non-expert individuals to easily build task-specific models via a user-friendly language interface.
arXiv Detail & Related papers (2024-02-23T14:38:19Z)
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation [30.693616802332745]
This paper presents a novel benchmark, AssistGUI, to evaluate whether models are capable of manipulating the mouse and keyboard on the Windows platform in response to user-requested tasks. We propose an advanced Actor-Critic framework, which incorporates a sophisticated GUI driven by an AI agent and adept at handling lengthy procedural tasks.
arXiv Detail & Related papers (2023-12-20T15:28:38Z)
TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation. Specifically, task decomposition, tool selection, and parameter prediction are assessed. Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper, we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.