SANGAM: SystemVerilog Assertion Generation via Monte Carlo Tree Self-Refine
- URL: http://arxiv.org/abs/2506.13983v1
- Date: Wed, 11 Jun 2025 06:43:24 GMT
- Authors: Adarsh Gupta, Bhabesh Mali, Chandan Karfa
- Abstract summary: This paper introduces SANGAM, a SystemVerilog Assertion Generation framework using LLM-guided Monte Carlo Tree Search for the automatic generation of SVAs from industry-level specifications. The results demonstrate that SANGAM generates a robust set of SVAs and outperforms recent methods in evaluation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in the field of reasoning using Large Language Models (LLMs) have created new possibilities for more complex and automatic Hardware Assertion Generation techniques. This paper introduces SANGAM, a SystemVerilog Assertion Generation framework using LLM-guided Monte Carlo Tree Search for the automatic generation of SVAs from industry-level specifications. The proposed framework uses a three-stage approach: Stage 1 performs multi-modal specification processing using Signal Mapper, SPEC Analyzer, and Waveform Analyzer LLM agents; Stage 2 applies the Monte Carlo Tree Self-Refine (MCTSr) algorithm to reason automatically about SVAs for each signal; and Stage 3 combines the MCTSr-generated reasoning traces to generate the SVAs for each signal. The results demonstrate that SANGAM generates a robust set of SVAs and outperforms recent methods in evaluation.
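The abstract names the Monte Carlo Tree Self-Refine (MCTSr) algorithm of Stage 2 but gives no pseudocode. A heavily simplified sketch of the core loop follows; this is an illustration, not the authors' implementation: `refine` and `score` are hypothetical stand-ins for the LLM refinement and self-evaluation calls, and the selection rule is the standard UCB1 form.

```python
import math

def mctsr(seed_answer, refine, score, iterations=8, c=1.4):
    """Monte Carlo Tree Self-Refine, simplified: treat each
    self-refinement of an answer as a child node, choose which node
    to refine next with a UCB1-style rule, and backpropagate the
    self-evaluated reward."""
    root = {"ans": seed_answer, "q": 0.0, "n": 0, "children": [], "parent": None}

    def select(node):
        # Walk down to a leaf, following the highest UCB1 value.
        while node["children"]:
            node = max(
                node["children"],
                key=lambda ch: ch["q"] / max(ch["n"], 1)
                + c * math.sqrt(math.log(node["n"] + 1) / max(ch["n"], 1)),
            )
        return node

    for _ in range(iterations):
        leaf = select(root)
        # Expansion: one LLM-style refinement of the selected answer.
        child = {"ans": refine(leaf["ans"]), "q": 0.0, "n": 0,
                 "children": [], "parent": leaf}
        leaf["children"].append(child)
        reward = score(child["ans"])  # self-evaluation of the new answer
        node = child
        while node is not None:  # backpropagate reward and visit counts
            node["q"] += reward
            node["n"] += 1
            node = node["parent"]

    # Return the visited answer with the best average reward.
    best, stack = root, [root]
    while stack:
        n = stack.pop()
        if n["n"] and n["q"] / n["n"] >= best["q"] / max(best["n"], 1):
            best = n
        stack.extend(n["children"])
    return best["ans"]
```

With deterministic stand-ins (e.g. `refine` appending text and `score` rewarding length), the loop repeatedly deepens the most promising branch, which is the behavior the tree search is meant to exploit.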
Related papers
- MC-Search: Evaluating and Enhancing Multimodal Agentic Search with Structured Long Reasoning Chains [79.14584837105808]
We present MC-Search, the first benchmark for agentic MM-RAG with long, step-wise annotated reasoning chains spanning five representative reasoning structures. Beyond answer accuracy, MC-Search introduces new process-level metrics for reasoning quality, stepwise retrieval, and planning accuracy. By developing a unified agentic MM-RAG pipeline, we benchmark six leading MLLMs and reveal systematic issues such as over- and under-retrieval and modality-misaligned planning.
arXiv Detail & Related papers (2026-03-01T02:25:57Z) - Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models [96.0074341403456]
Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding. We propose Prism, an efficient TTS framework for dLLMs.
arXiv Detail & Related papers (2026-02-02T09:14:51Z) - Diffusion Language Model Inference with Monte Carlo Tree Search [22.7649405246503]
Diffusion language models (DLMs) have emerged as a compelling alternative to autoregressive generation. We introduce MEDAL, a principled search mechanism for DLM inference. Across multiple benchmarks, MEDAL achieves up to 22.0% improvement over existing inference strategies.
arXiv Detail & Related papers (2025-12-13T04:30:02Z) - AI for Distributed Systems Design: Scalable Cloud Optimization Through Repeated LLMs Sampling And Simulators [3.1594665317979698]
We explore AI-driven distributed-systems policy design by combining code generation from large language models with deterministic verification in a domain-specific simulator. We report preliminary results on throughput improvements across multiple models. We conjecture that AI will be crucial for scaling this methodology by helping to bootstrap new simulators.
arXiv Detail & Related papers (2025-10-20T16:10:24Z) - A Multi-Strategy Approach for AI-Generated Text Detection [0.5735035463793009]
This paper presents three distinct systems developed for the M-DAIGT shared task on detecting AI-generated content in news articles and academic abstracts. The RoBERTa-based system emerged as the most performant, achieving near-perfect results on both development and test sets.
arXiv Detail & Related papers (2025-08-30T22:37:35Z) - CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward [50.97588334916863]
We develop CompassVerifier, an accurate and robust lightweight verifier model for evaluation and outcome reward. It demonstrates multi-domain competency spanning math, knowledge, and diverse reasoning tasks, with the capability to process various answer types. We introduce the VerifierBench benchmark, comprising model outputs collected from multiple data sources and augmented through manual analysis of meta-error patterns to enhance CompassVerifier.
arXiv Detail & Related papers (2025-08-05T17:55:24Z) - SUDER: Self-Improving Unified Large Multimodal Models for Understanding and Generation with Dual Self-Rewards [55.99492656542475]
We propose SUDER (Self-improving Unified LMMs with Dual Self-Rewards), a framework reinforcing the understanding and generation capabilities of LMMs.
arXiv Detail & Related papers (2025-06-09T17:38:45Z) - Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z) - Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought [58.321044666612174]
Vad-R1 is an end-to-end MLLM-based framework for Video Anomaly Reasoning. We design a Perception-to-Cognition Chain-of-Thought (P2C-CoT) that simulates the human process of recognizing anomalies. We also propose an improved reinforcement learning algorithm, AVA-GRPO, which explicitly incentivizes the anomaly reasoning capability of MLLMs.
arXiv Detail & Related papers (2025-05-26T12:05:16Z) - I-MCTS: Enhancing Agentic AutoML via Introspective Monte Carlo Tree Search [10.718560472954644]
Introspective Monte Carlo Tree Search (I-MCTS) is a novel approach that iteratively expands tree nodes through an introspective process. We integrate a Large Language Model (LLM)-based value model to facilitate direct evaluation of each node's solution. Our approach demonstrates a 6% absolute improvement in performance compared to strong open-source AutoML agents.
arXiv Detail & Related papers (2025-02-20T16:19:09Z) - Leveraging LLM Agents for Automated Optimization Modeling for SASP Problems: A Graph-RAG based Approach [7.790822602801334]
We propose an automated modeling approach based on the retrieval-augmented generation (RAG) technique. The proposed approach (termed MAG-RAG) outperforms several AOM benchmarks.
arXiv Detail & Related papers (2025-01-30T13:00:15Z) - Training of Scaffolded Language Models with Language Supervision: A Survey [62.59629932720519]
This survey organizes the literature on the design and optimization of emerging structures around post-trained LMs. We refer to this overarching structure as scaffolded LMs and focus on LMs that are integrated into multi-step processes with tools.
arXiv Detail & Related papers (2024-10-21T18:06:25Z) - Toward General Instruction-Following Alignment for Retrieval-Augmented Generation [63.611024451010316]
Following natural instructions is crucial for the effective application of Retrieval-Augmented Generation (RAG) systems.
We propose VIF-RAG, the first automated, scalable, and verifiable synthetic pipeline for instruction-following alignment in RAG systems.
arXiv Detail & Related papers (2024-10-12T16:30:51Z) - VSLLaVA: a pipeline of large multimodal foundation model for industrial vibration signal analysis [17.856611893709793]
VSLLaVA is a comprehensive pipeline that utilizes expert knowledge-guided instruction tuning and evaluation to create an end-to-end LMM for signal analysis. This research demonstrates a viable approach for developing specialized foundational models for complex industrial applications.
arXiv Detail & Related papers (2024-09-03T06:21:26Z) - GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification [12.598652038778368]
We propose the GMM-ResNext model for speaker verification.
A two-path GMM-ResNext model based on two gender-related GMMs has also been proposed.
The proposed GMM-ResNext achieves relative improvements of 48.1% and 11.3% in EER compared with ResNet34 and ECAPA-TDNN on the VoxCeleb1-O test set.
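For readers unfamiliar with the metric, a "relative improvement in EER" is the proportional reduction in equal error rate rather than an absolute difference in percentage points. The arithmetic is simply:

```python
def relative_eer_improvement(eer_baseline, eer_proposed):
    """Relative EER improvement: the fraction by which the proposed
    system's equal error rate is lower than the baseline's."""
    return (eer_baseline - eer_proposed) / eer_baseline
```

For example, a baseline EER of 2.0% reduced to 1.04% would be a 48% relative improvement (illustrative numbers, not figures from the paper).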
arXiv Detail & Related papers (2024-07-03T14:14:18Z) - Latent Logic Tree Extraction for Event Sequence Explanation from LLMs [19.90330712436838]
Modern high-stakes systems, such as healthcare or robotics, often generate vast streaming event sequences.
Our goal is to design an efficient, plug-and-play tool to elicit logic tree-based explanations from Large Language Models (LLMs) to provide customized insights into each observed event sequence.
In the online setting, our locally built, lightweight model will iteratively extract the most relevant rules from LLMs for each sequence using only a few iterations.
arXiv Detail & Related papers (2024-06-03T09:10:42Z) - ChIRAAG: ChatGPT Informed Rapid and Automated Assertion Generation [10.503097140635374]
ChIRAAG, based on OpenAI GPT-4, generates SystemVerilog Assertions (SVAs) from natural language specifications of a design.
In experiments, only 27% of LLM-generated raw assertions had errors, which were rectified within a few iterations.
Our results show that LLMs can streamline and assist engineers in the assertion generation process, reshaping verification.
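The generate-then-rectify workflow described in this abstract can be sketched as a simple loop. The sketch below is hypothetical: `llm` and `checker` are stand-in callables for the GPT-4 call and a syntax/simulation checker, and nothing about ChIRAAG's actual prompt format is assumed.

```python
def generate_assertions(spec, llm, checker, max_iters=5):
    """Iterative generate-and-repair loop: draft SVAs from a natural
    language spec, then feed tool-reported errors back to the model
    until the assertions pass or the iteration budget runs out."""
    prompt = f"Write SystemVerilog Assertions for this spec:\n{spec}"
    assertions = llm(prompt)
    for _ in range(max_iters):
        errors = checker(assertions)  # e.g. compile/simulation diagnostics
        if not errors:
            break  # assertions are clean; stop refining
        prompt = (f"These assertions failed with errors:\n{errors}\n"
                  f"Fix them. Original spec:\n{spec}\n{assertions}")
        assertions = llm(prompt)
    return assertions
```

The loop terminates early once the checker reports no errors, matching the observation that most raw assertions needed only a few repair iterations.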
arXiv Detail & Related papers (2024-01-31T12:41:27Z) - Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals [67.64770842323966]
Causal explanations of predictions of NLP systems are essential to ensure safety and establish trust.
Existing methods often fall short of explaining model predictions effectively or efficiently.
We propose two approaches for counterfactual (CF) approximation.
arXiv Detail & Related papers (2023-10-01T07:31:04Z) - Recommender Systems with Generative Retrieval [58.454606442670034]
We propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates.
To that end, we create semantically meaningful tuples of codewords to serve as a Semantic ID for each item.
We show that recommender systems trained with the proposed paradigm significantly outperform the current SOTA models on various datasets.
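The "Semantic ID" idea, a coarse-to-fine tuple of codewords per item, can be illustrated with a toy residual-quantization sketch: at each level, pick the nearest codebook centroid and quantize what remains. The paper learns its codebooks with a quantization autoencoder; this simplified version takes fixed codebooks as input and is only an illustration of the ID structure.

```python
import numpy as np

def semantic_id(embedding, codebooks):
    """Assign a coarse-to-fine tuple of codewords (a 'Semantic ID')
    to an item embedding via residual quantization."""
    residual = np.asarray(embedding, dtype=float)
    codes = []
    for codebook in codebooks:  # one codebook per level, coarse to fine
        dists = np.linalg.norm(codebook - residual, axis=1)
        idx = int(np.argmin(dists))  # nearest centroid at this level
        codes.append(idx)
        residual = residual - codebook[idx]  # quantize the leftover
    return tuple(codes)
```

Items with similar embeddings share leading codewords, which is what lets a generative model decode identifiers autoregressively, one codeword at a time.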
arXiv Detail & Related papers (2023-05-08T21:48:17Z) - Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner [56.08919422452905]
We propose an architecture called the Iterative Retrieval-Generation Reasoner (IRGR).
Our model is able to explain a given hypothesis by systematically generating a step-by-step explanation from textual premises.
We outperform existing benchmarks on premise retrieval and entailment tree generation, with around 300% gain in overall correctness.
arXiv Detail & Related papers (2022-05-18T21:52:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.