What is Reproducibility in Artificial Intelligence and Machine Learning Research?
- URL: http://arxiv.org/abs/2407.10239v1
- Date: Mon, 29 Apr 2024 18:51:20 GMT
- Title: What is Reproducibility in Artificial Intelligence and Machine Learning Research?
- Authors: Abhyuday Desai, Mohamed Abdelhamid, Nakul R. Padalkar
- Abstract summary: We introduce a validation framework that clarifies the roles and definitions of key validation efforts.
This structured framework aims to provide AI/ML researchers with the necessary clarity on these essential concepts.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In the rapidly evolving fields of Artificial Intelligence (AI) and Machine Learning (ML), the reproducibility crisis underscores the urgent need for clear validation methodologies to maintain scientific integrity and encourage advancement. The crisis is compounded by the prevalent confusion over validation terminology. Responding to this challenge, we introduce a validation framework that clarifies the roles and definitions of key validation efforts: repeatability, dependent and independent reproducibility, and direct and conceptual replicability. This structured framework aims to provide AI/ML researchers with the necessary clarity on these essential concepts, facilitating the appropriate design, conduct, and interpretation of validation studies. By articulating the nuances and specific roles of each type of validation study, we hope to contribute to a more informed and methodical approach to addressing the challenges of reproducibility, thereby supporting the community's efforts to enhance the reliability and trustworthiness of its research findings.
Related papers
- Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning [54.69189620971405]
We provide a unified framework, termed Identifiable Exchangeable Mechanisms (IEM), for representation and structure learning.
IEM provides new insights that let us relax the necessary conditions for causal structure identification in exchangeable non-i.i.d. data.
We also demonstrate the existence of a duality condition in identifiable representation learning, leading to new identifiability results.
arXiv Detail & Related papers (2024-06-20T13:30:25Z) - Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD^2$.
Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs.
Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z) - Confronting the Reproducibility Crisis: A Case Study in Validating Certified Robustness [0.0]
This paper presents a case study of attempting to validate the results on certified adversarial robustness in "SoK: Certified Robustness for Deep Neural Networks" using the VeriGauge toolkit.
Despite following the documented methodology, numerous software and hardware compatibility issues were encountered, including outdated or unavailable dependencies, version conflicts, and driver incompatibilities.
The paper discusses the broader implications of this crisis, proposing potential solutions such as containerization, software preservation, and comprehensive documentation practices.
arXiv Detail & Related papers (2024-05-29T04:37:19Z) - From Model Performance to Claim: How a Change of Focus in Machine Learning Replicability Can Help Bridge the Responsibility Gap [0.0]
This paper pursues two goals: improving the replicability and the accountability of Machine Learning research.
This paper posits that reconceptualizing replicability can help bridge the gap.
arXiv Detail & Related papers (2024-04-19T18:36:14Z) - On the Challenges and Opportunities in Generative AI [135.2754367149689]
We argue that current large-scale generative AI models do not sufficiently address several fundamental issues that hinder their widespread adoption across domains.
In this work, we aim to identify key unresolved challenges in modern generative AI paradigms that should be tackled to further enhance their capabilities, versatility, and reliability.
arXiv Detail & Related papers (2024-02-28T15:19:33Z) - Evolutionary Reinforcement Learning: A Systematic Review and Future Directions [18.631418642768132]
Evolutionary Reinforcement Learning (EvoRL) is a solution to the limitations of reinforcement learning and evolutionary algorithms (EAs) in complex problem-solving.
EvoRL integrates EAs and reinforcement learning, presenting a promising avenue for training intelligent agents.
This systematic review provides insights into the current state of EvoRL and offers a guide for advancing its capabilities in the ever-evolving landscape of artificial intelligence.
arXiv Detail & Related papers (2024-02-20T02:07:57Z) - Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills [25.326624139426514]
We propose a novel approach to unsupervised skill discovery based on information theory, called Value Uncertainty Variational Curriculum (VUVC).
We prove that, under regularity conditions, VUVC accelerates the increase of entropy in the visited states compared to the uniform curriculum.
We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup.
arXiv Detail & Related papers (2023-10-30T10:34:25Z) - A Comprehensive Survey of Continual Learning: Theory, Method and Application [64.23253420555989]
We present a comprehensive survey of continual learning, seeking to bridge the basic settings, theoretical foundations, representative methods, and practical applications.
We summarize the general objectives of continual learning as ensuring a proper stability-plasticity trade-off and an adequate intra/inter-task generalizability in the context of resource efficiency.
arXiv Detail & Related papers (2023-01-31T11:34:56Z) - Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z) - Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning [114.07623388322048]
We discuss how standard goal-conditioned RL (GCRL) is encapsulated by the objective of variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z) - Continual World: A Robotic Benchmark For Continual Reinforcement Learning [17.77261981963946]
We argue that understanding the right trade-off is conceptually and computationally challenging.
We propose a benchmark consisting of realistic and meaningfully diverse robotic tasks built on top of Meta-World as a testbed.
arXiv Detail & Related papers (2021-05-23T11:33:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.