Related papers: Towards Improved Research Methodologies for Industrial AI: A case study of false call reduction

Related papers

The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research [56.80927148740585]
We address the challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators.<n>We use mechanistic interpretability research as a testbed, build standardized research output, and develop MechEvalAgent.<n>Our work demonstrates the potential of AI agents to transform research evaluation and pave the way for rigorous scientific practices.
arXiv Detail & Related papers (2026-02-05T19:00:02Z)
Let the Barbarians In: How AI Can Accelerate Systems Performance Research [80.43506848683633]
We term this iterative cycle of generation, evaluation, and refinement AI-Driven Research for Systems.<n>We demonstrate that ADRS-generated solutions can match or even outperform human state-of-the-art designs.
arXiv Detail & Related papers (2025-12-16T18:51:23Z)
The Role of AI in Modern Penetration Testing [0.0]
Penetration testing is a cornerstone of cybersecurity, traditionally driven by manual, time-intensive processes.<n>This systematic literature review examines how Artificial Intelligence (AI) is reshaping penetration testing.
arXiv Detail & Related papers (2025-12-13T13:34:31Z)
SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent for translating high-level research objectives into standardized experimental configurations.<n>An Experiment Manager orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback.<n>Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z)
Barbarians at the Gate: How AI is Upending Systems Research [58.95406995634148]
We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery.<n>We term this approach as AI-Driven Research for Systems ( ADRS), which iteratively generates, evaluates, and refines solutions.<n>Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
arXiv Detail & Related papers (2025-10-07T17:49:24Z)
QUINTA: Reflexive Sensibility For Responsible AI Research and Data-Driven Processes [2.504366738288215]
This paper presents a comprehensive framework grounded in critical reflexivity as intersectional praxis.<n>The framework centers researcher reflexivity to call attention to the AI researchers' power in creating and analyzing AI/DS artifacts through data-centric approaches.
arXiv Detail & Related papers (2025-09-19T18:40:30Z)
The More You Automate, the Less You See: Hidden Pitfalls of AI Scientist Systems [11.543423308064275]
AI scientist systems are capable of executing the full research workflow from hypothesis generation to paper writing.<n>This lack of scrutiny poses a risk of introducing flaws that could undermine the integrity, reliability, and trustworthiness of their research outputs.<n>We identify four potential failure modes in contemporary AI scientist systems: inappropriate benchmark selection, data leakage, metric misuse, and post-hoc selection bias.
arXiv Detail & Related papers (2025-09-10T16:04:24Z)
AI, Humans, and Data Science: Optimizing Roles Across Workflows and the Workforce [0.0]
We consider the potential and limitation of analytic, generative, and agentic AI to augment data scientists or take on tasks traditionally done by human analysts and researchers.<n>Just as earlier eras of survey analysis created issues when the increased ease of using statistical software allowed researchers to conduct analyses they did not fully understand, the new AI tools may create similar but larger risks.
arXiv Detail & Related papers (2025-07-15T17:59:06Z)
AI-Driven Tools in Modern Software Quality Assurance: An Assessment of Benefits, Challenges, and Future Directions [0.0]
The research aims to assess the benefits, challenges, and prospects of integrating modern AI-oriented tools into quality assurance processes.<n>The research demonstrates AI's transformative potential for QA but highlights the importance of a strategic approach to implementing these technologies.
arXiv Detail & Related papers (2025-06-19T20:22:47Z)
The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority.<n>We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
Expectations vs Reality -- A Secondary Study on AI Adoption in Software Testing [0.7812210699650152]
In the software industry, artificial intelligence (AI) has been utilized more and more in software development activities.<n>In this paper, the objective was to identify what kind of empirical research with industry context has been conducted on AI in software testing.
arXiv Detail & Related papers (2025-04-07T11:03:54Z)
Bridging the Gap: Integrating Ethics and Environmental Sustainability in AI Research and Practice [57.94036023167952]
We argue that the efforts aiming to study AI's ethical ramifications should be made in tandem with those evaluating its impacts on the environment.<n>We propose best practices to better integrate AI ethics and sustainability in AI research and practice.
arXiv Detail & Related papers (2025-04-01T13:53:11Z)
Beyond principlism: Practical strategies for ethical AI use in research practices [0.0]
The rapid adoption of generative artificial intelligence in scientific research has outpaced the development of ethical guidelines. Existing approaches offer little practical guidance for addressing ethical challenges of AI in scientific research practices. I propose a user-centered, realism-inspired approach to bridge the gap between abstract principles and day-to-day research practices.
arXiv Detail & Related papers (2024-01-27T03:53:25Z)
A Discrepancy Aware Framework for Robust Anomaly Detection [51.710249807397695]
We present a Discrepancy Aware Framework (DAF), which demonstrates robust performance consistently with simple and cheap strategies. Our method leverages an appearance-agnostic cue to guide the decoder in identifying defects, thereby alleviating its reliance on synthetic appearance. Under the simple synthesis strategies, it outperforms existing methods by a large margin. Furthermore, it also achieves the state-of-the-art localization performance.
arXiv Detail & Related papers (2023-10-11T15:21:40Z)
GLUECons: A Generic Benchmark for Learning Under Constraints [102.78051169725455]
In this work, we create a benchmark that is a collection of nine tasks in the domains of natural language processing and computer vision. We model external knowledge as constraints, specify the sources of the constraints for each task, and implement various models that use these constraints.
arXiv Detail & Related papers (2023-02-16T16:45:36Z)
In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria. We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
arXiv Detail & Related papers (2023-02-06T16:55:37Z)
Metaethical Perspectives on 'Benchmarking' AI Ethics [81.65697003067841]
Benchmarks are seen as the cornerstone for measuring technical progress in Artificial Intelligence (AI) research. An increasingly prominent research area in AI is ethics, which currently has no set of benchmarks nor commonly accepted way for measuring the 'ethicality' of an AI system. We argue that it makes more sense to talk about 'values' rather than 'ethics' when considering the possible actions of present and future AI systems.
arXiv Detail & Related papers (2022-04-11T14:36:39Z)
Implementing AI Ethics in Practice: An Empirical Evaluation of the RESOLVEDD Strategy [6.7298812735467095]
We empirically evaluate an existing method from the field of business ethics, the RESOLVEDD strategy, in the context of ethical system development. One of our key findings is that, even though the use of the ethical method was forced upon the participants, its utilization nonetheless facilitated of ethical consideration in the projects.
arXiv Detail & Related papers (2020-04-21T17:58:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.