R&D evaluation methodology based on group-AHP with uncertainty
- URL: http://arxiv.org/abs/2108.02595v2
- Date: Mon, 22 Nov 2021 16:31:23 GMT
- Title: R&D evaluation methodology based on group-AHP with uncertainty
- Authors: Alberto Garinei, Emanuele Piccioni, Massimiliano Proietti, Andrea
Marini, Stefano Speziali, Marcello Marconi, Raffaella Di Sante, Sara
Casaccia, Paolo Castellini, Milena Martarelli, Nicola Paone, Gian Marco
Revel, Lorenzo Scalise, Marco Arnesano, Paolo Chiariotti, Roberto Montanini,
Antonino Quattrocchi, Sergio Silvestri, Giorgio Ficco, Emanuele Rizzuto,
Andrea Scorza, Matteo Lancini, Gianluca Rossi, Roberto Marsili, Emanuele
Zappa, Salvatore Sciuto, Gaetano Vacca, Laura Fabbiano
- Abstract summary: We present an approach to evaluate Research & Development (R&D) performance based on the Analytic Hierarchy Process (AHP) method.
We single out a set of indicators needed for R&D performance evaluation.
The numerical values associated with all the indicators are then used to assign a score to a given R&D project.
- Score: 0.17689918341582753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present an approach to evaluate Research & Development
(R&D) performance based on the Analytic Hierarchy Process (AHP) method.
Through a set of questionnaires submitted to a team of experts, we single out a
set of indicators needed for R&D performance evaluation. The indicators,
together with the corresponding criteria, form the basic hierarchical structure
of the AHP method. The numerical values associated with all the indicators are
then used to assign a score to a given R&D project. To aggregate the values
taken on by the different indicators consistently, we map each of them to a
dimensionless quantity lying in the unit interval. This is achieved by
employing the empirical Cumulative Distribution Function (CDF) of each
indicator. We give a thorough discussion of how to assign a score to an R&D
project, along with the corresponding uncertainty due to possible
inconsistencies in the decision process. A particular example of R&D
performance evaluation is finally considered.
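The scoring pipeline the abstract outlines (priority weights derived from expert pairwise comparisons, a consistency check that feeds the uncertainty estimate, empirical-CDF normalisation of heterogeneous indicators, and a weighted aggregate score) can be illustrated with a short numerical sketch. The snippet below is not the authors' implementation: the indicator names, the pairwise comparison matrix, and the historical indicator values are hypothetical, the weights come from the standard principal-eigenvector method, and the consistency ratio is Saaty's conventional measure rather than the paper's specific uncertainty treatment.

```python
# Minimal sketch of an AHP-based R&D scoring pipeline (illustrative, not the
# authors' code): eigenvector priority weights + consistency ratio,
# empirical-CDF normalisation of indicators to [0, 1], and a weighted
# aggregate project score.
import numpy as np

def ahp_weights(pairwise: np.ndarray):
    """Principal-eigenvector priority weights and Saaty's consistency ratio."""
    eigvals, eigvecs = np.linalg.eig(pairwise)
    k = int(np.argmax(eigvals.real))
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()
    n = pairwise.shape[0]
    ci = (eigvals[k].real - n) / (n - 1)            # consistency index
    ri = {3: 0.58, 4: 0.90, 5: 1.12}.get(n, 1.24)   # random index (Saaty)
    return w, ci / ri                               # (weights, consistency ratio)

def empirical_cdf(sample: np.ndarray):
    """Return F(x): fraction of the reference sample <= x, a map into [0, 1]."""
    s = np.sort(sample)
    return lambda x: np.searchsorted(s, x, side="right") / s.size

# Hypothetical pairwise comparisons of three indicators on Saaty's 1-9 scale;
# rows/columns follow the order of the `history` dict below.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
weights, cr = ahp_weights(A)

# Hypothetical historical values used to build each empirical CDF, and the
# values observed for the project under evaluation.
history = {"publications": np.array([2, 5, 7, 9, 12]),
           "patents":      np.array([0, 1, 1, 3, 4]),
           "funding_keur": np.array([50, 120, 200, 350, 800])}
project = {"publications": 8, "patents": 2, "funding_keur": 300}

normalised = [empirical_cdf(history[k])(project[k]) for k in history]
score = float(np.dot(weights, normalised))

print(f"weights = {np.round(weights, 3)}, consistency ratio = {cr:.3f}")
print(f"project score in [0, 1]: {score:.3f}")
```

Conventionally, a consistency ratio above roughly 0.1 would prompt the experts to revisit their pairwise judgements; the paper instead discusses how such inconsistencies translate into an uncertainty on the final score.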
Related papers
- Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets [0.0]
Retrieval-Augmented Generation (RAG) has advanced significantly in recent years.
RAG complexity poses substantial challenges for systematic evaluation and quality enhancement.
This study systematically reviews 63 academic articles to provide a comprehensive overview of state-of-the-art RAG evaluation methodologies.
arXiv Detail & Related papers (2025-04-28T08:22:19Z)
- SEOE: A Scalable and Reliable Semantic Evaluation Framework for Open Domain Event Detection [70.23196257213829]
We propose a scalable and reliable Semantic-level Evaluation framework for Open domain Event detection.
Our proposed framework first constructs a scalable evaluation benchmark that currently includes 564 event types covering 7 major domains.
We then leverage large language models (LLMs) as automatic evaluation agents to compute a semantic F1-score, incorporating fine-grained definitions of semantically similar labels.
arXiv Detail & Related papers (2025-03-05T09:37:05Z)
- SedarEval: Automated Evaluation using Self-Adaptive Rubrics [4.97150240417381]
We propose a new evaluation paradigm based on self-adaptive rubrics.
SedarEval consists of 1,000 meticulously crafted questions, each with its own self-adaptive rubric.
We train a specialized evaluator language model (evaluator LM) to supplant human graders.
arXiv Detail & Related papers (2025-01-26T16:45:09Z)
- Dissecting Out-of-Distribution Detection and Open-Set Recognition: A Critical Analysis of Methods and Benchmarks [17.520137576423593]
We aim to provide a consolidated view of the two largest sub-fields within the community: out-of-distribution (OOD) detection and open-set recognition (OSR).
We perform rigorous cross-evaluation between state-of-the-art methods in the OOD detection and OSR settings and identify a strong correlation between the performances of methods for them.
We propose a new, large-scale benchmark setting which we suggest better disentangles the problem tackled by OOD detection and OSR.
arXiv Detail & Related papers (2024-08-29T17:55:07Z)
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR).
A series of analyses show that TKPR is compatible with existing ranking-based measures.
On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z)
- Backdoor-based Explainable AI Benchmark for High Fidelity Evaluation of Attribution Methods [49.62131719441252]
Attribution methods compute importance scores for input features to explain the output predictions of deep models.
In this work, we first identify a set of fidelity criteria that reliable benchmarks for attribution methods are expected to fulfill.
We then introduce a Backdoor-based eXplainable AI benchmark (BackX) that adheres to the desired fidelity criteria.
arXiv Detail & Related papers (2024-05-02T13:48:37Z)
- A structured regression approach for evaluating model performance across intersectional subgroups [53.91682617836498]
Disaggregated evaluation is a central task in AI fairness assessment, where the goal is to measure an AI system's performance across different subgroups.
We introduce a structured regression approach to disaggregated evaluation that we demonstrate can yield reliable system performance estimates even for very small subgroups.
arXiv Detail & Related papers (2024-01-26T14:21:45Z)
- DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models [4.953092503184905]
This work proposes DCR, an automated framework for evaluating and improving the consistency of Large Language Models (LLMs) generated texts.
We introduce an automatic metric converter (AMC) that translates the output from DCE into an interpretable numeric score.
Our approach also substantially reduces nearly 90% of output inconsistencies, showing promise for effective hallucination mitigation.
arXiv Detail & Related papers (2024-01-04T08:34:16Z)
- A Framework for Auditing Multilevel Models using Explainability Methods [2.578242050187029]
An audit framework for technical assessment of regressions is proposed.
The focus is on three aspects, model, discrimination, and transparency and explainability.
It is demonstrated that popular explainability methods, such as SHAP and LIME, underperform in accuracy when interpreting these models.
arXiv Detail & Related papers (2022-07-04T17:53:21Z)
- Multiple-criteria Heuristic Rating Estimation [0.0]
The Heuristic Rating Estimation (HRE) method, proposed in 2014, attempted to answer this question.
We analyze how HRE can be used as part of the Analytic Hierarchy Process hierarchical framework.
arXiv Detail & Related papers (2022-05-20T20:12:04Z)
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
- Towards Question-Answering as an Automatic Metric for Evaluating the Content Quality of a Summary [65.37544133256499]
We propose a metric to evaluate the content quality of a summary using question-answering (QA).
We demonstrate the experimental benefits of QA-based metrics through an analysis of our proposed metric, QAEval.
arXiv Detail & Related papers (2020-10-01T15:33:09Z)
- Uncertainty-aware Score Distribution Learning for Action Quality Assessment [91.05846506274881]
We propose an uncertainty-aware score distribution learning (USDL) approach for action quality assessment (AQA).
Specifically, we regard an action as an instance associated with a score distribution, which describes the probability of different evaluated scores.
Under the circumstance where fine-grained score labels are available, we devise a multi-path uncertainty-aware score distributions learning (MUSDL) method to explore the disentangled components of a score.
arXiv Detail & Related papers (2020-06-13T15:41:29Z)
- On the Ambiguity of Rank-Based Evaluation of Entity Alignment or Link Prediction Methods [27.27230441498167]
We take a closer look at the evaluation of two families of methods for enriching information from knowledge graphs: Link Prediction and Entity Alignment.
In particular, we demonstrate that all existing scores can hardly be used to compare results across different datasets.
We show that this leads to various problems in the interpretation of results, which may support misleading conclusions.
arXiv Detail & Related papers (2020-02-17T12:26:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.