The Need for Standardized Evidence Sampling in CMMC Assessments: A Survey-Based Analysis of Assessor Practices
- URL: http://arxiv.org/abs/2602.09905v1
- Date: Tue, 10 Feb 2026 15:40:44 GMT
- Title: The Need for Standardized Evidence Sampling in CMMC Assessments: A Survey-Based Analysis of Assessor Practices
- Authors: Logan Therrien, John Hastings
- Abstract summary: This study investigates whether inconsistencies in evidence sampling practices exist within the Cybersecurity Maturity Model Certification ecosystem. Results indicate that evidence sampling practices are driven by assessor judgment, perceived risk, and environmental complexity rather than formalized standards.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Cybersecurity Maturity Model Certification (CMMC) framework provides a common standard for protecting sensitive unclassified information in defense contracting. While CMMC defines assessment objectives and control requirements, limited formal guidance exists regarding evidence sampling, the process by which assessors select, review, and validate artifacts to substantiate compliance. Analyzing data collected through an anonymous survey of CMMC-certified assessors and lead assessors, this exploratory study investigates whether inconsistencies in evidence sampling practices exist within the CMMC assessment ecosystem and evaluates the need for a risk-informed standardized sampling methodology. Across 17 usable survey responses, results indicate that evidence sampling practices are predominantly driven by assessor judgment, perceived risk, and environmental complexity rather than formalized standards, with formal statistical sampling models rarely referenced. Participants frequently reported inconsistencies across assessments and expressed broad support for the development of standardized guidance, while generally opposing rigid percentage-based requirements. The findings support the conclusion that the absence of a uniform evidence sampling framework introduces variability that may affect assessment reliability and confidence in certification outcomes. Recommendations are provided to inform future CMMC assessment methodology development and further empirical research.
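The paper advocates a risk-informed sampling methodology in place of rigid percentage-based quotas, but it does not prescribe a formula. As a minimal illustration only, the sketch below shows one way a standard attribute-sampling calculation with a finite-population correction could translate an assessor-judged risk tier into a sample size; the risk-tier-to-confidence mapping, the function name `evidence_sample_size`, and the default margin of error are assumptions made for this example, not the authors' method.

```python
import math

# Hypothetical mapping from assessor-judged risk tier to statistical
# confidence level; the paper does not prescribe these values.
RISK_TIER_CONFIDENCE = {"low": 0.90, "moderate": 0.95, "high": 0.99}
Z_SCORES = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}


def evidence_sample_size(population: int, risk_tier: str, margin: float = 0.10) -> int:
    """Attribute-sampling sample size with finite-population correction.

    population -- number of in-scope artifacts/assets for a control
    risk_tier  -- assessor-judged risk ("low", "moderate", "high")
    margin     -- tolerable margin of error (e.g. 0.10 = +/-10%)
    """
    z = Z_SCORES[RISK_TIER_CONFIDENCE[risk_tier]]
    p = 0.5  # worst-case variability when the deficiency rate is unknown
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / population)       # finite-population correction
    return min(population, math.ceil(n))


# Example: 120 workstations in scope, control judged high-risk.
print(evidence_sample_size(120, "high"))  # -> 70 artifacts to review
```

Under these assumptions, a higher risk tier raises the confidence level and therefore the number of artifacts reviewed, keeping the risk-driven behavior participants reported while making the selection reproducible across assessors.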
Related papers
- DREAM: Deep Research Evaluation with Agentic Metrics [21.555357444628044]
We propose DREAM (Deep Research Evaluation with Agentic Metrics), a framework that makes evaluation itself agentic. DREAM structures assessment through an evaluation protocol combining query-agnostic metrics with adaptive metrics generated by a tool-calling agent. Controlled evaluations demonstrate DREAM is significantly more sensitive to factual and temporal decay than existing benchmarks.
arXiv Detail & Related papers (2026-02-21T19:14:31Z)
- Evaluating Medical LLMs by Levels of Autonomy: A Survey Moving from Benchmarks to Applications [14.979261906851036]
This survey reframes evaluation through a levels-of-autonomy lens (L0-L3). We align existing benchmarks and metrics with the actions permitted at each level and their associated risks, making the evaluation targets explicit.
arXiv Detail & Related papers (2025-10-20T17:22:32Z)
- Towards Real-Time Fake News Detection under Evidence Scarcity [66.58597356379907]
We propose Evaluation-Aware Selection of Experts (EASE), a novel framework for real-time fake news detection. EASE adapts its decision-making process according to the assessed sufficiency of available evidence. We introduce RealTimeNews-25, a new benchmark for evaluating model generalization on emerging news with limited evidence.
arXiv Detail & Related papers (2025-10-13T11:11:46Z)
- FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning [62.452350134196934]
FaithCoT-Bench is a unified benchmark for instance-level CoT unfaithfulness detection. Our framework formulates unfaithfulness detection as a discriminative decision problem. FaithCoT-Bench sets a solid basis for future research toward more interpretable and trustworthy reasoning in LLMs.
arXiv Detail & Related papers (2025-10-05T05:16:54Z)
- RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration [81.38705556267917]
Existing safety evaluation methods for large language models (LLMs) suffer from inherent limitations. We introduce a theoretical framework that reconstructs the underlying risk concept space. We propose RADAR, a multi-agent collaborative evaluation framework.
arXiv Detail & Related papers (2025-09-28T09:35:32Z)
- CCE: Confidence-Consistency Evaluation for Time Series Anomaly Detection [56.302586730134806]
We introduce Confidence-Consistency Evaluation (CCE), a novel evaluation metric. CCE simultaneously measures prediction confidence and uncertainty consistency. We also establish RankEval, a benchmark for comparing the ranking capabilities of various metrics.
arXiv Detail & Related papers (2025-09-01T03:38:38Z)
- Beyond the Leaderboard: Rethinking Medical Benchmarks for Large Language Models [46.81512544528928]
We introduce MedCheck, the first lifecycle-oriented assessment framework specifically designed for medical benchmarks. Our framework deconstructs a benchmark's development into five continuous stages, from design to governance, and provides a comprehensive checklist of 46 medically-tailored criteria. Our analysis uncovers widespread, systemic issues, including a profound disconnect from clinical practice, a crisis of data integrity due to unmitigated contamination risks, and a systematic neglect of safety-critical evaluation dimensions like model robustness and uncertainty awareness.
arXiv Detail & Related papers (2025-08-06T11:11:40Z)
- Doing Audits Right? The Role of Sampling and Legal Content Analysis in Systemic Risk Assessments and Independent Audits in the Digital Services Act [0.0]
The European Union's Digital Services Act (DSA) requires online platforms to undergo internal and external audits. This article evaluates the strengths and limitations of different qualitative and quantitative methods for auditing systemic risks. We argue that content sampling, combined with legal and empirical analysis, offers a viable method for risk-specific audits.
arXiv Detail & Related papers (2025-05-06T15:02:54Z)
- Evaluating Step-by-step Reasoning Traces: A Survey [8.279021694489462]
Step-by-step reasoning is widely used to enhance the reasoning ability of large language models (LLMs) in complex problems. Existing evaluation practices are highly inconsistent, resulting in fragmented progress across evaluator design and benchmark development. This survey proposes a taxonomy of evaluation criteria with four top-level categories (factuality, validity, coherence, and utility).
arXiv Detail & Related papers (2025-02-17T19:58:31Z)
- The simulation of judgment in LLMs [32.57692724251287]
Large Language Models (LLMs) are increasingly embedded in evaluative processes, from information filtering to assessing and addressing knowledge gaps through explanation and credibility judgments. This raises the need to examine how such evaluations are built, what assumptions they rely on, and how their strategies diverge from those of humans. We benchmark six LLMs against expert ratings (NewsGuard and Media Bias/Fact Check) and against human judgments collected through a controlled experiment.
arXiv Detail & Related papers (2025-02-06T18:52:10Z)
- The Lessons of Developing Process Reward Models in Mathematical Reasoning [62.165534879284735]
Process Reward Models (PRMs) aim to identify and mitigate intermediate errors in the reasoning processes. We develop a consensus filtering mechanism that effectively integrates Monte Carlo (MC) estimation with Large Language Models (LLMs). We release a new state-of-the-art PRM that outperforms existing open-source alternatives.
arXiv Detail & Related papers (2025-01-13T13:10:16Z)