Bridging Psychometric and Content Development Practices with AI: A Community-Based Workflow for Augmenting Hawaiian Language Assessments
- URL: http://arxiv.org/abs/2512.17140v1
- Date: Fri, 19 Dec 2025 00:21:48 GMT
- Title: Bridging Psychometric and Content Development Practices with AI: A Community-Based Workflow for Augmenting Hawaiian Language Assessments
- Authors: Pōhai Kūkea-Shultz, Frank Brockmann,
- Abstract summary: The paper presents the design and evaluation of a community-based artificial intelligence (AI) workflow for the Kaiapuni Assessment of Educational Outcomes (KĀ'EO) program. KĀ'EO is the only native language assessment used for federal accountability in the United States. The project explored whether document-grounded language models could ethically and effectively augment human analysis of item performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents the design and evaluation of a community-based artificial intelligence (AI) workflow developed for the Kaiapuni Assessment of Educational Outcomes (KĀ'EO) program, the only native language assessment used for federal accountability in the United States. The project explored whether document-grounded language models could ethically and effectively augment human analysis of item performance while preserving the cultural and linguistic integrity of the Hawaiian language. Operating under the KĀ'EO AI Policy Framework, the workflow used NotebookLM for cross-document synthesis of psychometric data and Claude 3.5 Sonnet for developer-facing interpretation, with human oversight at every stage. Fifty-eight flagged items across Hawaiian Language Arts, Mathematics, and Science were reviewed during Round 2 of the AI Lab, producing six interpretive briefs that identified systemic design issues such as linguistic ambiguity, Depth-of-Knowledge (DOK) misalignment, and structural overload. The findings demonstrate that AI can serve as an ethically bounded amplifier of human expertise, accelerating analysis while simultaneously prioritizing fairness, human expertise, and cultural authority. This work offers a replicable model for responsible AI integration in Indigenous-language educational measurement.
Related papers
- Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Study [2.659655189346942]
This paper presents a methodology for evaluating AIETs in language models. We selected four AIETs: Model Cards, ALTAI, FactSheets, and Harms Modeling. The evaluation considered the developers' perspective on the AIETs' use and quality in helping to identify ethical considerations about their model.
arXiv Detail & Related papers (2025-12-16T02:43:37Z) - AIssistant: An Agentic Approach for Human--AI Collaborative Scientific Work on Reviews and Perspectives in Machine Learning [2.464267718050055]
We present here the first experiments with AIssistant for perspective and review research papers in machine learning. Our system integrates modular tools and agents for literature search, section-wise experimentation, citation management, and automatic paper text generation. Despite its effectiveness, we identify key limitations, including hallucinated citations, difficulty adapting to dynamic paper structures, and incomplete integration of multimodal content.
arXiv Detail & Related papers (2025-09-14T15:50:31Z) - CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z) - The next question after Turing's question: Introducing the Grow-AI test [51.56484100374058]
This study introduces GROW-AI, an extended framework for assessing artificial intelligence. GROW-AI is designed to answer the question "Can machines grow up?", a natural successor to the Turing Test. The originality of the work lies in the conceptual transposition of the process of "growing up" from the human world to that of artificial intelligence.
arXiv Detail & Related papers (2025-08-22T10:19:42Z) - The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority. We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z) - Enhancing AI-Driven Education: Integrating Cognitive Frameworks, Linguistic Feedback Analysis, and Ethical Considerations for Improved Content Generation [0.0]
This paper synthesizes insights from four related studies to propose a comprehensive framework for enhancing AI-driven educational tools. We integrate cognitive assessment frameworks, linguistic analysis of AI-generated feedback, and ethical design principles to guide the development of effective and responsible AI tools.
arXiv Detail & Related papers (2025-05-01T06:36:21Z) - Survey on Vision-Language-Action Models [0.2636873872510828]
This work does not represent original research, but highlights how AI can help automate literature reviews. Future research will focus on developing a structured framework for AI-assisted literature reviews.
arXiv Detail & Related papers (2025-02-07T11:56:46Z) - Harnessing AI for efficient analysis of complex policy documents: a case study of Executive Order 14110 [44.99833362998488]
Policy documents, such as legislation, regulations, and executive orders, are crucial in shaping society.
This study aims to evaluate the potential of AI in streamlining policy analysis and to identify the strengths and limitations of current AI approaches.
arXiv Detail & Related papers (2024-06-10T11:19:28Z) - Explainable Authorship Identification in Cultural Heritage Applications: Analysis of a New Perspective [48.031678295495574]
We explore the applicability of existing general-purpose eXplainable Artificial Intelligence (XAI) techniques to AId.
In particular, we assess the relative merits of three different types of XAI techniques on three different AId tasks.
Our analysis shows that, while these techniques make important first steps towards explainable Authorship Identification, more work remains to be done.
arXiv Detail & Related papers (2023-11-03T20:51:15Z) - Supporting Human-AI Collaboration in Auditing LLMs with LLMs [33.56822240549913]
Large language models have been shown to be biased and to behave irresponsibly, so it is crucial to audit them rigorously. Existing auditing tools leverage humans, AI, or both to find failures.
arXiv Detail & Related papers (2023-04-19T21:59:04Z) - How To Evaluate Your Dialogue System: Probe Tasks as an Alternative for Token-level Evaluation Metrics [47.20761880464552]
Generative dialogue modeling is widely seen as a language modeling task. The task demands that an agent have a complex natural language understanding of its input text to carry out a meaningful interaction with a user. The automatic metrics in common use evaluate the quality of the generated text as a proxy for the holistic interaction of the agent.
arXiv Detail & Related papers (2020-08-24T13:28:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.