An NLP Crosswalk Between the Common Core State Standards and NAEP Item Specifications
- URL: http://arxiv.org/abs/2405.17284v2
- Date: Fri, 31 May 2024 21:30:44 GMT
- Title: An NLP Crosswalk Between the Common Core State Standards and NAEP Item Specifications
- Authors: Gregory Camilli
- Abstract summary: I describe an NLP-based procedure that can be used to support subject matter experts in establishing a crosswalk between item specifications and content standards.
The procedure is used to evaluate the match of the Common Core State Standards for mathematics at grade 4 to the corresponding item specifications for the 2026 National Assessment of Educational Progress.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Natural language processing (NLP) is rapidly developing for applications in educational assessment. In this paper, I describe an NLP-based procedure that can be used to support subject matter experts in establishing a crosswalk between item specifications and content standards. This paper extends recent work by proposing and demonstrating the use of multivariate similarity based on embedding vectors for sentences or texts. In particular, a hybrid regression procedure is demonstrated for establishing the match of each content standard to multiple item specifications. The procedure is used to evaluate the match of the Common Core State Standards (CCSS) for mathematics at grade 4 to the corresponding item specifications for the 2026 National Assessment of Educational Progress (NAEP).
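As a rough illustration of the multivariate idea in the abstract, the match of one content standard to several item specifications can be sketched with embedding vectors: univariate cosine similarities give a per-specification score, while a regression of the standard's embedding on the specification embeddings gives a joint view. This is a minimal sketch using randomly generated embeddings and a plain least-squares fit; the array names, dimensions, and the `lstsq` step are assumptions for illustration and do not reproduce the paper's actual hybrid regression procedure.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
spec_embeddings = rng.normal(size=(5, 8))  # 5 item specifications, 8-dim embeddings
standard = rng.normal(size=8)              # one content standard

# Univariate view: cosine similarity of the standard to each specification.
sims = [cosine_sim(standard, s) for s in spec_embeddings]

# Multivariate view: regress the standard's embedding on the specification
# embeddings; each coefficient indicates how strongly that specification
# contributes to reconstructing the standard.
coefs, *_ = np.linalg.lstsq(spec_embeddings.T, standard, rcond=None)
```

In practice the embeddings would come from a sentence-embedding model applied to the text of each standard and specification, rather than random vectors.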
Related papers
- Automated Generation of Curriculum-Aligned Multiple-Choice Questions for Malaysian Secondary Mathematics Using Generative AI [0.10995326465245928]
This paper addresses the need for scalable and high-quality educational assessment tools within the Malaysian education system. It highlights the potential of Generative AI (GenAI) while acknowledging the challenges of ensuring factual accuracy and curriculum alignment.
arXiv Detail & Related papers (2025-08-06T13:30:51Z) - Foundations and Evaluations in NLP [1.0619039878979954]
This memoir explores two fundamental aspects of Natural Language Processing (NLP): the creation of linguistic resources and the evaluation of NLP system performance.
My work has focused on developing a morpheme-based annotation scheme for the Korean language that captures linguistic properties from morphology to semantics.
I have proposed a novel evaluation framework, the jp-algorithm, which introduces an alignment-based method to address challenges in preprocessing tasks.
arXiv Detail & Related papers (2025-04-02T04:14:03Z) - GiesKaNe: Bridging Past and Present in Grammatical Theory and Practical Application [0.0]
This article explores the requirements for corpus compilation within the GiesKaNe project.
As a historical corpus, GiesKaNe aims to establish connections with both historical and contemporary corpora.
The methodological complexity of such a project is managed through a complementary interplay of human expertise and machine-assisted processes.
arXiv Detail & Related papers (2025-02-07T17:35:33Z) - A Preliminary Study of Multilingual Code Language Models for Code Generation Task Using Translated Benchmarks [0.0]
We evaluate the performance of Poly-Coder, a pioneering open-source, multilingual CLM built for code generation.
Our results suggest that the outcomes observed in these translated benchmarks align well with evaluation metrics used during the training phase.
These initial insights highlight the need for more comprehensive empirical studies.
arXiv Detail & Related papers (2024-11-23T06:40:47Z) - NLP Cluster Analysis of Common Core State Standards and NAEP Item Specifications [0.0]
Camilli (2024) proposed a methodology using natural language processing (NLP) to map the relationship of a set of content standards to item specifications.
This study provided evidence that NLP can be used to improve the mapping process.
arXiv Detail & Related papers (2024-11-20T15:44:58Z) - Guidelines for Fine-grained Sentence-level Arabic Readability Annotation [9.261022921574318]
The Balanced Arabic Readability Evaluation Corpus (BAREC) project is designed to address the need for comprehensive Arabic language resources aligned with diverse readability levels.
Inspired by the Taha/Arabi21 readability reference, BAREC aims to provide a standardized reference for assessing sentence-level Arabic text readability across 19 distinct levels.
This paper focuses on our meticulous annotation guidelines, demonstrated through the analysis of 10,631 sentences/phrases (113,651 words).
arXiv Detail & Related papers (2024-10-11T09:59:46Z) - Benchmarking Large Language Models for Conversational Question Answering in Multi-instructional Documents [61.41316121093604]
We present InsCoQA, a novel benchmark for evaluating large language models (LLMs) in the context of conversational question answering (CQA).
Sourced from extensive, encyclopedia-style instructional content, InsCoQA assesses models on their ability to retrieve, interpret, and accurately summarize procedural guidance from multiple documents.
We also propose InsEval, an LLM-assisted evaluator that measures the integrity and accuracy of generated responses and procedural instructions.
arXiv Detail & Related papers (2024-10-01T09:10:00Z) - Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques [3.197435100145382]
Ordinal Classification (OC) is a widely encountered challenge in Natural Language Processing (NLP).
Previous approaches to OC have primarily focused on modifying existing loss functions, or creating novel ones, that explicitly account for the ordinal nature of labels.
With the advent of Pretrained Language Models (PLMs), it became possible to tackle ordinality through the implicit semantics of the labels as well.
arXiv Detail & Related papers (2024-05-20T04:31:04Z) - An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP.
We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z) - Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review [52.359007622096684]
Peer review is a key component of the publishing process in most fields of science.
Existing NLP studies focus on the analysis of individual texts, but editorial assistance often requires modeling interactions between pairs of texts.
arXiv Detail & Related papers (2022-04-22T16:39:38Z) - Unsupervised Attention-based Sentence-Level Meta-Embeddings from Contextualised Language Models [15.900069711477542]
We propose a sentence-level meta-embedding learning method that takes independently trained contextualised word embedding models as input.
Our proposed method is unsupervised and is not tied to a particular downstream task.
Experimental results show that our proposed unsupervised sentence-level meta-embedding method outperforms previously proposed sentence-level meta-embedding methods.
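For context on what a sentence-level meta-embedding combines, the simplest unsupervised baseline averages l2-normalised embeddings of the same sentences from different source models. The sketch below shows only that baseline with random vectors; the paper's method instead learns attention weights over the sources, which is not reproduced here, and the array names and dimensions are illustrative assumptions.

```python
import numpy as np

def normalize(v):
    """l2-normalise each row of a matrix of embeddings."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Hypothetical sentence embeddings of the same 3 sentences from two
# independently trained contextualised models (here, same 4-dim space).
rng = np.random.default_rng(1)
emb_model_a = rng.normal(size=(3, 4))
emb_model_b = rng.normal(size=(3, 4))

# Baseline meta-embedding: average the normalised source embeddings,
# then re-normalise so each meta-embedding is unit length.
meta = normalize(normalize(emb_model_a) + normalize(emb_model_b))
```

Normalising before averaging keeps a source model with larger embedding magnitudes from dominating the combination.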
arXiv Detail & Related papers (2022-04-16T08:20:24Z) - Benchmarking Generalization via In-Context Instructions on 1,600+ Language Tasks [95.06087720086133]
Natural-Instructions v2 is a collection of 1,600+ diverse language tasks and their expert written instructions.
The benchmark covers 70+ distinct task types, such as tagging, in-filling, and rewriting.
This benchmark enables large-scale evaluation of cross-task generalization of the models.
arXiv Detail & Related papers (2022-04-16T03:12:30Z) - Multi-view Subword Regularization [111.04350390045705]
Multi-view Subword Regularization (MVR) is a method that enforces consistency between the predictions made from inputs tokenized by the standard segmentation and by probabilistic segmentations.
Results on the XTREME multilingual benchmark show that MVR brings consistent improvements of up to 2.5 points over using standard segmentation algorithms.
arXiv Detail & Related papers (2021-03-15T16:07:42Z) - NEMO: Frequentist Inference Approach to Constrained Linguistic Typology Feature Prediction in SIGTYP 2020 Shared Task [83.43738174234053]
We employ frequentist inference to represent correlations between typological features and use this representation to train simple multi-class estimators that predict individual features.
Our best configuration achieved a micro-averaged accuracy of 0.66 on 149 test languages.
arXiv Detail & Related papers (2020-10-12T19:25:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.